[AI Infra/Seoul] Staff, Back-end Engineer (AI Infra)

Please complete and submit the attached Internal Transfer Request Form.

Please make sure to apply with your Coupang e-mail address.

 

At Coupang, we are building the future of commerce. Born out of an obsession to make shopping, eating, and living easier than ever, we’re collectively disrupting the multi-billion-dollar commerce industry from the ground up. We exist to wow our customers. We know we’re doing the right thing when we hear our customers say, “How did we ever live without Coupang?” We are one of the fastest-growing commerce companies and have established an unparalleled reputation as a dominant and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to continue our growth and launch new services at the pace we have kept since our inception. We are all entrepreneurs, surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Our mission to build the future of commerce is real. We push the boundaries of what’s possible to solve problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world.

 

About the Role:

You will own the day-to-day reliability of our multi-region NVIDIA DGX cloud. Your charter: keep every host, hypervisor and Kubernetes node battle-hardened so that large-language-model training runs for weeks without a hiccup and real-time inference always returns in milliseconds—whether the workload lives on-prem or bursts to one of several public-cloud providers. 

 

Key Responsibilities:

  • Host & firmware hardening — flash, validate and auto-baseline BIOS, BMC, network-interface and GPU firmware for DGX H100/H200 nodes. 
  • Virtualisation & container runtime — run KVM or ESXi at scale, expose VMs to Kubernetes via KubeVirt/Kata Containers, and tune vGPU passthrough, SR-IOV and NUMA pinning for maximum GPU utilisation. 
  • Kubernetes SRE — upgrade clusters with zero guest interruption, manage etcd quorum, tune kube-scheduler for GPU topology-aware placement, and operate service meshes (Istio, including ambient mode, or Cilium) for gRPC-heavy AI micro-services.
  • High-speed networks — design and troubleshoot 200/400 Gb InfiniBand or RoCE v2 fabrics; enforce network policies with Cilium eBPF and optimise RDMA flows for multi-tenant isolation. 
  • Data-resilience flows — implement Velero- or Restic-based backup, cross-AZ snapshot orchestration and quarterly disaster-recovery drills covering control-plane, metadata and model artefacts. 
  • Automation first — write Go or Python to drive Terraform, Ansible and Argo CD pipelines; integrate with internal provisioning tool “Void” for end-to-end, push-button node builds. 
  • Operational leadership — rotate on high-severity incident duty, publish RCA documents within 72 hours and mentor L5 engineers in Kubernetes, GPU and RDMA debugging.

 

Qualifications:

  • 8+ years of production experience with Linux, networking and virtualisation.
  • Active CKA and CKS (or equivalent open-source contributions proving the same depth). 
  • At least one year running NVIDIA DGX or comparable GPU clusters at ≥ 1 PFLOP scale. 
  • Deep KVM or ESXi expertise including vMotion/live-migration, SR-IOV NICs and vGPU scheduling. 
  • Hands-on InfiniBand/RDMA troubleshooting with perfquery, ibstat, nvidia-smi (NVLink and topology views) and tcpdump on RDMA (diag mode).
  • Professional-level cloud networking or architect certification (AWS Advanced Networking Specialty, Azure Network Engineer Expert, Google PCNE, etc.). 

 

Recruitment Process and Others

Recruitment Process

 

  • Application Review - Phone Interview - Onsite (or Virtual Onsite) Interview - Offer
  • The exact nature of the recruitment process may vary according to the specific job and may be changed due to scheduling or other circumstances.
  • Applicants will be informed of interview schedules and results via the e-mail address submitted at the application stage.

Details to Consider

  • This job posting may be closed prior to the stated end date for application if all openings are filled.
  • Coupang has the right to rescind an offer of employment if a candidate is found to have submitted false information as part of the application process.
  • Those eligible for employment protection (recipients of veteran’s benefits, the disabled, etc.) may receive preferential treatment for employment in accordance with applicable laws.
  • Hiring may be restricted if the legal qualifications required for hiring and work performance are not met.
  • This is a full-time regular position and includes a 12-week probationary period; however, the probationary period may be skipped, shortened or extended if necessary for business purposes.

Privacy Notice

  • Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice linked below.

https://privacy.coupang.com/en/land/jobs/

Document Return Policy

  • This notification is given pursuant to Article 11 (6) of the Fair Hiring Procedure Act.
  • A job applicant, who has applied but not been finally selected for a position at Coupang (the “Company”), may request the Company to return his/her hiring documents submitted pursuant to the Fair Hiring Procedure Act. However, this will not apply where the hiring documents were submitted via the website of the Company or e-mail, or where the job applicant submitted those documents voluntarily without a request from the Company. In addition, if the hiring documents were destroyed due to a natural disaster or any other reasons not attributable to the Company, such documents will be deemed to have been returned to the job applicant.
  • A job applicant who wishes to request the return of his/her hiring documents pursuant to the main sentence of paragraph 2 above should fill out a “Request for Return of Hiring Documents” [Annex Form No. 3 in the Enforcement Rule of the Fair Hiring Procedure Act] and submit it by email ([email protected]). In such a case, within fourteen (14) days from the date on which receipt of the request is confirmed, the Company will send the hiring documents to the job applicant’s designated address via registered mail. Please be informed that the job applicant is required to pay the postage for the registered mail.
  • In preparation for a job applicant’s request for the return of hiring documents pursuant to the main sentence of paragraph 2 above, the Company shall retain the original hiring documents submitted by the job applicant for 180 days from the completion of the recruiting process. If no request is made by the end of this period, all of his/her hiring documents will be destroyed immediately in accordance with the Personal Information Protection Act.
  • The above paragraphs 1 - 4 shall only apply when the labor-related laws of Korea govern the application. They are otherwise not applicable.

 

Equal Opportunities for All

Coupang is an equal opportunity employer. Our unprecedented success would not be possible without the valuable input of our globally diverse team.
