Senior Datacenter Systems Architect

Hillsboro, United States

Drive the architecture and optimization of high-density HPC/GPU datacenter systems, ensuring reliability and automation while mentoring junior engineers.

We are seeking a senior-level UNIX/Linux Systems Engineer to architect, own, and operate our next-generation on-premises private cloud and GPU compute infrastructure supporting global engineering teams in Hillsboro, OR. This is a strategic, deeply technical systems architect position responsible for designing, scaling, and optimizing a world-class HPC/GPU datacenter environment.

You will drive end-to-end systems design, oversee compute, network, and storage architecture, and take full ownership of reliability, automation, performance, and deep multi-layer troubleshooting across thousands of nodes running bare-metal and virtualized workloads.

What You’ll Lead

  • Architect, scale, and optimize complex UNIX/Linux-based compute clusters, GPU farms, and high-density datacenter systems.
  • Own the design and strategy for on-prem HPC/GPU compute environments including OS architecture, distributed storage, network tuning, and interconnects.
  • Perform deep-dive troubleshooting across all layers — kernel, network stack, RPC/NFS, storage protocols, firmware, drivers, bootloaders, and orchestration systems.
  • Lead automation efforts using Python, Bash, Ansible, and infrastructure-as-code (IaC) tooling to eliminate manual processes and improve system reliability.
  • Drive configuration standards for compute, network, and storage layers across bare-metal systems.
  • Collaborate with architects, system software teams, networking teams, and hardware engineering to ensure platform scalability.
  • Own operational excellence: uptime, performance tuning, incident response processes, and long-term platform strategy.
  • Mentor and provide technical leadership to junior engineers and datacenter technicians.

What We Need to See

  • 8–15+ years in UNIX/Linux systems engineering, system administration, or HPC/compute infrastructure roles.
  • Expert-level knowledge of Linux internals (kernel, storage subsystems, networking stack, cgroups, systemd, NUMA, etc.).
  • Proven experience architecting and running large-scale compute clusters or farms (HPC, HCI, GPU clusters, or bare-metal automation environments).
  • Deep understanding of compute, network, and storage architectures end-to-end.
  • Demonstrated skill in root-cause analysis at multiple layers, including:
    • NFSv3/v4 deep troubleshooting
    • Packet-level analysis
    • Kernel performance tuning
    • Distributed storage (NetApp, Ceph, Lustre, BeeGFS, etc.)
  • Strong networking fundamentals: TCP/IP, VLANs, BGP, LACP, RoCE/RDMA, NIC offloading.
  • Strong automation skills: Python, Bash, Ansible, Terraform, or other IaC tools.
  • Experience with PXE provisioning, Kickstart, bare-metal deployments, and OS image pipelines.
  • Certifications strongly preferred:
    • UNIX/Linux certs (RHCE, RHCSA, Linux Foundation)
    • Networking certs (CCNP, CCIE, JNCIP, etc.)
    • Storage certs (NetApp NCIE/NCDA or similar)

What Makes You Stand Out

  • Experience designing GPU clusters or accelerator-dense environments.
  • Deep experience with distributed filesystems, block storage tuning, or NFS debugging.
  • Strong background in systems and platform performance engineering.
  • Ability to continuously evaluate emerging technologies and build long-term architectural recommendations.
  • Experience leading and mentoring infrastructure teams.

Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.
