Senior DevOps Engineer, ML Infrastructure

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses.

The Serve fleet has been delighting merchants, customers, and pedestrians while making commercial deliveries in Los Angeles, Miami, Dallas, Atlanta, and Chicago. We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.

Who We Are

We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems by leveraging robotics, machine learning, and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated, dynamic problems is collaboratively and respectfully.

As a Senior DevOps Engineer on the Machine Learning (ML) Infrastructure team, you will help design, build, and maintain our petabyte-scale data and ML platform that powers data partnerships, ML research, and autonomy engineering. You will play a key role in ensuring reliability, security, scalability, and performance across our internal systems, and maintain a suite of internal tools used by dozens of engineers. Your work will make a significant impact on our autonomous capabilities and act as a catalyst for the entire autonomy team, helping us train our next generation of ML models.

Responsibilities

  • Deploy and maintain our ML training orchestration system that operates across multiple platforms.

  • Manage cloud and on-premises environments for large-scale distributed data processing and ML training and inference systems.

  • Automate deployment pipelines, monitoring, and alerting for ML and data services.

  • Collaborate closely with data scientists, ML engineers, and autonomy teams to streamline experimentation and model deployment.

  • Maintain and improve CI/CD systems to support rapid development and testing.

  • Implement best practices for system security, reliability, and observability.

  • Optimize infrastructure costs and ensure efficient resource utilization.

  • Improve internal developer productivity through tooling, documentation, and support.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent experience.

  • 5+ years of experience as a DevOps, SRE, or Infrastructure Engineer, preferably supporting ML or data-intensive systems.

  • Strong experience with cloud platforms (AWS, GCP, or Azure), containerization (Docker), and container orchestration (Kubernetes).

  • Proficiency in infrastructure-as-code tools such as Terraform or Helm.

  • Solid understanding of CI/CD systems (GitLab CI, Jenkins, ArgoCD, etc.).

  • Experience with Python and SQL.

  • Experience with cloud security, IAM (Identity and Access Management), and access control.

  • Experience analyzing and optimizing hardware performance.

  • Experience with GPU cluster management.

What Makes You Stand Out

  • Experience managing large-scale distributed data processing systems.

  • Experience analyzing and optimizing ML training workloads.

  • Background in observability stacks (Prometheus, Grafana, ELK, OpenTelemetry).

  • Contributions to open-source DevOps or ML infrastructure projects.

* Please note: The base salary range listed in this job description reflects compensation for candidates based in the United States. While we prefer candidates located in the U.S., we are also open to qualified talent working remotely in:

Canada (all locations): base salary range of $130k–$160k CAD

Salary
$155,000 – $195,000 per year