Associate Staff Engineer, DevOps

AI overview

Design and implement CI/CD pipelines, and maintain Kubernetes clusters and AWS/Azure environments for high-performance workloads, with a focus on cloud-native and AI infrastructure.

Requirements:

  • Experience: 5+ years
  • Strong experience in DevOps or Site Reliability Engineering (SRE) roles.
  • Strong knowledge of Docker, Kubernetes, Terraform, and CI/CD pipelines.
  • Hands-on experience with AWS, Azure, or other cloud platforms.
  • Familiarity with GPU infrastructure and ML workloads is a plus.
  • Good understanding of monitoring and logging systems (Prometheus, Grafana).
  • Ability to collaborate with ML teams for optimized inference and deployment.
  • Strong troubleshooting and problem-solving skills in high-scale environments.
  • Knowledge of infrastructure security best practices, cost optimization, and performance tuning.
  • Exposure to vector databases and AI/ML deployment pipelines is highly desirable.

Responsibilities:

  • Maintain and manage Kubernetes clusters, AWS/Azure environments, and GPU infrastructure for high-performance workloads.
  • Design and implement CI/CD pipelines for seamless deployments and faster release cycles.
  • Set up and maintain monitoring and logging systems using Prometheus and Grafana to ensure system health and reliability.
  • Support vector database scaling and model deployment for AI/ML workloads.
  • Collaborate with ML engineering teams to optimize inference performance and resource utilization.
  • Ensure high availability, security, and scalability of infrastructure across multiple environments.
  • Automate infrastructure provisioning and configuration using Terraform and other IaC tools.
  • Troubleshoot production issues and implement proactive measures to prevent downtime.
  • Continuously improve deployment processes and infrastructure reliability through automation and best practices.
  • Participate in architecture reviews, capacity planning, and disaster recovery strategies.
  • Drive cost optimization initiatives for cloud resources and GPU utilization.
  • Stay up to date with emerging technologies in cloud-native computing, AI infrastructure, and DevOps automation.

Bachelor’s or master’s degree in Computer Science, Information Technology, or a related field.

👋🏼 We're Nagarro. We are a digital product engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale, across all devices and digital mediums, and our people exist everywhere in the world (19,500+ experts across 36 countries, to be exact). Our work culture is dynamic and non-hierarchical. We're looking for great new colleagues. That's where you come in!

By this point in your career, it is not just about the tech you know or how well you can code. It is about what more you want to do with that knowledge. Can you help your teammates proceed in the right direction? Can you tackle the challenges our clients face while always looking to take our solutions one step further to succeed at an even higher level? Yes? You may be ready to join us.
