Staff Engineer DevOps, Agentic AI

AI overview

Design, provision, and manage scalable cloud infrastructure for the Agentic AI platform while ensuring secure and efficient operations in a collaborative environment.

About Netskope

Today, there are more data and users outside the enterprise than inside, causing the network perimeter as we know it to dissolve. We realized a new perimeter was needed, one that is built in the cloud and follows and protects data wherever it goes, so we started Netskope to redefine cloud, network, and data security.

Since 2012, we have built the market-leading cloud security company and an award-winning culture powered by hundreds of employees spread across offices in Santa Clara, St. Louis, Bangalore, London, Paris, Melbourne, Taipei, and Tokyo. Our core values are openness, honesty, and transparency, and we purposely developed our open desk layouts and large meeting spaces to support and promote partnerships, collaboration, and teamwork. From catered lunches and office celebrations to employee recognition events and social professional groups such as the Awesome Women of Netskope (AWON), we strive to keep work fun, supportive, and interactive. Visit us at Netskope Careers, and follow us on LinkedIn and Twitter @Netskope.

About the role

Please note, this team is hiring across all levels and candidates are individually assessed and appropriately leveled based upon their skills and experience.

As a DevOps Engineer, you will be critical to designing, provisioning, and managing scalable cloud infrastructure and environments for our Agentic AI platform. You will collaborate closely with application teams to build robust CI/CD pipelines, ensure reliable deployments, and maintain highly available Kubernetes clusters. Your expertise will extend to Infrastructure as Code (IaC), observability, cluster scaling, and release management across multiple environments. You will ensure production environments are secure, scalable, and efficiently managed while continuously improving automation and operational excellence.

What’s in it for you

You will be critical to deploying and managing core infrastructure and platform systems that power our products. This means you won't just maintain existing systems; you will be building and standardizing foundational environments using Infrastructure as Code. Your role is crucial in enabling engineering teams to ship reliably and at scale. If you thrive on solving complex distributed systems challenges, improving deployment velocity, and operating large-scale Kubernetes clusters, this is the environment for you.

What you will be doing

  • Work closely with engineering teams and AI/ML engineers to design and architect scalable, secure cloud environments for agentic applications using Infrastructure as Code (Terraform).
  • Design, implement, and manage CI/CD pipelines to ensure safe, repeatable, and reliable deployments across environments.
  • Manage and improve release processes, including versioning, rollback strategies, and blue/green and canary deployments.
  • Provision and manage Kubernetes clusters across multiple environments, ensuring high availability and scalability.
  • Implement auto-scaling strategies for infrastructure and workloads to optimize performance and cost.
  • Set up and manage monitoring, logging, and alerting systems for infrastructure and application workloads.
  • Operate and oversee large Kubernetes clusters supporting production workloads, including GPU workloads.
  • Improve reliability, quality, and time-to-market of our software delivery lifecycle.
  • Measure and optimize system performance, proactively identifying bottlenecks and implementing improvements.
  • Provide primary operational support and engineering for multiple large-scale distributed systems and cloud environments.

 

Required skills and experience

  • 8+ years of professional experience building and operating core infrastructure systems.
  • Strong hands-on experience with Infrastructure as Code tools such as Terraform.
  • Deep experience with Kubernetes and container orchestration at scale.
  • Experience with major cloud providers (AWS, Google Cloud, or Azure).
  • Experience designing and managing CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar).
  • Strong scripting skills using languages like Python or Bash, and experience with Git and GitHub workflows.
  • Experience implementing monitoring and observability solutions using tools such as Prometheus, Grafana, or similar.
  • Proven track record of building and operating scalable, reliable, and secure production systems.
  • Strong troubleshooting skills across distributed systems and cloud-native architectures.
  • Proactive attitude in identifying reliability risks, performance bottlenecks, and automation opportunities.
  • Comfortable working with ambiguity and rapid change in a dynamic environment.
  • Familiarity with LLM development, deployment, and optimization techniques.
  • Familiarity with high-performance, large-scale ML systems and their unique infrastructure needs.

Education

  • BSCS or equivalent required; MSCS or equivalent strongly preferred.


Netskope is committed to implementing equal employment opportunities for all employees and applicants for employment. Netskope does not discriminate in employment opportunities or practices based on religion, race, color, sex, marital or veteran status, age, national origin, ancestry, physical or mental disability, medical condition, sexual orientation, gender identity/expression, genetic information, pregnancy (including childbirth, lactation and related medical conditions), or any other characteristic protected by the laws or regulations of any jurisdiction in which we operate.

Netskope respects your privacy and is committed to protecting the personal information you share with us; please refer to Netskope's Privacy Policy for more details.

The application window for this position is expected to close within 50 days. You may apply by filling out the information below or by visiting our Netskope Careers site.

Netskope, a global cybersecurity leader, is redefining cloud, data, and network security to help organizations apply zero trust principles to protect data.
