Arista Networks is hiring a

Kubernetes Systems Engineer, EngProd

Full-Time
Remote

Arista Networks is looking for a skilled professional for our Engineering Productivity team to help maintain and support our rapidly expanding infrastructure and internal user base. The ideal candidate can wear many hats, is versatile, and is enthusiastic about learning new technologies. As part of the software engineering team, you will work with other team members to design, build, and administer secure, scalable, and fault-tolerant tools and infrastructure in a hybrid cloud environment.

Working in the Engineering Productivity (EngProd) group, you will collaborate with other engineers to design, build, scale, and operate the systems that the rest of Arista’s development teams use. The EngProd team uses industry-standard systems like Ansible, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, Elasticsearch, Google Cloud, and Varnish, as well as internal systems that we’ve built from the ground up to automate CI/CD, testing, analysis, and visualization.

Responsibilities:

  • Work with the existing k8s admin team to own different aspects of managing a production k8s cluster (e.g., upgrades, monitoring, capacity planning, security, and developer experience).
  • Proactively monitor, respond to, and enhance alerts, and set up automated alert handling where applicable.
  • Create and maintain incident response runbooks in collaboration with the service dev teams.
  • Debug and resolve issues impacting developer user experience and infrastructure stability around the k8s platform.
  • Adopt current best practices in k8s cluster management. Evaluate and adopt OSS projects that simplify k8s cluster management.
  • Set up guidelines and paved paths for service dev teams to improve developer experience around the k8s platform.
  • Work with Arista’s software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure around k8s, and provide fixes for those problems.
  • Engage with third-party vendor support as part of triage.

Qualifications:

  • At least a BSc in Computer Science or Engineering + 3 years’ experience, an MS in Computer Science or Engineering + 2 years’ experience, a Ph.D. in Computer Science, or equivalent work experience.
  • Knowledge of one or more of Go, Python, JavaScript. Enough experience with shell scripting to implement medium-complexity automation workflows.
  • Knowledge of Linux (or UNIX).
  • Experience in operating software systems at scale.
  • Strong understanding of the fundamentals of storage and networking.
  • Comfortable with Ansible and GitOps.
  • Strong expertise in managing on-prem/bare-metal Kubernetes clusters.
  • Applied understanding of software engineering principles.
  • Strong problem solving and software troubleshooting skills.
  • Ability to design a solution and implement features independently. Ability to work in small teams.
  • Comfortable with security principles and able to study the source code of OSS projects and conduct experiments as necessary to debug issues.
  • Proven expertise in debugging complex issues that span the technology stack.
  • Experience dealing with network proxies and containerized storage.

All your information will be kept confidential according to EEO guidelines.
