Site Reliability Engineer (SRE/DevOps) - Engineering Productivity

Working in Engineering Productivity (EngProd), you will collaborate with other engineers to design, build, scale, and operate the systems that the rest of Arista’s development teams use. The EngProd team uses industry-standard systems like Ansible, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, Elasticsearch, Google Cloud, and Varnish, as well as internal systems that we’ve built from the ground up to automate CI/CD, testing, analysis, and visualization.

Responsibilities:

  • Keep the production status green at all times

  • Proactively monitor, respond to, and enhance alerts

  • Build automated responses to the most common alerts or work with the rest of the EngProd team to build them

  • Create and maintain incident response runbooks in collaboration with the service development teams

  • Debug and resolve issues impacting developer user experience and infrastructure stability

  • Develop patterns to support system reliability and socialize them within the EngProd team

  • Review and contribute to the specifications and implementations written by other team members.

  • Work with Arista’s software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure and provide fixes for those problems.

  • Provide support for our tools and infrastructure to Arista’s development team.

Qualifications:

  • At least a BS in Computer Science or Engineering plus 5 years’ experience, an MS in Computer Science or Engineering plus 4 years’ experience, a Ph.D. in Computer Science, or equivalent work experience.

  • Knowledge of one or more of Go, Python, JavaScript, and shell scripting.

  • Knowledge of Linux (or UNIX).

  • Experience operating software systems at scale

  • Strong understanding of the fundamentals of storage and networking

  • Comfortable with Ansible and GitOps 

  • Applied understanding of software engineering principles.

  • Strong problem-solving and software troubleshooting skills.

  • Ability to design a solution and implement features independently. Ability to work in small teams.

All your information will be kept confidential according to EEO guidelines.

Arista Networks is the leader in software-driven networking solutions for today’s largest Data Center (DC), Cloud, Internet/WAN, Service Provider (SP), and Campus environments. Arista has over 7500 customers ranging from the largest cloud providers to healthcare, government, carrier, finance, education, and production web/SaaS companies. Arista's products are the foundation underpinning much of modern society's operations. Arista has ambitious plans and an unprecedented opportunity for growth, and we are looking for many more engineers and designers to join us in building and innovating the world's networks. Arista is a profitable, publicly quoted company with revenues of over $2B and a culture of invention, quality, respect, and fun.
