Senior Site Reliability Engineer

TLDR

Drive engineering efforts to enhance reliability, security, and automation in a Kubernetes-based cloud platform supporting Diagrid's core products.

Who We Are:

We believe that open-source software, open standards and APIs are the greatest transformational tools for organizations in the modern software development era.

Our mission at Diagrid is to provide developers with APIs and tools that help them focus on their code and not on infrastructure. As online digital services need to handle more load and run on multiple clouds or on-premises environments, programming to higher level abstractions delivers consistent, secure and reliable code that is easily portable and helps organizations de-risk their projects.

Diagrid was founded by the creators of the Dapr and KEDA open-source projects, who led the user experience, design, and development of hyper-scale infrastructure cloud services, serverless platforms, and open-source projects at Microsoft. Dapr has now achieved graduated status within the CNCF, joining renowned projects like Kubernetes, Prometheus, and Istio.

We are backed by top VC firms and supported by industry-leading investors and advisors, including Joe Beda (Kubernetes Co-Founder, former Co-Founder / CTO at Heptio), Matt Klein (Creator of Envoy, Lyft), Mark Russinovich (CTO, Microsoft Azure), William Morgan (Creator of Linkerd, CEO Buoyant), Seth Vargo (Senior Staff Engineer, Google), Adam Gross (Former CEO, Heroku), Sri Viswanath (Former CTO, Atlassian), Adam Frankl (Dev Marketing Expert, Neo4J & JFrog), and Roopak Venkatakrishnan (Head of platforms, Bolt).

About the role:

As a Senior Site Reliability Engineer, you will be part of a team that is driving engineering efforts required to provide reliability, security, automation, and lifecycle management to a state-of-the-art Kubernetes-based cloud platform with a managed state store and message broker.

This role is crucial in providing business continuity for our users and upholding SLAs and SLOs for Diagrid's core products. You will work in a multi-cloud environment, and knowledge of GCP, AWS, or Azure is a must.

You're a good fit if you live and breathe Kubernetes, have experience with an open-source tech stack, can write code to complement industry-standard tools, and have experience operating cloud services at scale.

Responsibilities:

  • Build and operate cutting-edge cloud infrastructure to support Diagrid's core products

  • Define standards, deliver tools, processes, and frameworks to make our products secure, reliable, efficient, and highly available

  • Build and maintain CI/CD pipelines that enable delivering software quickly and securely across clouds

  • Continuously optimize our services and cloud infrastructure for performance and availability

  • Design and document operational knowledge and best practices

  • Be part of the on-call rotation and lead by example



Qualifications:

  • 8+ years of experience provisioning and managing cloud resources on Google Cloud, AWS, or Azure, preferably multi-cloud

  • Experience building processes and using industry standard tools for managing applications on Kubernetes, preferably Terraform

  • Experience setting up and operating stateful software on Kubernetes, preferably Kafka, Redis, MySQL, and MongoDB

  • Comprehensive knowledge of Kubernetes best practices for cluster management, security, troubleshooting, and ongoing operations with failover, backup & restore

  • Ability to debug issues in Kubernetes clusters and complex distributed applications

  • Experience developing and supporting CI/CD production processes

  • Experience with Git-based version control systems

  • Experience with scripting and programming, preferably bash, Python, and Go

  • Bonus: experience operating Postgres or Kafka at scale in production

  • Bonus: experience with multi-tenant services

Diagrid Benefits:

  • Competitive compensation

  • Company equity

  • Remote first & flexible work environment

  • Flexible paid time off

  • Comprehensive healthcare for you and your dependents

  • Choice of hardware

  • $1000 for home office setup

  • Monthly WFH stipend

  • Team events & gatherings

  • Chance to collaborate with industry-leading figures 

Diagrid, Inc. is an Equal Opportunity Employer. We do not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need. We embrace and celebrate differences and diversity.

Diagrid builds powerful APIs and tools designed specifically for developers, allowing them to streamline their focus on coding while leveraging open-source software and cloud services. Our platform empowers tech teams to optimize their workflow and enhance productivity, making it a vital resource in the developer community.
