Site Reliability Engineer

About EngFlow

At EngFlow, we help developers save time by accelerating software builds and tests. Our cloud-based, distributed service optimizes developer workflows through remote execution and caching, improving efficiency, productivity, and product quality.

Backed by top investors, EngFlow is redefining how companies build software and ship well-tested products. Our solutions speed up builds by a factor of 10 or more, while our observability platform provides actionable insights for optimization. Founded by key contributors to Bazel, we build tools that empower engineering teams—from startups to Fortune 500 companies—to enhance developer velocity and improve build performance.

Learn more about our mission, culture, and team: EngFlow | Video

We’re looking for an experienced SRE to join our engineering team. You’ll be at the intersection of software engineering and systems operations — ensuring our distributed infrastructure is highly available, performant, and scalable while enabling our engineers to move quickly and confidently.

Key Responsibilities

  • Design, build, and maintain cloud infrastructure for our distributed build acceleration platform
  • Automate everything: from deployment pipelines to monitoring and recovery
  • Manage scalability and reliability for high-throughput, low-latency systems
  • Implement and maintain observability: logging, metrics, tracing, and alerting
  • Work closely with product and engineering teams to embed reliability into every feature
  • Diagnose and resolve production incidents quickly, and feed learnings back into systems design
  • Optimize cost, performance, and resilience across multi-cloud environments

Requirements

  • 4+ years in SRE, DevOps, or Production Engineering roles
  • Experience managing Kubernetes in production
  • Strong background in cloud infrastructure (GCP or AWS) and IaC (Terraform preferred)
  • Solid knowledge of networking, security, and distributed systems
  • Track record of improving system availability and developer productivity
  • A knack for debugging complex, cross-system issues under pressure

Benefits

We offer comprehensive medical, dental, and vision benefits, a 401(k)/pension, parental leave, and generous vacation. The team is fully remote, but we enjoy meeting in person several times a year at exciting destinations around the world. We value getting the work done and having fun while doing it; past team events have included chocolate, whisky, and tea tastings, monthly team games, and escape rooms.

EngFlow is the build and test acceleration company created by core Bazel engineers and funded by Andreessen Horowitz. EngFlow’s secure (SOC 2 Type 2 audited) remote execution, caching, and observability platform scales from 1 to 100,000+ cores, reduces build and test times by 5-10x, and cuts cloud costs by 20-50%. The platform is compatible with a variety of build systems, including Bazel, Buck v2, CMake, AOSP, and Chromium. Whether the platform is deployed in your cloud or in EngFlow’s, our global Bazel and developer productivity experts provide 24x7 coverage and support for small and large teams, with no hidden costs and SSO included. EngFlow products are used by engineers from startups to Fortune 500 companies to accelerate developer productivity and positively impact engineering culture. See this video to learn more about how and why we created EngFlow, our customers, and our platform capabilities: https://www.youtube.com/watch?v=TyPYZSp4nnE
