Site Reliability Engineer

About Cinder

Cinder provides a cutting-edge platform to protect the internet. Cinder safeguards the interaction layer: the front end of products that users engage with, and sometimes abuse. Our AI agents, integrated workflow platform, and deep expertise power real-time integrity at the speed of innovation.

We support some of the most important and innovative companies in the world, including OpenAI, Midjourney, and ElevenLabs. Cinder is backed by Accel and Y Combinator.

We care deeply about being relentless, intentional, and empathetic. We hire gritty thinkers who set ego aside, aggressively solve customer problems, and get better every day. We’re building an enterprise-grade platform to make the internet safer, and we need highly curious, hard-working self-starters.

Cinder is seeking an experienced Site Reliability Engineer to help maintain and evolve our robust infrastructure.

What you’ll do

  • Drive our production infrastructure from Day 0 through Day 2 operations, continuously evolving it into a highly reliable platform.

  • Build proactive deployment observability and guardrails that surface risk early, prevent outages, and continuously raise the bar on release safety.

  • Architect and oversee compliance-by-design infrastructure, ensuring SOC 2, HIPAA, and GDPR requirements are embedded into systems.

  • Partner across data, AI, and product to unblock work, improve velocity, and raise operational standards.

  • Partner directly with our most strategic customers to diagnose issues, guide deployments, and shape infrastructure decisions that matter in production.

  • Contribute to on-call coverage while serving as the go-to escalation point for complex deployment issues.

You might be a good fit if:

  • You're fluent in cloud infrastructure, infrastructure as code (e.g. Terraform), container orchestration, and containerization technologies

  • You're comfortable working with diverse technologies (e.g., AWS, PostgreSQL, OpenSearch, Kafka) and excited to learn new systems as our infrastructure evolves

  • You bring a strong bias toward repeatable, secure deployment processes and operational excellence

  • You take pride in building software that is fast, reliable, and scalable

  • You love to stay on top of the latest trends in deployment technologies

  • You're curious, systematic, and execution-oriented: you don't wait for perfect requirements and can navigate technical tradeoffs independently

  • You have experience operating production infrastructure in environments with increasing scale, complexity, and reliability demands

Why join us?

This is a pivotal moment to join Cinder. We are a fast-growing startup building critical AI software for some of the world's largest, most highly used, and most visible companies, so your impact will be immediate and substantial. We are committed to fostering a vibrant, collaborative culture with a wonderful office, frequent in-person time to connect and innovate, competitive salary, generous benefits, and a 401(k) match, ensuring you are supported as you tackle defining industry challenges.

Salary
$180,000 – $240,000 per year