deepset

Site Reliability Engineer

TL;DR

We're hiring a Site Reliability Engineer to own and evolve deepset's cloud and customer infrastructure end to end. You'll work across SaaS, private cloud, and on-prem environments to make our self-hosted platform production-ready, drive CI/CD and GitOps maturity, and reduce complexity at scale. Your work will directly shape how deepset's AI platform is built, deployed, and scaled for our own cloud and for customers running it in their own environments.

Why deepset

At deepset, we’re on a mission to make custom AI solutions accessible to every organization. With Haystack, thousands of developers build advanced LLM applications every day, while our enterprise-ready AI Platform helps companies turn large language models into business value. We’re remote-first, flexible, and built on a culture of trust and ownership. You’ll collaborate with top-tier tech talent, tackle meaningful challenges, and help transform complex AI into solutions that are simple, powerful, and ready for the real world.

What you will do

You won’t just “keep things running” - you’ll help define how our platform is built, deployed, and scaled across cloud and customer environments.
  • Build and operate real-world infrastructure: Design, configure, and evolve infrastructure that runs both in our cloud and inside customer environments (SaaS, private cloud, on-prem).
  • Make self-hosted production-ready: Help us deliver a production-grade, self-hosted platform that can be deployed on any Kubernetes setup in weeks - not months.
  • Drive automation & platform maturity: Improve CI/CD pipelines, GitHub workflows, and GitOps setups so teams can ship faster with confidence.
  • Reduce complexity and cost: Continuously simplify systems and optimize infrastructure spend without compromising performance or reliability.
  • Shape how we build: Champion best practices in reliability, scalability, and security across the organization, not as rules, but as working systems.

Requirements

  • 2-5 years of experience working with large-scale production infrastructure
  • Fluent German language skills
  • Experience with distributed or service-oriented architectures
  • Hands-on expertise with:
    • AWS
    • Kubernetes
    • CI/CD and GitOps (e.g. ArgoCD)
  • Working knowledge of Infrastructure as Code (Terraform preferred)
  • Solid troubleshooting skills - you can debug across systems, not just within one layer
  • A pragmatic mindset: you balance speed, simplicity, and reliability
  • Ownership and accountability - you take responsibility for systems end-to-end
  • Ability to work independently while staying aligned with the team’s goals

Nice to have

  • Familiarity with observability stacks (e.g. Datadog, Prometheus)
  • Experience optimizing cloud costs at scale
  • Interest or experience in Machine Learning / LLM systems
  • Experience improving developer experience and platform tooling using AI agents
  • Contributions to SRE practices like postmortems, SLIs/SLOs, and reliability engineering culture

Benefits

  • Remote-first setup with flexible hours & tech of your choice
  • 30 days vacation + extra days for family sick leave
  • Competitive salary & stock options for every team member
  • Monthly sports & mental health support allowance with Oliva
  • Annual learning & development budget
  • Monthly team socials & in-person meetups
  • Dog-friendly Berlin HQ

deepset builds the Haystack framework and the deepset Cloud platform, enabling businesses to create and deploy advanced NLP and LLM applications. Our solutions make it easy for organizations of all sizes to harness the power of large language models, turning complex AI capabilities into real business impact. We empower developers by providing the tools they need to innovate and integrate AI solutions seamlessly.

Founded: 2018
Employees: 11-50
Industry: Internet Software & Services
Total raised: $45M