Engage directly with strategic customers to implement and scale the Haystack Enterprise Platform, focusing on systems reliability and operational excellence.
Design & Land
Own technical outcomes from POC → production: integrations, data connectors, workflows, and infra-as-code (Kubernetes/Terraform/Helm).
Produce reference architectures and reusable templates; upstream patterns to Product to reduce future “custom” work.
Unblock enterprise environments: identity (OIDC/SAML), networking, storage, GPU scheduling, observability hooks.
Run & Harden
Define SLOs/Error Budgets with customers; implement end-to-end observability (logs/metrics/traces) and dashboards.
Create runbooks/playbooks; lead L3 incident response and RCAs; drive reliability roadmaps to closure.
Plan/execute upgrades and security patches in change windows; ensure rollback and post-upgrade verification.
Be an active member of the on-call rotation to ensure we deliver an excellent customer experience.
Partner & Enable
Train customer teams on operations and emergency procedures; hand off cleanly to Support/CSM.
Prioritize reliability and “productization” backlog with Product/Engineering based on field signal.
Document clearly: setup guides, diagrams, SLOs, testing/DR procedures, and “golden path” standards.
Hands-on programming experience in Python (needed for improvements, bug fixes, and small feature builds).
7+ years across SRE/Platform/Solutions/FDE, with evidence of shipping customer-facing builds and operating production systems.
Strong with Kubernetes, containers, Linux, IaC (Terraform/Helm), CI/CD, networking (TLS, DNS, ingress/LB), backup/restore.
Experience with observability stacks (Prometheus/Grafana/OpenTelemetry/ELK) and scripting (Python/Bash).
Enterprise integration experience (SSO, secrets, compliance); confident communicator with execs and engineers under time pressure.
Must be a resident of the European Union with an EU passport.
Flexible Work Hours
Remote-first setup with flexible hours & tech of your choice
Learning Budget
Annual learning & development budget
Other Benefits
Dog-friendly Berlin HQ
Paid Time Off
30 days vacation + extra days for family sick leave