About the Role
We're looking for a hands-on DevOps / Infrastructure Engineer who lives and breathes observability. This isn't a role where you'll be drawing architecture diagrams from the sidelines—you'll be deep in the trenches, building and operating the systems that keep our platform running smoothly. If you get a kick out of hunting down a tricky performance issue at 2am or feel genuine satisfaction when a well-crafted dashboard lights up with meaningful metrics, we want to talk to you.
You'll own our monitoring, logging, and tracing stack end-to-end—from instrumenting applications to building alerting strategies that actually work. You're someone who believes that if it's not observable, it's not in production.
What You'll Do
- Design, build, and maintain our observability platform—metrics, logs, traces, and everything in between
- Get hands-on with infrastructure: deploy services, troubleshoot incidents, and fix things when they break (because they will)
- Instrument applications and services to capture meaningful telemetry data that drives real insights
- Build dashboards and alerting systems that teams actually use—not just noise generators
- Dive into production issues, correlate data across systems, and lead root cause analysis
- Champion observability best practices across engineering teams and help developers instrument their own code
- Automate everything you can: infrastructure provisioning, deployment pipelines, and operational runbooks
- Work closely with SRE and development teams to improve system reliability and performance
- Evaluate and integrate new observability tools and technologies as the landscape evolves
What We're Looking For
- 3+ years of experience in DevOps, Infrastructure, or SRE roles—with real production battle scars
- Deep hands-on experience with observability tools: Prometheus, Grafana, Datadog, New Relic, Splunk, ELK stack, Jaeger, or similar
- Strong proficiency with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code (Terraform, Pulumi, CloudFormation)
- Solid scripting and automation skills (Python, Bash, Go, or similar)
- Experience with containerisation and orchestration (Docker, Kubernetes)
- Understanding of distributed systems, microservices architectures, and the unique observability challenges they present
- Familiarity with CI/CD pipelines and GitOps workflows
- Excellent troubleshooting skills—you're the person who doesn't give up until you've found the root cause
Nice to Have
- Experience with OpenTelemetry and vendor-agnostic instrumentation strategies
- Background in building custom exporters, collectors, or integrations
- Familiarity with chaos engineering and resilience testing practices
- Experience with FinOps and cloud cost optimisation
- Contributions to open-source observability projects
The Kind of Person You Are
- You're not afraid to roll up your sleeves and get stuck in—no task is beneath you
- You thrive in fast-paced environments and stay calm when things go sideways
- You take ownership and see problems through to resolution
- You're curious by nature and constantly looking for ways to improve systems
- You communicate clearly and can explain complex technical concepts to different audiences
- You're pragmatic—you know when to build the perfect solution and when "good enough" ships
What We Offer
- Competitive salary and equity package
- Flexible working arrangements
- Learning and development budget
- Modern tech stack and the autonomy to make a real impact
- A team that values doing things properly over just doing things quickly
If this sounds like you, we'd love to hear from you. Send us your CV and tell us about a time you tracked down a gnarly production issue—bonus points if it involved creative use of observability data.