DevOps / Infrastructure Engineer
TLDR
Take ownership of a robust alerting and logging infrastructure, building and maintaining it while collaborating closely with development teams to improve system reliability.
About the Role
We're looking for a hands-on DevOps / Infrastructure Engineer who lives and breathes observability. This isn't a role where you'll be drawing architecture diagrams from the sidelines—you'll be deep in the trenches, building and operating the systems that keep our platform running smoothly. If you get a kick out of hunting down a tricky performance issue at 2am or feel genuine satisfaction when a well-crafted dashboard lights up with meaningful metrics, we want to talk to you.
You'll own our monitoring, logging, and tracing stack end-to-end—from instrumenting applications to building alerting strategies that actually work. You're someone who believes that if it's not observable, it's not in production.
What You'll Do
- Design, build, and maintain our observability platform—metrics, logs, traces, and everything in between
- Get hands-on with infrastructure: deploy services, troubleshoot incidents, and fix things when they break (because they will)
- Instrument applications and services to capture meaningful telemetry data that drives real insights
- Build dashboards and alerting systems that teams actually use—not just noise generators
- Dive into production issues, correlate data across systems, and lead root cause analysis
- Champion observability best practices across engineering teams and help developers instrument their own code
- Automate everything you can: infrastructure provisioning, deployment pipelines, and operational runbooks
- Work closely with SRE and development teams to improve system reliability and performance
- Evaluate and integrate new observability tools and technologies as the landscape evolves
What We're Looking For
- 3+ years of experience in DevOps, Infrastructure, or SRE roles—with real production battle scars
- Deep hands-on experience with observability tools: Prometheus, Grafana, Datadog, New Relic, Splunk, ELK stack, Jaeger, or similar
- Strong proficiency with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code (Terraform, Pulumi, CloudFormation)
- Solid scripting and automation skills (Python, Bash, Go, or similar)
- Experience with containerisation and orchestration (Docker, Kubernetes)
- Understanding of distributed systems, microservices architectures, and the unique observability challenges they present
- Familiarity with CI/CD pipelines and GitOps workflows
- Excellent troubleshooting skills—you're the person who doesn't give up until you've found the root cause
Nice to Have
- Experience with OpenTelemetry and vendor-agnostic instrumentation strategies
- Background in building custom exporters, collectors, or integrations
- Familiarity with chaos engineering and resilience testing practices
- Experience with FinOps and cloud cost optimisation
- Contributions to open-source observability projects
The Kind of Person You Are
- You're not afraid to roll up your sleeves and get stuck in—no task is beneath you
- You thrive in fast-paced environments and stay calm when things go sideways
- You take ownership and see problems through to resolution
- You're curious by nature and constantly looking for ways to improve systems
- You communicate clearly and can explain complex technical concepts to different audiences
- You're pragmatic—you know when to build the perfect solution and when "good enough" ships
What We Offer
- Competitive salary and equity package
- Flexible working arrangements
- Learning and development budget
- Modern tech stack and the autonomy to make real impact
- A team that values doing things properly over just doing things quickly
If this sounds like you, we'd love to hear from you. Send us your CV and tell us about a time you tracked down a gnarly production issue—bonus points if it involved creative use of observability data.
Strive Gaming develops a robust iGaming platform designed for scalability, security, and high performance. Our focus is on providing reliable backend services that enhance the online gaming experience for our users. We're here to support gaming operators with powerful technology that drives engagement and growth.