Site Reliability Engineer - Full Remote (EU only)

TLDR

Collaborate closely with Backend, Frontend, and Product teams to ensure the reliability, scalability, and performance of production systems while evolving our engineering culture toward SRE best practices.

About the company

At Jobtome (https://weare.jobtome.com/), we are building a modern, cloud-native recruitment and marketing platform used at scale across multiple countries and brands.
Our systems power high-traffic job distribution, integrations with external partners, and real-time data pipelines, with a strong focus on reliability, observability, and automation.

Engineering is a core function of the company: we value ownership, pragmatic decision-making, and long-term technical excellence over short-term fixes.


The role

As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our production systems.

You will work closely with Backend, Frontend, and Product teams to:

  • design resilient architectures

  • define reliability standards

  • improve observability and incident response

  • reduce operational toil through automation


This is not a pure ops role: you will contribute to codebases, collaborate on system design, and help evolve our engineering culture toward SRE best practices.

Requirements

What you will do

  • Design, implement, and maintain reliable and scalable cloud infrastructure

  • Define and evolve SLIs, SLOs, and error budgets

  • Improve monitoring, alerting, and observability across services

  • Lead and participate in incident response, post-mortems, and root-cause analysis

  • Automate repetitive operational tasks to reduce toil

  • Collaborate with Backend engineers on service design, scalability, and failure modes

  • Improve CI/CD pipelines, deployment strategies, and release safety

  • Contribute to infrastructure as code and platform tooling

  • Act as a reliability advocate across the engineering organization


Tech stack

  • Cloud: Google Cloud Platform (preferred), AWS

  • Containers & orchestration: Docker, Kubernetes (GKE)

  • Infrastructure as Code: Terraform

  • CI/CD: GitLab CI/CD

  • Observability: Cloud Monitoring, Logging, Prometheus, Grafana

  • Languages: Go, Python, Bash

  • Networking & security: IAM, VPCs, service accounts, secrets management


What we expect from a senior SRE

  • Strong experience running production systems at scale

  • Solid understanding of distributed systems and failure modes

  • Proven experience with SLO-driven reliability

  • Strong coding skills

  • Cloud infrastructure automation experience

  • Ability to debug complex cross-system issues

  • Ownership mindset and strong communication skills

  • Pragmatic approach to reliability, speed, and cost trade-offs


Working model

  • Flexible working hours

  • Remote-friendly setup

  • Small autonomous teams

  • Direct collaboration with product and leadership

Benefits

  • Flexible working hours

  • Remote-friendly setup

Jobtome is an HR Tech company creating modern web applications that streamline complex hiring and recruitment processes for companies at scale. Our cloud-native platform enhances job distribution and integrates seamlessly with external partners, all backed by a commitment to reliability and automation, making it indispensable for organizations navigating the challenges of recruitment.
