Senior Observability Engineer

Our mission is to enable everyone to build wealth

We reinvent how trading and investing work by creating exceptional products people love.

Fostering a culture of excellence and high velocity is the key to our success.

Today, we serve over 4.5 million clients, with more than €30 billion in assets under management, a testament to the scale and trust we’ve built in just a few years.

Own and evolve Trading 212’s observability and performance ecosystem across cloud and on-prem Kubernetes environments.

What you'll do

  • Design, automate, and optimize observability infrastructure (Prometheus, CloudWatch, Elasticsearch, Kafka, etc.) using IaC and GitOps.

  • Build Grafana dashboards and implement a smart alerting strategy to surface actionable insights.

  • Monitor and analyze system performance, identify bottlenecks, and drive improvements in reliability and cost-efficiency.

  • Collaborate with product, QA, and engineering teams to embed observability best practices.

  • Maintain clear documentation and mentor engineers, fostering a culture of data-driven performance.

  • Plan and test multi-AZ and multi-region disaster recovery (DR) and resilience scenarios.

What you need to have

  • 5+ years of experience in DevOps, SRE, or Systems Engineering, focusing on observability for large-scale distributed systems.

  • Proven experience deploying and maintaining observability tools.

    • Metrics & Monitoring: Strong proficiency with Prometheus and Grafana; experience with AWS CloudWatch.

    • Log Management: Deep knowledge of the ELK stack (Elasticsearch, Logstash, Kibana) and Fluent Bit.

  • Cloud & Containers: Hands-on experience with AWS, Docker, and Kubernetes.

  • Automation & IaC: Skilled in Python, Go, or Bash for scripting, and proficient with Terraform (Ansible/Puppet a plus).

  • Systems Knowledge: Strong grasp of distributed systems, networking, and Linux/Unix internals.

  • Problem-Solving: Analytical, detail-oriented, and methodical in root cause analysis and troubleshooting.

Nice to have

  • Experience managing and scaling high-throughput Kafka clusters.

  • Experience with CI/CD pipelines (e.g., GitHub Actions) for managing infrastructure deployments.

  • Familiarity with distributed tracing systems (Jaeger, OpenTelemetry).

  • A background in Site Reliability Engineering (SRE), with an understanding of SLOs, SLIs, and error budgets.

We offer

  • Challenges that will help you grow and realize your potential really fast

  • Opportunity to make a big impact: you will build innovative services used by millions of investors to build wealth

  • Work with smart, spirited, helpful, high-performing colleagues with a common goal

  • An environment where nothing is set in stone

  • Appreciation for your talent and ideas

  • Generous remuneration package including annual bonuses

  • Excellent social benefits package, including private health insurance and sports card

  • 25 days of paid vacation per year

  • Delicious treats and a spacious game room

Are you ready to accelerate your career with us? We'd love to hear from you!

We thank all applicants, but only candidates selected for an interview will be contacted.

All personal data of applicants is protected by law and will be treated with strict confidentiality.
