Berlin, Germany

Full-Time

This would not be possible without a state-of-the-art core platform as its backbone, managed by the dedicated SME teams in our Core Group. One such team is the Performance and Observability Team, which, as the name implies, has two sub-tracks.

We are looking to expand our team in the Observability sub-track. The position is ideally suited to experienced candidates from Software Engineering, SRE, or DevOps backgrounds with a deep focus on observability stacks and best practices.

Our team currently has four members, each with a different background and perspective. Together we own an observability stack that handles billions of metrics, traces, and log entries each month. Our customers are all the engineers at Wolt, who use this stack to understand the health of their services and infrastructure at scale.

Today, we manage a complex observability stack covering a wide scope, from application instrumentation and telemetry data collection to visualization and alerting, across both backend and client-facing applications. Our day-to-day work keeps this ecosystem running seamlessly. In parallel, we're building the next-generation observability platform, re-architecting our stack and pipelines together with our counterparts at DoorDash. This partnership offers an unparalleled opportunity to drive high-impact initiatives across the observability domain and to get involved in cutting-edge projects.

What you’ll do:

Be responsible for building and improving our observability platform and tooling, used by all Wolt engineers.
Contribute to initiatives focused on architecting, building, and maintaining our observability stack to efficiently handle increasing telemetry data with greater reliability.
Champion observability best practices, guiding and supporting other Woltians in this space.
Take ownership of key initiatives to improve the quality, efficiency, and reliability of our observability stack.
Apply your expertise in SRE culture and practices to ensure observability has a meaningful impact on our business.
Participate in the on-call rotation to address incidents and outages, resolving reliability issues efficiently.
Help standardize observability resources by building tools and documentation that enhance productivity and developer experience.
Triage and resolve production issues within the observability scope.
Contribute to open-source efforts by sharing some of our internal tools with the broader community.

Qualifications:

Proven experience in Software Engineering, SRE, or a similar role with a focus on observability, reliability, and scaling large systems.
Strong foundation in computer science principles and engineering fundamentals.
Proficiency in software development, particularly in Go (preferred) or Python, with experience building automation tools and software for large-scale, distributed systems.
Hands-on experience with observability tooling such as Datadog, Prometheus, Mimir, Elasticsearch, Grafana, and Jaeger or other tracing systems.
Expertise in cloud platforms like AWS, GCP, or Azure, with experience managing cloud infrastructure using Kubernetes and containers (Docker).
Deep knowledge of building and maintaining reliable, high-performance, and scalable distributed systems.
Solid understanding of SRE principles, incident response, and designing fault-tolerant architectures.
Experience with infrastructure-as-code tools like Terraform or Ansible for managing cloud environments.
Familiarity with CI/CD pipelines, automated testing, and continuous delivery practices.
Strong analytical and problem-solving skills, with experience troubleshooting complex distributed systems.
Excellent communication and collaboration skills, with the ability to work cross-functionally to enhance platform observability and reliability.
Experience working directly with development teams, and a willingness to dive into application code for observability-related topics, even in unfamiliar codebases.
Solid experience with Docker and Kubernetes, coupled with a strong foundation in Unix systems and networking concepts.
Open to feedback, recognizing that no one is perfect—including us. We see feedback as an opportunity to learn and grow together.

Nice to Haves:

You have experience with OpenTelemetry, which is a key foundation for much of the infrastructure and tooling the team is converging on as part of our future observability strategy.
You have experience handling data and running monitoring infrastructure at scale, such as managing petabyte-scale Elasticsearch clusters or similar databases.
You have experience operating distributed event streaming platforms at scale, e.g. Apache Kafka.
Open-source contributions in observability, cloud, or platform engineering are a strong plus.

📍This role can be based in one of our tech hubs in Helsinki, Berlin, or Stockholm, or you can work remotely anywhere in Finland, Sweden, Germany, Denmark, or Estonia. If you live outside these countries, don't worry! We provide relocation support to help you make your way to Finland, Germany, or Sweden.

The position will be filled as soon as we find the right people, so feel free to apply as soon as you're interested in hearing more about the position and potentially joining Wolt & DoorDash!

Wolt is hiring a

Core Platform Engineer, Observability