DevOps Engineer GroundOS (REF5181Z)

TLDR

Collaborate on GroundOS, a transformative platform for global mobility, managing thousands of remote devices and ensuring continuous operations through resilient edge architecture.

Job Description:

Are you an expert in deploying, observing, and maintaining distributed fleets of devices? Do you build infrastructure that scales effortlessly and recovers automatically from mass reconnections? Join our team to oversee the operational backbone of our edge-to-cloud ecosystem. If you love automating complex deployments and diving deep into observability metrics, you are the right fit for us!

Project Description:

Our project, GroundOS, is not just another screen manager. It is a next-generation Universal Display System (UDS) built to power the future of global mobility. We are building an "Operating System for Reality" that orchestrates massive, data-driven signage networks across critical infrastructure, from major international airports to sprawling public transport systems. GroundOS moves beyond static displays; it uses a state-of-the-art digital twin to process and react to real-time operational data. To guarantee continuous operation, the platform features a resilient, offline-first edge architecture that ensures screens keep running smoothly even if the network fails. Join us to blend high-performance Rust edge computing with modern TypeScript cloud services and help us set a new global standard for how hundreds of millions of passengers experience their journey.

Tasks:

  • Manage the deployment, observability, and lifecycle of thousands of remote mini-PCs alongside cloud components.
  • Execute Over-The-Air (OTA) updates reliably across a massive edge fleet.
  • Configure and manage NATS JetStream, including Leaf Nodes for edge-cloud bridging, stream retention, and cluster HA.
  • Set up and maintain tracing and metrics using OpenTelemetry to monitor cross-system health.
  • Architect resilient systems capable of withstanding mass fleet reconnection events (thundering herd) without performance loss.
  • Manage secrets, certificates, and secure mTLS communication between edge devices and the central control plane.
  • Lead incident management and root-cause analysis for fleet-wide issues.
  • Design scalable operations workflows to keep maintenance effort constant as the fleet grows.

Qualifications:

  • Extensive experience with infrastructure automation and remote fleet management.
  • High proficiency in containerization (Docker), specifically optimized for edge devices (multi-arch builds, ARM/x64).
  • Deep operational knowledge of NATS JetStream or similar high-throughput event brokers.
  • Strong background in observability, tracing, and metric collection.
  • Solid understanding of Zero-Trust security architectures and certificate management.
  • Ability to remain calm and analytical during high-pressure incident response situations.
  • Expert knowledge of agile development
  • Solid knowledge of Scrum
  • Experience working in agile projects and teams
  • Excellent English skills, both written and spoken (B2–C1)
  • Excellent technical and analytical skills, as well as problem-solving abilities
  • Ability to handle stressful situations and work independently

Advantages:

  • Experience with Google Cloud's GKE for the central cloud control plane.
  • Prior experience with edge orchestration tools.

 

* Please note that remote work is only possible within Hungary due to European taxation regulations.

Deutsche Telekom IT Solutions, a subsidiary of the Deutsche Telekom Group, offers a comprehensive range of IT and telecommunications services, focusing on connected living and working solutions. Serving hundreds of large customers across Europe, the company leverages its skilled workforce to build digital infrastructures that empower organizations to innovate and thrive.
