Blue Machines AI

Senior Platform / DevOps Engineer (Real-time Media, WebRTC, Edge + Cloud)

TLDR

Join a hands-on team building a scalable, resilient real-time communications platform (WebRTC) on multi-region Kubernetes, managed through Infrastructure-as-Code.

Job title: Platform / DevOps Engineer (WebRTC, Edge + Cloud)
Location: Bengaluru (Hybrid/Office)
Employment type: Full-time
Experience: 5–12+ years (flexible for strong fit)

About the role

We’re building and operating a LiveKit-like real-time communications platform (WebRTC) that must scale to millions of calls with edge PoPs for ultra-low latency and multi-region cloud reliability. This is a hands-on, high-ownership role focused on production systems, performance, and resilience.
We’re especially interested in engineers who have operated real-time or streaming infrastructure at scale.

What you’ll do

  • Own reliability and performance of signaling, SFU/media nodes, TURN, routing, failover, and capacity planning
  • Build and run multi-region Kubernetes platforms with secure networking and zero-downtime deployments
  • Design edge + cloud architecture: PoPs, global routing, failover, autoscaling, DR
  • Implement SLOs/SLIs, incident response, postmortems, and operational excellence
  • Create strong observability: metrics, logs, tracing, and real-time QoE/latency metrics
  • Ship Infrastructure-as-Code and automation: Terraform, Helm, GitOps, CI/CD


Required skills

  • Strong production experience with Kubernetes at scale (multi-cluster/multi-region)
  • Strong Linux + networking fundamentals (UDP/TCP, NAT, conntrack, DNS, load balancing)
  • Experience with IaC + delivery: Terraform, Helm, GitOps (ArgoCD/Flux), CI/CD
  • Proven on-call ownership for high-availability systems


Nice to have

  • WebRTC/RTC operations: ICE, STUN/TURN, SFU scaling, packet loss/jitter tuning
  • Edge/PoP and traffic management experience (global routing, Anycast/DNS strategies)
  • Cost optimization for bandwidth-heavy workloads
  • Experience operating real-time/streaming systems at very high concurrency


What success looks like

  • You can keep a real-time system stable through traffic spikes, packet loss, ISP variability, and zone/region failures
  • You think in terms of latency budgets, concurrency, bandwidth, and packets/sec, not just pods and nodes
  • You build platforms that are observable, automatable, and easy to operate

Blue Machines AI builds a robust Voice AI platform tailored for enterprises, enabling businesses to automate interactions while maintaining context and compliance. We cater to a range of industries, helping clients like airlines and banks seamlessly transition from AI-driven support to human agents, ensuring a smooth customer experience.

Founded
2019
Industry
Internet Software & Services