Plivo is a leading technology company transforming customer engagement for some of the world’s largest B2C brands, including Uber, WhatsApp, and Zomato. Our new product, the AI agents platform, automates the entire customer lifecycle, from acquiring to engaging and supporting customers, through cutting-edge multimodal AI, including LLMs, text-to-speech, and speech detection. With a 100+ member team based in India and the US, we are building high-impact global products that handle over 1 billion API requests per month. If you are excited about solving hard, real-world AI challenges at scale, this is where you belong.
Role Overview
This is a deep systems and multidisciplinary role that bridges real-time communications (RTC), VoIP infrastructure, backend systems, and AI model development. You’ll architect and build a distributed RTC platform across the globe, develop backend services, and integrate AI models into production-grade voice and multimodal experiences at scale. If you love low-latency systems, real-time voice engineering, and AI-driven innovation, this is where you belong.
What You’ll Do
Design and build real-time voice systems using WebRTC, SIP/RTP, and WebSocket streaming.
Engineer backend infrastructure for signaling, routing, call control, and audio/video processing.
Work with open-source RTC stacks -- FreeSWITCH, Kamailio, LiveKit, RTPEngine, and Pipecat.
Develop and integrate AI capabilities, including TTS (Text-to-Speech), STT (Speech-to-Text), VAD (Voice Activity Detection), media servers, and AI voice agents.
Integrate AI/ML models into real-time voice systems (speech recognition, synthesis, embeddings).
Build and scale a global, distributed RTC platform with strong resilience, observability, and low latency.
Work across the stack -- from C/Go/Rust real-time components to Python/Node.js backend services, and our SDKs.
Collaborate cross-functionally with Product and DevOps teams.
Instrument and monitor systems for quality, latency, and performance.
Prototype rapidly: build, test, iterate, and deploy new RTC + AI features.
Contribute to open-source voice and AI ecosystems.
Be hands-on: debug issues, tune queries, optimize performance, and improve resiliency. You own your code from dev to prod.
Don’t be afraid to jump on a call or chat with a customer to ensure they have a smooth experience -- you own the outcome, not just the code.
Use AI-assisted development tools to improve coding speed, testing, and code quality.
What You Bring
Strong foundation in systems programming -- C, Go, and/or Rust.
Experience in backend and real-time systems engineering.
Expertise in WebRTC, SIP, VoIP, and signaling/audio pipelines.
Hands-on experience with open-source RTC stacks: FreeSWITCH, Kamailio, LiveKit, RTPEngine, Pipecat.
Understanding of media negotiation, codec pipelines, and audio/video streaming.
Knowledge of real-time networking (UDP/TCP, ICE, NAT traversal).
Experience building and scaling distributed RTC platforms.
Experience with AI voice systems -- TTS, STT, VAD, LLM voice agents, or speech embeddings.
Familiarity with AI/ML frameworks (PyTorch, TensorFlow, ONNX) or model integration.
Backend development experience in Python or Node.js.
Strong debugging, profiling, and performance optimization skills.
Builder mindset: proactive, curious, and comfortable working in complex systems.
Bonus Points
Contributions to open-source RTC or AI projects.
Familiarity with LLM integration and multimodal AI (voice + text).
Experience in edge computing or real-time streaming optimization.
Exposure to audio signal processing or DSP algorithms.
Experience deploying real-time systems on cloud platforms (AWS/GCP) with Docker/Kubernetes.
Experience with AI voice agents or voicebots.