Role Overview
We are looking for a Voice AI Agent Developer to design, build, and optimize conversational voice AI agents. You will work on developing intelligent, natural-sounding voice experiences that solve real user problems. This role requires hands-on technical expertise combined with a strong understanding of conversational design principles.
Responsibilities
- Design, develop, and deploy voice AI agents for production environments
- Build and fine-tune voice pipelines including speech-to-text, natural language understanding, dialogue management, and text-to-speech components
- Integrate voice agents with backend systems, APIs, and third-party services
- Optimize for latency, accuracy, and natural conversation flow
- Develop and maintain testing frameworks to ensure voice agent quality and reliability
- Collaborate with product and design teams to define conversational user experiences
- Monitor agent performance, analyze conversation logs, and implement improvements based on user interactions
- Stay current with advancements in voice AI, LLMs, and conversational AI technologies
Requirements
- 2+ years of industry experience in software development
- Hands-on experience building and deploying voice AI agents or conversational AI systems
- Proficiency in Python or similar programming languages
- Experience with voice/speech technologies such as:
  - Speech-to-Text: Whisper, Deepgram, Google STT, AWS Transcribe
  - Text-to-Speech: ElevenLabs, PlayHT, Amazon Polly, Google TTS
  - Voice AI platforms: Voiceflow, VAPI, Retell, Bland AI
- Working knowledge of LLMs and prompt engineering for conversational applications
- Experience with dialogue management and conversation state handling
- Familiarity with real-time audio streaming and WebSocket protocols
- Strong debugging and problem-solving skills
Nice to have
- Experience with telephony integrations (Twilio, SIP, VoIP)
- Background in NLU/NLP techniques and intent classification
- Experience fine-tuning or training speech models
- Familiarity with cloud platforms (AWS, GCP, Azure)
- Experience with RAG (Retrieval-Augmented Generation) for knowledge-grounded conversations
- Understanding of conversation design best practices and VUI principles
- Contributions to open-source voice or conversational AI projects
Technical Skills
- Languages: Python, JavaScript/TypeScript
- Voice Platforms: VAPI, Retell, Voiceflow, Bland AI, or similar
- Speech Technologies: Whisper, Deepgram, ElevenLabs, PlayHT
- LLM Frameworks: LangChain, OpenAI API, Anthropic API
- Infrastructure: Docker, Kubernetes, cloud services
- Databases: PostgreSQL, Redis, vector databases
Location and Interview Process
- This is an on-site role based in Bangalore.
- The interview process consists of four rounds:
  - HR Screening Round: to understand your background, interests, and role fit.
  - Backend Interview: focused on backend fundamentals, APIs, and system thinking.
  - AI / Voice AI Round: a deep dive into voice pipelines, LLMs, and building real-time conversational agents.
  - Final Round: a conversation with the CTO to assess alignment and expectations.