As an SDE-2 in CoE-ML, you are an independent contributor who owns modules end-to-end, brings strong engineering judgment to AI/ML problems, and actively raises the technical bar of the team. You have moved beyond task execution — you drive design, anticipate failure modes, and begin to influence the technical direction of your pod.
You will:
Own, improve, and extend production AI/ML components — deeply understanding what exists before choosing to build new.
Take end-to-end responsibility for the reliability, performance, and cost-efficiency of the AI modules you own.
Contribute meaningfully to architecture discussions and challenge designs with data and first-principles thinking.
Actively leverage AI-assisted development tools and agentic workflows to multiply your own productivity.
Mentor SDE-1 engineers and interns, sharing technical knowledge and engineering best practices.
Partner closely with product managers, QA, data engineering, and DevOps to ship cohesive AI-powered features.
Core engineering & delivery:
Design, implement, and continuously improve production-grade AI/ML components — including LLM-powered features, RAG pipelines, agentic workflows, and model inference services. You are expected to deeply understand existing systems, identify opportunities to enhance their quality, reliability, or performance, and own those improvements end-to-end.
Improve and extend existing AI infrastructure — including prompt pipelines, retrieval systems, embedding workflows, and agentic orchestration layers — rather than defaulting to greenfield solutions.
Write clean, well-tested, maintainable code in Python (primary) and optionally Java or Go, following software engineering best practices.
Implement unit, integration, and regression tests for AI components, including evaluation harnesses for LLM output quality (a minimal harness is sketched after this list).
Contribute to CI/CD pipelines and ensure smooth deployment of AI services on AWS/Kubernetes infrastructure.
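To make that bar concrete, the deterministic layer of such a harness might look like the following sketch; `summarize` and the golden cases here are hypothetical stand-ins for a real module and a versioned fixture file.

```python
# A minimal pytest-style regression harness for an LLM-backed component.
# `summarize` is a hypothetical function under test; swap in the real module.
import pytest

def summarize(text: str) -> str:
    # Placeholder for the real LLM-backed implementation.
    return "ACME Q3 revenue grew 12% year over year."

# Golden cases would normally live in a versioned JSON/YAML fixture.
GOLDEN_CASES = [
    {
        "input": "ACME reported Q3 revenue up 12% YoY, driven by enterprise deals.",
        "must_contain": ["12%", "revenue"],
        "max_words": 40,
    },
]

@pytest.mark.parametrize("case", GOLDEN_CASES)
def test_summary_regression(case):
    output = summarize(case["input"])
    # Cheap deterministic checks catch many regressions before any
    # LLM-as-judge pass runs.
    for token in case["must_contain"]:
        assert token in output, f"expected {token!r} in output"
    assert len(output.split()) <= case["max_words"], "summary too long"
```

Checks like these run in CI on every change; semantic quality scoring layers on top of them.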
Model performance, evaluation & quality:
Optimize model inference for latency, throughput, and cost — identifying bottlenecks and proposing concrete solutions.
Build and maintain evaluation frameworks to assess model performance, output quality, and regression across releases — using platforms such as Maxim, LangFuse, or Weights & Biases.
Define and track quality metrics (precision, recall, BLEU, ROUGE, LLM-as-judge scores, or task-specific KPIs) for modules under ownership; an LLM-as-judge scorer is sketched after this list.
Contribute to prompt engineering, few-shot design, and model selection to measurably improve output quality.
Treat evaluation as an ongoing operational discipline — not a one-time pre-release check — and integrate it into the development and deployment lifecycle.
Identify data quality issues affecting model performance and work with data engineering to resolve them.
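As one illustration of LLM-as-judge scoring, a faithfulness judge might be wired up as below. The rubric, 1-5 scale, and model name are assumptions made for the sketch, not a prescribed standard, and an OpenAI-compatible API key is assumed in the environment.

```python
# A sketch of an LLM-as-judge scorer; rubric, scale, and model name are
# illustrative, not a prescribed standard.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """Rate the answer below for factual faithfulness to the source
on a 1-5 scale. Reply with a single integer only.

Source: {source}
Answer: {answer}"""

def judge_faithfulness(source: str, answer: str, model: str = "gpt-4o-mini") -> int:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(source=source, answer=answer)}],
        temperature=0,  # keep the judge as deterministic as possible
    )
    return int(resp.choices[0].message.content.strip())
```

Scores from a judge like this get logged per release alongside deterministic metrics, so regressions stay visible over time.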
Production operations & observability:
Monitor AI services in production using infrastructure observability tooling such as Datadog, Prometheus, and Grafana (an instrumentation sketch follows this list).
Use AI gateway platforms (e.g., LiteLLM, Portkey, TrueFoundry) to track LLM traffic, enforce per-project cost attribution, and maintain governance over model access across environments.
Instrument and observe agentic workflows built on frameworks such as LangGraph or CrewAI — tracing multi-step executions, identifying failure points, and improving reliability.
Respond to production incidents, conduct root cause analysis, and implement preventive fixes.
Participate in on-call rotations and contribute to runbooks and post-mortems.
Proactively surface model drift, latency degradation, and cost anomalies before they escalate.
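For the instrumentation side, a minimal sketch using the standard prometheus_client library is shown below; the metric names, labels, and the stubbed `fake_llm` call are all illustrative.

```python
# A sketch of instrumenting an LLM call with Prometheus metrics.
import time
from prometheus_client import Counter, Histogram, start_http_server

LLM_LATENCY = Histogram("llm_request_seconds", "LLM call latency", ["model"])
LLM_TOKENS = Counter("llm_tokens_total", "Tokens consumed", ["model", "kind"])

def fake_llm(prompt: str) -> str:
    time.sleep(0.05)  # simulate network latency
    return "stub response"

def call_llm(model: str, prompt: str) -> str:
    with LLM_LATENCY.labels(model=model).time():
        response = fake_llm(prompt)  # placeholder for the real client call
    LLM_TOKENS.labels(model=model, kind="completion").inc(len(response.split()))
    return response

if __name__ == "__main__":
    start_http_server(9100)  # expose /metrics for Prometheus to scrape
    call_llm("gpt-4o-mini", "hello")
```

The exposed /metrics endpoint is then scraped by Prometheus and graphed in Grafana or Datadog, feeding the latency and cost-anomaly alerting described above.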
Design & technical leadership:
Lead low-level design (LLD) for features and modules under ownership, and actively participate in high-level design (HLD) discussions.
Surface tradeoffs around scalability, cost, and reliability — and back recommendations with data from production systems.
Document technical designs, API contracts, and component behaviours clearly and keep them up to date (a contract-as-code sketch follows this list).
Propose and drive improvements to existing systems based on production learnings.
Work closely with SDE-3s and Tech Leads to align on design decisions and delivery plans.
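Where a contract can be expressed as code, that is preferred over prose. A hypothetical example of a contract captured as typed pydantic models:

```python
# A sketch of an API contract expressed as typed models (pydantic v2);
# the endpoint and field names are hypothetical.
from pydantic import BaseModel, Field

class RerankRequest(BaseModel):
    query: str
    candidates: list[str] = Field(..., min_length=1, max_length=100)

class RerankResponse(BaseModel):
    ranking: list[int]  # indices into `candidates`, best first
    model_version: str
```

Typed models like these double as validation at the service boundary and as living documentation that cannot silently drift from the implementation.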
Collaboration & mentorship:
Communicate progress, blockers, and technical risks clearly to the pod and stakeholders — without waiting to be asked.
Collaborate with product and QA to translate requirements into precise technical acceptance criteria.
Contribute to design reviews and provide constructive, evidence-based feedback on peers' work.
Mentor SDE-1s and interns on technical approaches, code quality, debugging methodology, and AI tooling.
Document learnings, failure analyses, and best practices for the team's knowledge base.
Participate in team tech talks, brown-bags, and internal AI community events.
AI-native ways of working:
Use AI-assisted development tools (e.g., Claude Code, Cursor, Copilot) as a standard part of the daily development workflow — not as an afterthought.
Experiment with agentic workflows to automate repetitive engineering tasks such as test generation, documentation, and code review preparation (a test-generation sketch follows this list).
Develop familiarity with the agentic frameworks the team uses (LangGraph, CrewAI, or similar) both as a builder and as an operator debugging them in production.
Share productivity patterns, prompt techniques, and tool evaluations with the team — actively contributing to raising the AI tool adoption floor across the pod.
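As a small example of the kind of automation meant here, a script that drafts pytest tests with an LLM might look like the sketch below. The model name and target module are hypothetical, an OpenAI-compatible key is assumed, and generated tests are always human-reviewed before merging.

```python
# A sketch of automating test drafting with an LLM; model, prompt, and the
# target module are illustrative. Output is a draft for human review only.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_tests(module_path: str, model: str = "gpt-4o-mini") -> str:
    source = Path(module_path).read_text()
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You write concise pytest tests."},
            {"role": "user", "content": f"Draft pytest tests for:\n\n{source}"},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(draft_tests("my_module.py"))  # hypothetical target module
```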
What we're looking for:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
3–5 years of professional software development experience, with at least 1–2 years working on AI/ML systems in production.
Solid programming skills in Python; familiarity with Java or Go is a plus.
Hands-on experience building or operating ML pipelines, LLM-based features, or data-intensive services.
Working knowledge of cloud platforms (AWS preferred) and containerised deployments (Docker, Kubernetes).
Understanding of machine learning fundamentals — model training, evaluation, feature engineering, and deployment trade-offs.
Familiarity with agentic frameworks (LangChain, LangGraph, CrewAI, or similar) and LLM observability/eval platforms (Maxim, LangFuse, Weights & Biases, or equivalent).
Experience writing automated tests and contributing to CI/CD pipelines.
Clear written and verbal communication skills; able to document and explain technical decisions to peers and stakeholders.
Nice to have:
Experience with vector databases and embedding-based retrieval systems (e.g., Pinecone, Weaviate, pgvector); a retrieval sketch follows this list.
Hands-on prompt engineering experience — chain-of-thought, few-shot, structured output, and tool-calling patterns.
Exposure to AI gateway platforms (LiteLLM, Portkey, TrueFoundry) for LLM cost governance and traffic management.
Familiarity with open-source LLMs (Hugging Face) and fine-tuning workflows.
Exposure to ASR/TTS or multimodal systems.
Prior experience using AI coding assistants (Claude Code, Cursor, GitHub Copilot) as part of a daily development workflow.
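For orientation, embedding-based retrieval reduces to nearest-neighbour search over vectors. The sketch below shows the core idea with a stubbed `get_embedding`; a real system would call an embedding model and store vectors in pgvector, Pinecone, or similar rather than scanning in memory.

```python
# A sketch of embedding-based retrieval; get_embedding is a stand-in for
# whichever embedding model is actually used, and the corpus is illustrative.
import zlib
import numpy as np

def get_embedding(text: str) -> np.ndarray:
    # Stand-in: deterministic pseudo-embedding; a real system would call
    # an embedding model here.
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

CORPUS = ["refund policy", "onboarding checklist", "pricing tiers"]
CORPUS_VECS = np.stack([get_embedding(d) for d in CORPUS])

def retrieve(query: str, k: int = 2) -> list[str]:
    q = get_embedding(query)
    scores = CORPUS_VECS @ q  # cosine similarity, since all vectors are unit-norm
    return [CORPUS[i] for i in np.argsort(-scores)[:k]]

print(retrieve("how do refunds work?"))
```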
Why join us:
Opportunity to build and ship AI features that directly impact how thousands of revenue professionals learn and perform.
A front-row seat to the GenAI revolution — working with LLMs, agentic AI, voice agents, and multimodal systems.
Collaborative, inclusive culture built on DAB values: Delight your customers, Act as a Founder, Better Together.
Mentorship from experienced SDE-3s and Tech Leads who are active in the broader AI community.
Competitive compensation, comprehensive benefits, and equity options.
Flexible work arrangements and strong support for continuous learning and professional development.
About Mindtickle:
Mindtickle builds a revenue productivity platform designed to enhance sales performance through on-the-job learning and effective deal execution. It caters primarily to sales teams looking to maximize revenue generation per representative. What sets Mindtickle apart is its combination of learning and execution within a single platform, positioning it as the go-to solution recognized by industry leaders.