Staff Engineer, Machine Learning

OVERVIEW:

Design and build core backend services for the AI/ML runtime while optimizing performance, reliability, and cost in production systems.

REQUIREMENTS:

6+ years of total experience.
Strong expertise in Python and backend engineering with experience building scalable, distributed microservices.
Hands-on experience designing and delivering end-to-end RAG (Retrieval-Augmented Generation) workflows in production systems.
Solid understanding of ML solution design, including embeddings, retrieval, ranking, feature engineering, and evaluation strategies.
Experience with vector databases (FAISS, Pinecone, Milvus, Weaviate) and implementing chunking, indexing, vector search, re-ranking, caching, and memory patterns.
Knowledge of LLM/NLP engineering, including prompt engineering, model integration, orchestration tools (LangChain/LlamaIndex), and evaluation instrumentation.
Experience productionizing ML systems with observability, online/offline parity, and performance optimization across latency, throughput, and cost.
Strong backend integration skills using REST/gRPC APIs, Docker, Kubernetes, CI/CD, and cloud platforms (AWS/GCP/Azure).
Ability to independently design, ship, and operate reliable, scalable, and cost-efficient ML-backed backend systems with a strong ownership mindset.

RESPONSIBILITIES:

Design and build core backend services powering the AI/ML runtime, including orchestration, session/state management, and tools/services integration.
Implement end-to-end retrieval and memory systems covering ingestion, embeddings, indexing, vector search, ranking, caching, and lifecycle management.
Productionize ML workflows with feature/metadata services, model integration contracts, and evaluation hooks.
Drive performance, reliability, and cost optimization with strong SLO ownership and observability practices (logs, metrics, tracing, guardrails).
Collaborate with applied ML teams on model routing, prompts/tools, evaluation datasets, and safe releases.
Translate business requirements into scalable technical designs, define NFR benchmarks, and review architecture for extensibility and best practices.
Lead troubleshooting, root-cause analysis, and POCs to validate technology and design decisions.

EDUCATION:

Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.