Job Overview
As a Machine Learning Infrastructure Engineer, you will be responsible for building and maintaining the infrastructure that enables the development, deployment, and scaling of machine learning, data, and service pipelines. You will collaborate closely with data scientists and full-stack developers to shape Contextual AI’s machine learning services and systems. Your role will be crucial in ensuring the scalability, reliability, availability, and performance of our RAG 2.0 systems.
What you’ll do:
• Build and productionize ML infrastructure to support state-of-the-art RAG systems using technologies like Terraform, ArgoCD, Kubernetes, and cloud infrastructure providers.
• Design, build, and optimize scalable machine learning infrastructure to support the training, evaluation, and deployment of large language models.
• Instrument, monitor, and optimize the performance and reliability of Contextual AI’s services infrastructure.
• Ensure security best practices are applied to application build pipelines and cloud/SaaS infrastructure.
• Stay updated on the latest trends and best practices in MLOps, security, and AI to continuously optimize our RAG infrastructure.
• Work closely with stakeholders to understand business needs and translate them into technical solutions.
• Mentor and guide junior AI engineers and team members, promoting technical excellence and knowledge sharing within the team.
What we’re seeking:
• Education: Master’s degree in Computer Science, Engineering, or a related field (Ph.D. preferred).
• Experience: 5+ years of experience in building highly available, production-grade containerized distributed ML systems.
• Technical Expertise: 5+ years of experience with ML infrastructure technologies and practices such as Kubernetes, GCP, AWS, ArgoCD, Terraform, CloudFormation, infrastructure as code, containerization, SLURM automation, Linux fundamentals, GitHub Actions, and CI/CD.
• Cloud Services: Demonstrated experience in managing cloud-based Kubernetes services.
• Programming Skills: 5+ years of experience in at least one programming language such as Python, Java, or Go.
• Machine Learning Knowledge: Familiarity with key concepts in machine learning, natural language processing, and computer vision.
• Data Skills: Familiarity with data preprocessing, feature engineering, and data visualization.
• HPC Experience: Familiarity with large GPU clusters and high-performance computing/networking.
• Leadership: Experience in mentoring and growing junior engineers into successful leaders.
• Communication: Excellent communication and collaboration skills, with the ability to work effectively in a fast-paced and dynamic environment.
If you are passionate about building and optimizing ML infrastructure to power cutting-edge AI solutions, we’d love to have you on our team.
Location: Mountain View, CA
Salary Range for California-Based Applicants: $140,000 - $300,000 + equity + benefits (actual compensation will be determined based on experience, location, and other factors permitted by law).
Equal Opportunity
Contextual AI is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.