XR is a global technology platform powering the creative economy. Its unified platform moves creative and production forward, reduces fragmentation, and delivers global insights that drive increased business value. XR operates in 130 countries and 45 languages, serving the world's top advertisers and enabling $150 billion in video ad spend. More than half a billion creative brand assets are managed in XR’s enterprise platform.
Above all, we have a supportive and collaborative culture dedicated to diversity, equity, and inclusion (DEI). We are caring, dedicated, positive, genuine, trustworthy, experienced, passionate, and fun people who are loyal to our customers and to one another. We believe that the better we work together to help our clients achieve their goals, the more successful XR will be.
The Opportunity
The Lead MLOps Engineer plays a critical role in ensuring the seamless integration, deployment, monitoring, and scaling of machine learning models in production. The role blends DevOps and machine learning expertise to bridge the gap between data science and operational systems, ensuring that ML models perform reliably and at scale in real-world environments. As the Lead MLOps Engineer, you'll drive best practices for model lifecycle management and build the infrastructure that automates and streamlines these workflows.
Job Responsibilities
- Design and architect the AI/ML model platform to support scalable, efficient, and high-performance machine learning workflows.
- Build and manage the infrastructure for deploying machine learning models, using cloud services (AWS), the AWS CDK, and containerization tools such as Docker.
- Architect and develop MLOps systems with tools such as AWS SageMaker, MLflow, AWS Step Functions, and AWS Lambda.
- Lead the design and implementation of CI/CD pipelines that automate model deployment and rollback, so that models reach production seamlessly with less manual intervention and greater system reliability.
- Ensure that models scale efficiently for both real-time predictions and batch processing.
- Set up monitoring and logging solutions (Datadog, CloudWatch) to track the performance of models in production.
- Define and promote best practices in MLOps.
- Provide technical leadership and mentorship to MLOps engineers on technologies and standard processes.
- Partner with the global engineering team to drive cross-functional alignment and ensure seamless integration of AI/ML models into the wider data ecosystem.
- Work closely with Data Scientists, DevOps teams, and Product Managers to ensure that machine learning models are integrated into business workflows and deployed effectively.
- Stay up to date with the latest trends and technologies in MLOps and machine learning deployment, and identify opportunities to adopt new tools or practices that improve efficiency.
Requirements
- MS/BS in Computer Science or a related field preferred;
- 5+ years of experience in MLOps or related roles, including at least 2 years in a leadership or senior engineering capacity;
- Proven experience leading and mentoring teams, managing multiple stakeholders, and delivering projects on time;
- Proficiency in Python is essential;
- Experience with shell scripting, system diagnostics, and automation tooling;
- Proficiency and professional experience in ML and computer vision;
- Experience building and deploying ML, computer vision, or GenAI solutions (PyTorch, TensorFlow);
- Experience working with databases to manage the flow of data through the machine learning lifecycle;
- Experience with cloud-native services for machine learning, such as AWS SageMaker, MLflow, AWS Step Functions, and AWS Lambda, is essential;
- Deep expertise in Docker for containerization of machine learning models and tools is essential;
- Experience delivering environments using infrastructure-as-code techniques (AWS CDK, CloudFormation);
- Experience setting up and managing CI/CD pipelines for ML workflows using tools such as Jenkins and GitLab;
- Experience working in a fast-paced, innovative, Agile SDLC environment;
- Strong problem-solving, organizational, and analytical skills;
- Experience with Databricks is beneficial;
- Experience building and managing training, evaluation, and testing datasets is beneficial;
- Knowledge of security best practices in the context of machine learning.