Chalice Custom Algorithms (chalice.ai) is the leading AI application for brands applying their own data and analytics in the real-time decisioning of ad buys. Chalice's software automates data ingestion, predictive analytics, and deployment of custom bidding instructions in a supervised-learning environment, where advertisers can visualize and test their custom algorithms.
Advertisers' algorithms can be deployed across all major DSPs, including The Trade Desk and DV360, as well as on Meta and YouTube. Chalice was named "Best Demand Side Tech" by AdExchanger, and powered Adweek's "Best Use of Programmatic" in the 2023 Media Plan of the Year awards.
We are looking for a highly skilled and experienced Senior Machine Learning Engineer to join our team and play a pivotal role in building our distributed ML infrastructure and advancing our core AI products.
The ideal candidate has 5-10 years of industry experience and thrives at the intersection of scalable ML systems, distributed computing, and business impact. In this role, you will develop and deploy production ML models, including neural network architectures for audience modeling and optimization, that directly power our core products: AI Audiences, AI Allocator, CPA Algo, and Curate AI.
You won't be reinventing broken pipelines—you'll be building on a strong foundation designed for scale and maintainability. Our team has already established distributed training infrastructure using Ray + PyTorch on Databricks, with MLflow for experiment tracking and Unity Catalog for data governance. You'll own the lifecycle of ML systems from training and hyperparameter tuning to batch inference and observability.
This is an opportunity to work closely with Directors of Engineering, Product, and Data Science to build systems that directly impact product strategy and business outcomes in the programmatic advertising space.
• Architect, train, and maintain scalable neural network systems for audience modeling and bid optimization using PyTorch and Ray distributed training (Ray Train, Ray Tune, DDP)
• Build and optimize multi-GPU training pipelines on Databricks, including hyperparameter search with ASHA scheduling and early stopping
• Develop feature engineering pipelines using PySpark, including embedding layers (EmbeddingBag, Embedding) for categorical and behavioral features
• Implement model comparison workflows with champion/challenger evaluation on holdout data
• Build resilient training and batch inference workflows with a focus on automation, reproducibility, and checkpoint recovery
• Implement robust model monitoring and observability solutions (MLflow, Prometheus, Grafana, Datadog) to track drift, performance metrics (AUC, AUPRC, F1), and system health
• Manage model versioning, experiment tracking, and artifact persistence using MLflow and Unity Catalog
• Work closely with engineering teams to integrate model outputs into production systems and optimize dataflows for fault-tolerance
• Partner with product stakeholders to align ML efforts with business impact, KPIs, and product strategy across AI Audiences, AI Allocator, CPA Algo, and Curate AI
• Lead technical design reviews, contribute to internal Python packages, and enforce engineering best practices (testing, CI/CD, modularity)
• Stay current on ML infrastructure advancements (distributed training, inference optimization, model serving patterns) and help guide adoption internally
• Document system architectures, create runbooks, and enable team members to adopt and extend the ML framework
• Master's degree or PhD in Computer Science, Statistics, Machine Learning, or a related discipline, with 5-10 years of industry experience
• Strong proficiency in PyTorch for neural network development, including custom architectures with embedding layers, MLP backbones, and binary classification heads
• Production experience with Databricks including Delta Lake, Unity Catalog, Asset Bundles, and cluster management
• Strong grasp of MLOps best practices: experiment tracking (MLflow), model versioning, model serving, monitoring, and reproducibility
• Expert-level Python and PySpark skills for data processing and feature engineering at scale
• Experience building and maintaining batch inference pipelines with schema versioning and artifact management
• Familiarity with cloud platforms (AWS: S3, EC2) and data warehousing (Snowflake)
• Experience with CI/CD workflows including build automation, testing, and packaging using GitHub Actions and Make
• Excellent collaboration and communication skills; ability to work effectively in a cross-functional environment with DS, Product, and Engineering teams
• Exposure to adtech or programmatic advertising (DSPs, bid optimization, audience modeling, lookalike modeling)
• Experience with LLMs, vector databases, or embedding models for content understanding
• Experience building feature stores or large-scale ETL/ELT systems
• Experience with distributed training using Ray (Ray Train, Ray Tune, Ray Datasets) on multi-GPU clusters
• Production experience with Databricks clean rooms
• Experience with reinforcement learning for optimization
• Knowledge of Databricks model serving endpoints
• Experience deploying and operating machine learning workloads in Kubernetes-based environments
• Exposure to event-driven messaging systems (SQS, SNS, MSK, Redpanda)
• Knowledge of observability tooling: Prometheus, Grafana, Datadog
• ML Frameworks: PyTorch, Ray (Train, Tune, Datasets), PySpark ML
• Data Platform: Databricks (Delta Lake, Unity Catalog), Snowflake, AWS (S3, EC2)
• MLOps: MLflow (experiment tracking, model registry), GitHub Actions
• Observability: Prometheus, Grafana, Datadog
• Languages: Python, SQL, JavaScript/TypeScript
Joining Chalice means being at the forefront of AI for advertising, working in a dynamic environment that encourages creativity, growth, and innovation. We value diversity, collaboration, and dedication to making a meaningful impact in the advertising landscape.
Chalice is a profitable early-stage company with funding from industry insiders including The Trade Desk's TD7 Ventures and Aperiam Ventures. Our clients range from Fortune 100 companies and the world's largest advertising agencies to up-and-coming brands and independent agencies across the globe.
• AI is too important for brands to outsource
• AI should be transparent and auditable
• We strive for excellence while fostering learning and creating a supportive work environment
• We are a team of non-conformist, independent thinkers
• We are committed to diversity of thought, background, and experience, and creating psychological safety for all to thrive
• Medical, Dental, and Vision coverage
• 401(k) options
• Unlimited PTO
• 11 Company Holidays
• Office-wide closure between Christmas Eve and New Year's
Chalice is based in NYC with a hybrid office/remote work model. This role is remote with the option for hybrid work in NYC.
If you're ready to be part of a team that's shaping the future of advertising through custom AI solutions, reach out—let's build something amazing together!
Chalice participates in E-Verify. Chalice will provide the Social Security Administration (SSA) and, if necessary, the Department of Homeland Security (DHS), with information from each new employee's Form I-9 to confirm work authorization.
Learn more: www.chalice.ai