Machine Learning Engineer - Multi-Modality Foundation Model

TL;DR

Develop multi-modality foundation models that enhance autonomous system intelligence, using Knowledge Distillation to optimize performance for real-time applications.

In this role, you will:
  • Build, pre-train, and evaluate large-scale multi-modality foundation models from the ground up, aligning diverse data streams (e.g., Vision, LiDAR, Radar, Language, Audio).

  • Define and execute the ML roadmap for deploying these multi-modality representations to the vehicle.

  • Architect and implement Knowledge Distillation pipelines to compress large-capacity multi-modal teacher models into highly efficient, production-ready student models.

  • Build high-quality training and evaluation datasets, applying advanced data-centric techniques to maximize cross-modal representation learning and student model convergence.

  • Collaborate with downstream perception teams to integrate and validate the performance, robustness, and latency of your models in on-board production systems.
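For context on the distillation work described above, here is a minimal illustrative sketch of a standard soft-target distillation loss (Hinton-style): soften teacher and student logits with a temperature, penalize their KL divergence (scaled by T²), and blend with the hard-label cross-entropy. This is a generic NumPy example, not Zoox's pipeline; all names and parameters are hypothetical, and a production system would use a framework such as PyTorch.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend soft-target KL (scaled by T^2) with hard-label cross-entropy.

    T: temperature softening both distributions.
    alpha: weight on the distillation term vs. the supervised term.
    """
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T))
    # KL(teacher || student) per example, averaged over the batch.
    kd = (p_teacher * (np.log(p_teacher) - log_p_student)).sum(axis=-1).mean() * T * T
    # Standard cross-entropy against the ground-truth labels.
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * kd + (1 - alpha) * ce
```

The T² factor keeps the soft-target gradients on the same scale as the hard-label term as the temperature changes, which is why the two losses can be blended with a single alpha.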

Qualifications:
  • MS or PhD in Computer Science, Machine Learning, or a related technical field with demonstrated professional experience.

  • Deep, proven expertise in building and training large-scale multi-modality foundation models (e.g., Vision-Language Models (VLMs), Vision-Audio-Text, or Vision-LiDAR-Radar architectures).

  • Strong understanding of cross-modal alignment, multi-modal attention mechanisms, and large-scale pre-training techniques.

  • Proven experience in Knowledge Distillation (KD), model compression, and training highly efficient student models for production environments.

  • Proficiency in ML frameworks (e.g., PyTorch) and experience building large-scale ML training and evaluation pipelines.

Bonus Qualifications:
  • Experience in the Autonomous Driving or robotics industry.

  • Experience with model deployment, optimization, and hardware constraints (e.g., C++ for inference, TensorRT, quantization, pruning).

  • Publications in top-tier conferences (CVPR, ICCV, NeurIPS, ICLR, ACL) related to multi-modality foundation models, cross-modal learning, or model compression.

About Zoox
    Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.


    Accommodations
    If you need an accommodation to participate in the application or interview process, please reach out to [email protected] or your assigned recruiter.

    A Final Note:
    You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.

    Salary
    $189,000 – $258,000 per year