About Our Internship Program
Zoox’s internship program offers hands-on experience with cutting-edge technology, mentorship from some of the industry’s brightest minds, and the opportunity to make meaningful contributions to real projects. We seek interns who demonstrate strong academic performance, engagement beyond the classroom, intellectual curiosity, and a genuine interest in Zoox’s mission.
Project Overview
The Perception Attributes team builds the agent semantics layer of Zoox's perception stack. Our models classify what obstacles mean — detecting emergency vehicle lights, pedestrian gestures, turn signals, and dozens of other behavioral signals that inform how the AV responds. This work sits at the intersection of safety-critical autonomy and cutting-edge ML: our models run on every Zoox vehicle, and our outputs directly influence decisions like yielding to emergency vehicles and interacting with construction workers. The team is small, moves fast, and collaborates closely with ML researchers across the AI org.
During the internship, you will work on one of the most exciting open problems in AV perception: using modern foundation models — large vision-language models, multimodal transformers, and audio-visual architectures — to dramatically expand the semantic understanding of our perception stack. Current approaches require months of data collection and labeling to add a single new attribute class. The research goal is to change that fundamentally, using VLMs and language-aligned representations to make our models more generalizable, queryable, and data-efficient. The work spans dataset construction, model design, and evaluation — with direct implications for how Zoox handles novel emergency vehicles, complex pedestrian behavior, and safety-critical edge cases as we scale to new cities.
Requirements:
Currently working towards a Ph.D. or other advanced degree in a relevant engineering program
Good academic standing
Able to commit to a 12-week internship beginning in late May or June of 2026
At least one previous industry internship, co-op, or project completed in a relevant area
Ability to relocate to the Bay Area, California (or Boston, Massachusetts) for the duration of the internship
Interns at Zoox may not use any proprietary information they work with in their thesis or in any work published with their university, and may not distribute it to anyone outside of Zoox
Qualifications (It’s helpful if you meet a majority of the following qualifications, but it isn’t a requirement):
Strong background in computer vision and deep learning
Experience training and evaluating ML models in PyTorch
Familiarity with vision transformers, contrastive learning, or knowledge distillation
Bonus Qualifications:
Experience with vision-language models (CLIP, SigLIP, LLaVA, or similar)
Familiarity with multimodal learning
Experience with large-scale dataset construction or data pipelines
Compensation:
The monthly salary for this position is $9,500. Compensation will vary based on geographic location. Additional benefits may include medical insurance and a housing stipend; relocation assistance will be offered based on eligibility.
About Zoox
Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.
Accommodations
If you need an accommodation to participate in the application or interview process, please reach out to [email protected] or your assigned recruiter.
A Final Note:
You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.