The Experience team designs Spotify’s consumer experience—end to end, moment to moment, across every screen, platform, and partner integration. Our mission is to make listening feel effortless, personal, and joyful for billions of users around the world. That means turning complexity into clarity across hundreds of touchpoints—from our mobile and desktop apps to the smart speakers, TVs, cars, and integrations where Spotify shows up every day. If it touches a consumer, we shape it. We bring deep insight into human behavior, design, and technology to craft experiences that feel intuitive, expressive, and unmistakably Spotify.
The Enrichment & Content Intelligence team sits within Content Platform in the Experience Mission. We build the metadata-resolution and content-enrichment infrastructure that powers how Spotify understands music and video content at global scale. Our systems help answer foundational questions across the platform: which tracks are the same recording, which music videos match which audio tracks, who wrote and performed a song, and how content relationships connect across Spotify’s catalog.
Our infrastructure powers products and experiences used by millions of listeners, artists, and creators every day. From recommendations and charts to royalties and artist tooling, the work we do directly shapes how content is understood and surfaced across Spotify.
We’re looking for a Senior Machine Learning Engineer to help evolve the machine learning systems behind Recording Groups, Music Video Resolution, SongDNA, and the Music Knowledge Graph. This role sits at the intersection of multimodal machine learning, entity resolution, and production-scale engineering, with opportunities to work across audio, video, and metadata understanding problems at massive scale.
What You'll Do
Own and evolve large-scale ML pipelines powering Spotify’s content-resolution systems
Lead development of multimodal embedding frameworks supporting multimodal understanding, music video matching, and SongDNA
Improve entity-resolution systems across music and video content, helping Spotify better understand relationships between recordings, versions, and content formats
Design and run experiments to improve precision, recall, and overall content-quality outcomes using offline evaluation, golden datasets, A/B testing, and impact analysis
Build scalable ML evaluation and monitoring infrastructure, including standardized datasets, retraining workflows, and continuous improvement systems
Contribute to the evolution of the Music Knowledge Graph by improving production ML capabilities, observability, and model lifecycle management
Partner closely with Product Managers, Data Scientists, and engineering teams across Content Platform and the wider Experience Mission
Help shape technical strategy for the squad and contribute to long-term ML direction across the product area
Mentor engineers and contribute to a strong culture of technical collaboration and experimentation
Who You Are
You have solid experience building, deploying, and maintaining machine learning systems in production at scale
You have strong experience training, evaluating, and operating ML models using modern frameworks such as PyTorch or TensorFlow
You have experience working with multimodal machine learning systems across audio, computer vision, text embeddings, or related domains
You understand entity resolution, deduplication, record linkage, or large-scale matching problems, ideally across multiple content modalities
You know how to design evaluation systems that balance model quality, operational performance, and real-world impact
You have experience working with large-scale distributed data processing systems and ML infrastructure
You communicate effectively across engineering, product, and data science stakeholders
You are comfortable leading technical initiatives and influencing engineering direction within a team
Experience with Scio, Dataflow, Flyte, BigQuery, or similar distributed processing frameworks is a plus
Experience with Scala is a plus
Experience with computer vision, video understanding, multimodal embeddings, or recommendation systems is a strong plus
Where You'll Be
This role is based in New York City
We offer you the flexibility to work where you work best! There will be some in-person meetings, but the role still allows for flexibility to work from home.
The United States base range for this position is $184,049–$262,928 USD, plus equity. The benefits available for this position include health insurance, six-month paid parental leave, 401(k) retirement plan, monthly meal allowance, 23 paid days off, paid flexible holidays, and paid sick leave. This range may be modified in the future.