Machine Learning Engineer - Web Data Quality

Rio de Janeiro, Brazil
Full-time, Remote

At Zyte, we make the world’s web data accessible to everyone. Our technology powers data extraction at scale, helping businesses and researchers unlock the full potential of the web.

We’re a remote-first, multicultural team of engineers, data scientists, and innovators who believe in curiosity, collaboration, and continuous learning. If you’re passionate about building reliable AI systems and improving the quality of web data, we’d love to hear from you.

About the Role

As a Machine Learning Engineer (Web Data Quality), you’ll design and implement intelligent systems that automatically detect, measure, and improve the quality of large-scale web datasets. You’ll work at the intersection of data science, AI, and distributed systems, collaborating closely with product, engineering, and data teams to make data accuracy measurable, scalable, and actionable.

What You’ll Do

  • Develop and deploy ML models for anomaly detection, schema drift, and content validation
  • Build and improve data quality pipelines leveraging modern data and MLOps tools
  • Design and optimize embeddings and GenAI models to enhance data consistency
  • Collaborate with engineers to integrate AI systems into production workflows
  • Conduct experiments, evaluate performance, and iterate for continuous improvement
  • Stay up to date on AI/ML and GenAI research to guide innovation within Zyte

Required

  • 3+ years of experience in Machine Learning / Data Science / AI Engineering
  • Strong Python skills and experience with ML frameworks (PyTorch, TensorFlow, scikit-learn)
  • Experience with data validation, anomaly detection, or data quality systems
  • Familiarity with data pipelines (Airflow, Spark, or similar)
  • Understanding of model evaluation, metrics, and deployment best practices
  • Excellent problem-solving, communication, and collaboration skills

Preferred

  • Experience with LangChain, LlamaIndex, or GenAI model orchestration
  • Familiarity with data labeling tools and active learning approaches
  • Contributions to open-source or public ML projects
  • Experience working in a remote, cross-functional team environment

Benefits

  • 35 days of paid time off
  • Health & wellness support
  • Inclusive and supportive team environment
  • Attend conferences and meet with team members from across the globe
  • Work with cutting-edge open source technologies and tools

At Zyte (formerly Scrapinghub), we eat data for breakfast, and you can eat your breakfast anywhere while you work for Zyte. Founded in 2010, we are a globally distributed team of over 190 Zytans working from over 28 countries. We are on a mission to enable our customers to extract the data they need to continue to innovate and grow their businesses. We believe that all businesses deserve a smooth pathway to data.

For more than a decade, Zyte has led the way in building powerful, easy-to-use tools to collect, format, and deliver web data quickly, dependably, and at scale. The data we extract helps thousands of organizations make smarter business decisions, secure competitive advantage, and drive sustainable growth. Today, over 3,000 companies and 1 million developers rely on our tools and services to get the data they need from the web.

By joining the Zyte team, you will:

  • Become part of a self-motivated, progressive, multicultural, and curious team that excels every day. When you need help, there is always someone who has your back. We are committed to making our customers excited.
  • Have the freedom and flexibility to work remotely.
  • Get the chance to work with cutting-edge technologies and tools. We love to innovate and create new ways of doing things, always striving to do better and be better.
