Code Data Quality Specialist
TL;DR
This hybrid role focuses on reviewing and auditing code annotations to ensure high-quality data for AI models while improving internal tools for annotators.
Responsibilities
Generate and validate high-quality data annotations, based on guidelines and continuous feedback, for the development and evaluation of AI models
Surface systemic issues, edge cases, and gaps in guidelines back to annotation operations and technical stakeholders
Produce annotations yourself when needed, modeling the quality bar expected of the team
Build and maintain internal tools and automation that streamline annotator workflows, such as visualization dashboards, batch configuration scripts, and output management utilities
Troubleshoot environment, tooling, and CLI/git issues for annotators on their local machines, liaising with IT and engineering as needed
Requirements
A degree in computer science, engineering, or a related field, or 2 to 5 years of professional experience in software engineering, technical support, or tooling development
Hands-on experience using code agents (e.g. Mistral’s vibe) in your own development workflow, and genuine interest in how they're evolving
Proficient in at least one programming language (e.g. Python, JavaScript, or similar), with enough breadth to read and reason about code across a few core languages
Able to apply consistent judgment against a rubric and surface edge cases, ambiguities, or gaps in guidelines
Able to sustain focus and accuracy on detail-oriented, high-volume review work
Comfortable working in a Unix-like terminal: shell basics, package managers, environment setup, and git workflows (branches, merges, resolving conflicts)
Able to troubleshoot local development environment issues (dependencies, virtual environments, paths, permissions) across common operating systems
Professional proficiency in English, with strong writing and comprehension skills
Nice to have
Prior experience in data annotation for AI/ML, especially LLM training (SFT, RLHF, preference data), evals/benchmarks, or agentic data
Experience building an annotation team through interviews and training
Experience supporting technical users or troubleshooting developer environments (internal tools support, DevRel, teaching assistant for coding courses, etc.)
Fluency across multiple programming languages, or domain depth in one of: frontend, backend, DevOps, MLOps, data engineering
Familiarity with rubric-based evaluation concepts, inter-annotator agreement, or quality measurement for human-labeled data
Experience developing, deploying, and managing internal tooling or automation scripts
Benefits
Free Meals & Snacks
Daily lunch vouchers
Health Insurance
Full health insurance for you and your family
Mobility Pass Contribution
Monthly contribution to a mobility pass
Paid Parental Leave
Generous parental leave policy
Visa Sponsorship
Mistral AI develops high-performance, open-source AI models and solutions that simplify tasks and enhance creativity for both enterprises and individuals. Our comprehensive platform seamlessly integrates into daily work life, offering tools like Le Chat and Mistral Compute to democratize access to advanced AI technology. We're dedicated to driving innovation and making AI accessible to everyone.