Software Engineer

TLDR

Design and build reinforcement learning tasks that improve AI coding models. You own the full task lifecycle and may also contribute to shared infrastructure.

About Mechanize

Mechanize builds reinforcement learning environments that frontier AI labs use to train and evaluate their coding models. Learn more at mechanize.work.

Why the work matters

AI models have gotten good at narrow coding tasks but still fail at the complex, judgment-heavy parts of software engineering. We build the environments that expose those failures and help models improve.

What you'll do

You'll design, build, and quality-assure RL tasks. Each task is a self-contained software engineering challenge with a prompt, an environment, and an automated grader. You own the full lifecycle: ideation, grading infrastructure, running frontier models against the task, failure analysis, and iteration. At this level, we expect you to consistently produce tasks that target meaningful capability gaps in frontier models, and to develop a strong sense for what makes a task informative versus merely difficult.

You will use coding agents heavily, and a large part of the job is directing them well, evaluating their output, and knowing when they are failing in subtle ways. You may also contribute to shared infrastructure: improving our build pipeline, automating parts of QA, or building tooling for other engineers.

What makes someone good at this

Strong technical fundamentals combined with a well-calibrated intuition for AI model behavior. You need to anticipate where a model will take shortcuts, distinguish genuine capability gaps from grader issues, and understand how a model will interpret a prompt. At this level, we expect extensive familiarity with what frontier coding agents can and can't do.

Good fit if you:

  • Can code in Python

  • Are confident working independently at a consistent pace

  • Have developed an intuition for what coding agents can and can't do

No prior ML or AI experience is required.

Probably not a good fit if you:

  • Want a product engineering role building features for end users

  • Prefer a highly collaborative team environment with shared ownership

  • Want extensive structured mentorship

This is independent, high-ownership work. You own your tasks from start to finish, with regular check-ins and feedback. Strong performers are recognized and promoted quickly.

Benefits include health, dental, vision, and life insurance.

Applying takes less than one minute.

Interview process: https://www.mechanize.work/how-our-interview-process-works

Learn more about the work: https://www.mechanize.work/what-working-here-is-like

Mechanize is a team of about 20 people in San Francisco, backed by Patrick Collison, Nat Friedman, Daniel Gross, Jeff Dean, Dwarkesh Patel, and Sholto Douglas. It has been featured in the New York Times, the Dwarkesh Podcast, and Hard Fork.

Mechanize builds reinforcement learning environments that train AI models on real-world tasks. Top AI labs use these environments as tools for working toward full economic automation.

Salary
$350,000 per year