Applied ML Engineer, Real‑Time Video Generation

About Cantina:

Cantina Labs is a social AI company, developing a suite of advanced real-time models that push the boundaries of expression, personality, and realism. We bring characters to life, transforming how people tell stories, connect, and create. We build and power ecosystems. Cantina, our flagship social AI platform, is just the beginning.

If you're excited about the potential AI has to shape human creativity and social interactions, join us in building the future!

About the Role:
We’re looking for an Applied ML Engineer who can take video generation models from research to real‑time production. You’ll work across model engineering (fine‑tuning/distillation/optimization) and low‑latency inference & streaming (including WebRTC prototypes).

Typical time split:

  • 50–60% model engineering (distillation, optimization, fine‑tuning)

  • 30–40% serving / streaming / inference infrastructure

  • 10–20% prototyping + product integration

What You’ll Do:

  • Productionize video generation models: turn research checkpoints into robust, scalable inference APIs.

  • Make models fast and affordable: distillation + performance optimization (latency/cost/memory tradeoffs).

  • Build real‑time inference systems: low‑latency serving, streaming outputs, reliability/observability.

  • Prototype fast: ship demos (often WebRTC) and harden them into production features.

  • Multi‑GPU work: run/optimize large model components across GPUs when needed.

  • Collaborate with research: translate model constraints into deployable systems and performance improvements.

What You’ll Bring:

  • 2+ years in ML engineering (or equivalent), with real ownership of shipped systems.

  • Strong PyTorch + Python, comfortable with both training and inference code.

  • Hands-on experience with generative models (diffusion/transformers/VAEs), especially for image/video.

  • Proven ability to improve latency/cost in practice (profiling, memory optimization, runtime improvements).

  • Production mindset: debugging under load, monitoring, deployment hygiene.

  • WebRTC / real-time media delivery experience.

  • Comfortable in cloud environments: Docker, Kubernetes basics.

Bonus Points For:

  • Distillation experience end-to-end (teacher/student, eval design).

  • Familiarity with acceleration toolchains (e.g., compilation / TensorRT / Triton / ONNX).

Technical Stack You’ll Work With:

  • Cloud/Infra: AWS (S3, DynamoDB), Kubernetes, Docker

  • ML: PyTorch

  • Models: video generation (diffusion/VAEs/transformers)

  • Optimization: distillation, real‑time inference, multi‑GPU strategies

  • Streaming: WebRTC prototypes + low‑latency delivery patterns

Location:

This role can be performed remotely in Europe, within GMT ± 2 hours.

Compensation:

The anticipated annual base salary range for this role is between €190,000–€225,000, plus bonus. When determining compensation, a number of factors will be considered, including skills, experience, job scope, location, and competitive compensation market data.

Benefits for U.S.-based roles:

  • Competitive salary and generous company equity

  • Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina

  • 42 days of paid time off, including:

    • 15 PTO days

    • 10 sick days

    • 15 company holidays

    • 2 floating holidays

  • Generous parental leave & fertility support

  • 401(k) retirement savings plan

  • Lifestyle spending account – $500/month to use however you’d like

  • Complimentary lunch and snacks for in-office employees

  • One Medical membership, and more!
