About Cantina:
Cantina Labs is a social AI company, developing a suite of advanced real-time models that push the boundaries of expression, personality, and realism. We bring characters to life, transforming how people tell stories, connect, and create. We build and power ecosystems. Cantina, our flagship social AI platform, is just the beginning.
If you're excited about the potential AI has to shape human creativity and social interactions, join us in building the future!
About the Role:
We’re looking for an Applied ML Engineer who can take video generation models from research to real‑time production. You’ll work across model engineering (fine‑tuning/distillation/optimization) and low‑latency inference & streaming (including WebRTC prototypes).
Typical time split:
50–60% model engineering (distillation, optimization, fine‑tuning)
30–40% serving / streaming / inference infrastructure
10–20% prototyping + product integration
What You’ll Do:
Productionize video generation models: turn research checkpoints into robust, scalable inference APIs.
Make models fast and affordable: distillation + performance optimization (latency/cost/memory tradeoffs).
Build real‑time inference systems: low‑latency serving, streaming outputs, reliability/observability.
Prototype fast: ship demos (often WebRTC) and harden them into production features.
Multi‑GPU work: run/optimize large model components across GPUs when needed.
Collaborate with research: translate model constraints into deployable systems and performance improvements.
What You’ll Bring:
2+ years in ML engineering (or equivalent), with real ownership of shipped systems.
Strong PyTorch + Python, comfortable with both training and inference code.
Hands-on experience with generative models (diffusion/transformers/VAEs), especially for image/video.
Proven ability to improve latency/cost in practice (profiling, memory optimization, runtime improvements).
Production mindset: debugging under load, monitoring, deployment hygiene.
WebRTC / real-time media delivery experience.
Comfortable in cloud environments: Docker, Kubernetes basics.
Bonus Points For:
Distillation experience end-to-end (teacher/student, eval design).
Familiarity with acceleration toolchains (e.g., compilation / TensorRT / Triton / ONNX).
Technical Stack You’ll Work With:
Cloud/Infra: AWS (S3, DynamoDB), Kubernetes, Docker
ML: PyTorch
Models: video generation (diffusion/VAEs/transformers)
Optimization: distillation, real‑time inference, multi‑GPU strategies
Streaming: WebRTC prototypes + low‑latency delivery patterns
Location:
This role can be performed remotely in Europe, within GMT +/- 2 hours.
Compensation:
The anticipated annual base salary range for this role is between €190,000 and €225,000, plus bonus. When determining compensation, a number of factors will be considered, including skills, experience, job scope, location, and competitive compensation market data.
Benefits for U.S.-based roles:
Competitive salary and generous company equity
Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
42 days of paid time off, including:
15 PTO days
10 sick days
15 company holidays
2 floating holidays
Generous parental leave & fertility support
401(k) retirement savings plan
Lifestyle spending account – $500/month to use however you’d like
Complimentary lunch and snacks for in-office employees
One Medical membership, and more!