Enable Data Science and AI teams to take models and AI-powered services from idea to production, overseeing the technical direction of the MLOps stack and ensuring compliance.
1. Technical strategy & architecture (MLOps)
- Define and evolve the end-to-end ML platform architecture (data, training, registry, serving, monitoring, governance) used by multiple squads.
- Design standard patterns for:
  - Reproducible training pipelines and experiment tracking.
  - Model packaging, versioning and promotion flows (dev → staging → production).
  - Online and batch inference, with safe rollout strategies (canary, shadow, rollback).
- Balance reliability, performance and cost for ML workloads, working closely with SRE/Infra and Finance/FinOps.
2. Day‑to‑day MLOps enablement & operations
- Act as the go‑to person for complex MLOps questions: how to structure pipelines, choose serving patterns, or design monitoring and rollback.
- Review and challenge designs and deployments for new models and data pipelines, ensuring they follow platform standards and non‑functional requirements.
- Partner with Fraud, Anomaly and other product squads to ensure:
  - Clear SLAs/SLOs for ML components.
  - Proper logging, metrics and alerts for incidents and regressions.
- Contribute to on‑call readiness: playbooks, dashboards, incident reviews and continuous improvement of our operational posture.
3. AI infrastructure & AI‑assisted operations
- Define infrastructure, contracts and guardrails so that we can safely consume agents and AI services built by the AI team, and extend them from the MLOps side when needed.
- Design patterns and tooling so that AI and agents automate as much of our MLOps work as possible, for example:
  - Feature platform operations (feature store pipelines, backfills, parity checks, DQ/drift monitoring).
  - MLOps platform workflows (training/eval pipelines, promotion gates, rollbacks, documentation and runbook generation).
  - Operational flows in Fraud / Anomaly (triage of alerts, log/metric analysis, enrichment of incident context).
  - Platform FinOps & cost optimization (suggesting right‑sizing, schedule changes, decommissioning opportunities).
- Contribute to evaluation, observability and safety for these AI‑powered automations (e.g. prompts, policies, redaction, auditability), in close collaboration with dedicated AI teams.
4. Governance, security & compliance
- Set and maintain technical standards for:
  - Model and data access control, PII handling and redaction.
  - Auditability of model changes, deployments and runtime behavior.
  - Environment separation and change management for ML/AI workloads.
- Work with InfoSec and Architecture to ensure the platform aligns with regulatory and internal requirements while remaining practical for engineers and data scientists.
Nice to have
- Experience rolling out AI assistants (code or infra copilots, AI log analysis, etc.) inside engineering organizations, including policies and best practices.
- Exposure to LLM and AI infrastructure (gateways, vector stores, evaluation harnesses), even if not as a core focus.
- Prior responsibilities as Technical Referent / Tech Lead / Architect for platforms or shared services.
- Contributions to internal standards, RFCs, guilds or tech communities.
Flexible Work Hours
We have flexible schedules and are driven by performance.
Learning Budget
Get access to a Premium Coursera subscription.
dLocal Houses
Want to rent a house and spend one week coworking with your team anywhere in the world? We’ve got your back!
dLocal offers a robust payment processing solution designed for global enterprises to navigate cross-border transactions in emerging markets. By facilitating local payments and payouts in 40 countries, we help major brands enhance conversion rates and streamline their payment operations.