As a Senior MLOps Engineer, you will play a critical role in building and operating our ML and AI platform, automating key processes and workflows for efficiency and impact.
Implement and maintain online and offline feature pipelines that feed our enterprise Feature Store, combining:
Flink‑based streaming jobs ingesting large volumes of events from multiple sources (payments, fraud, anomaly, etc.) into online stores.
Databricks / Spark pipelines for offline feature computation, backfills and training datasets.
Ensure:
Point‑in‑time correctness for offline training and backtesting.
Low‑latency, high‑throughput online feature serving with clear SLAs, TTL semantics and multi‑tenant safety.
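For illustration only (not dLocal's actual stack), point‑in‑time correctness typically means joining each training label with the latest feature value observed strictly before the label's timestamp, so no future information leaks into training. A minimal sketch with pandas, using hypothetical entity and feature names:

```python
import pandas as pd

# Hypothetical feature snapshots and training labels; all names are illustrative.
features = pd.DataFrame({
    "entity_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-05"]),
    "txn_count_7d": [3, 8, 5],
})
labels = pd.DataFrame({
    "entity_id": [1, 2],
    "label_time": pd.to_datetime(["2024-01-08", "2024-01-09"]),
    "is_fraud": [0, 1],
})

# Point-in-time (as-of) join: for each label, take the latest feature value
# observed strictly before the label timestamp, preventing label leakage.
training_set = pd.merge_asof(
    labels.sort_values("label_time"),
    features.sort_values("event_time"),
    left_on="label_time",
    right_on="event_time",
    by="entity_id",
    allow_exact_matches=False,
)
```

Note that entity 1's feature row from 2024‑01‑10 is ignored for its 2024‑01‑08 label; a plain join on `entity_id` would have leaked it.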
Contribute to the feature catalog and specs:
Define entities, feature views, schemas, SLAs, PII classification and owners.
Help data scientists and domain teams onboard new features safely and consistently across Flink and Databricks.
Develop tooling for:
Backfills and materialization coordination between Flink and Databricks (Lakehouse / Delta).
Offline–online parity checks, data quality, drift and freshness monitoring for critical feature groups.
Unified feature retrieval APIs (online/offline/batch) and SDK/CLI usage from models and services.
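As a sketch of what an offline–online parity check can look like (function and feature names are assumptions, not an existing API): fetch the same entity's features from both stores and diff them, tolerating float rounding:

```python
import math

def parity_report(online: dict, offline: dict, rel_tol: float = 1e-6) -> dict:
    """Compare online vs offline values for the same entity's features.
    Names and tolerance are illustrative; returns mismatched and missing keys."""
    mismatches = {}
    for key in online.keys() & offline.keys():
        a, b = online[key], offline[key]
        if isinstance(a, float) or isinstance(b, float):
            ok = math.isclose(a, b, rel_tol=rel_tol)  # tolerate rounding drift
        else:
            ok = a == b
        if not ok:
            mismatches[key] = (a, b)
    missing = online.keys() ^ offline.keys()  # present in only one store
    return {"mismatches": mismatches, "missing_keys": sorted(missing)}

# Example: one drifted value and one feature materialized only offline.
report = parity_report(
    {"txn_count_7d": 8, "avg_amount_30d": 102.5},
    {"txn_count_7d": 8, "avg_amount_30d": 99.0, "country_risk": 0.2},
)
```

In practice such a check would run on a schedule per feature group and feed the drift/freshness monitoring mentioned above.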
Implement and improve training and evaluation pipelines:
Reproducible workflows, experiment tracking and model registry integration.
Promotion flows from dev → staging → production, following platform standards.
Work on online and batch inference paths:
Model packaging and deployment.
Rollout strategies (canary, shadow, rollback) aligned with SRE/Infra.
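A canary rollout of the kind described can be as simple as deterministic, hash‑based traffic splitting; the sketch below assumes hypothetical model names and is not a specific serving framework's API:

```python
import hashlib

def route_model(request_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically route a fraction of traffic to the canary model.
    Model names and the 5% default are illustrative assumptions."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "model_canary" if bucket < canary_fraction else "model_stable"

# The same request always routes to the same variant, which makes canary
# behavior reproducible when debugging and trivial to roll back (fraction -> 0).
variant = route_model("req-123")
```

Shadow deployment differs only in that the canary's prediction is logged for comparison rather than returned to the caller.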
Instrument pipelines and services with metrics, logs and traces:
Integrate with our observability stack (e.g. OTel, Coralogix).
Expose dashboards and alerts for ML components (latency, errors, drift, freshness).
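A freshness alert of the kind listed above reduces to comparing each feature group's last materialization time against its SLA; the group names and SLA values below are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-feature-group freshness SLAs.
FRESHNESS_SLA = {
    "payments_features": timedelta(minutes=5),
    "fraud_features": timedelta(hours=1),
}

def is_stale(group: str, last_update: datetime, now: datetime) -> bool:
    """True when the group's last materialization breaches its freshness SLA."""
    return (now - last_update) > FRESHNESS_SLA[group]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stale = is_stale("payments_features", now - timedelta(minutes=12), now)  # breaches 5-min SLA
fresh = is_stale("fraud_features", now - timedelta(minutes=30), now)     # within 1-hour SLA
```

In production the `last_update` timestamps would come from pipeline metadata and the boolean would drive a metric or alert, not a return value.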
Integrate and extend agents and AI services (built by the AI Team and MLOps) to automate key parts of the Feature Store and MLOps workflows (health checks, drift and quality analysis, documentation/specs, incident triage, FinOps suggestions, etc.).
Design these automations with clear guardrails: observable, auditable and easy to roll back, always keeping humans in control of production decisions.
Implement changes that respect platform standards around:
Access control, secrets management and PII handling in features and models.
Environment separation and change management for ML/AI components.
Participate in on‑call rotations or escalation paths for ML pipelines and feature infrastructure:
Diagnose and fix incidents.
Contribute improvements to playbooks, dashboards and tests.
Work closely with:
MLOps Technical Referent to align on architecture and technical direction.
Data Science squads and the AI Team to understand requirements and unblock use cases.
Fraud, Anomaly and other product squads as consumers of features and models.
Contribute to internal documentation, RFCs, examples and onboarding guides so other engineers and data scientists can adopt the platform more easily.
Mentor mid‑level engineers on good practices in pipelines, testing, observability and automation.
What we're looking for:
Solid experience as a Senior Engineer working on:
MLOps, data platforms, or large‑scale backend / distributed systems.
Hands‑on experience with big data / streaming technologies (e.g. Spark, Flink, Kafka, Kinesis, or similar).
Proven track record building production‑grade ML pipelines:
Experiment tracking and reproducible training flows.
CI/CD for models and data pipelines.
Online and batch inference at scale.
Familiarity with cloud‑based ML platforms and containerized deployments (e.g. Databricks, SageMaker, Vertex AI, or equivalent).
Strong understanding of observability:
Metrics, logs and traces.
Data and model drift, freshness and quality checks.
Ability to write clean, maintainable code and collaborate through reviews, design docs and pairing sessions.
Comfortable communicating with Data Scientists, ML Engineers and Infra/SRE, translating requirements into concrete technical solutions.
Nice to have:
Experience working with or around Feature Stores (Feast, Databricks Feature Store, custom implementations, etc.).
Exposure to LLMs, agents and AI assistants, especially applied to:
Developer productivity (code/infra copilots).
Log/metric/incident analysis or documentation generation.
Experience in Fintech, risk, fraud or anomaly detection environments.
Contributions to internal standards, RFCs, runbooks or technical talks.
Flexible work hours: we have flexible schedules and we are driven by performance.
Learning & development: get access to a Premium Coursera subscription.
dLocal Houses: want to rent a house to spend one week anywhere in the world coworking with your team? We've got your back!
dLocal offers a robust payment processing solution designed for global enterprises to navigate cross-border transactions in emerging markets. By facilitating local payments and payouts in 40 countries, we help major brands enhance conversion rates and streamline their payment operations.