Execute software engineering tasks to support end-to-end machine learning operations, with a focus on scaling workflows for large data and distributed systems.
Design, build, and maintain MLOps systems, including microservices, queuing systems, APIs, and orchestration workflows using Python, Kubernetes, Kafka, and modern database systems.
Implement observability tools such as Prometheus and Grafana to ensure reliability, performance, and visibility of ML systems in production.
Collaborate closely with data scientists and machine learning engineers to streamline workflows for LLM, generative AI, and agentic AI development and deployment.
Review architecture and implementation plans to ensure alignment with organizational goals, scalability, and best practices.
Mentor junior and mid-level engineers, fostering a culture of collaboration, innovation, and operational excellence.

Education & Experience

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
5+ years of professional experience in Machine Learning, MLOps, DevOps, or Software Engineering, with a focus on building scalable and reliable software systems.

Core Technical Expertise

Proficiency in Python and Linux, with strong knowledge of designing scalable, distributed systems.
Hands-on experience in designing, implementing, and maintaining MLOps workflows, including CI/CD pipelines, monitoring, and production optimization.
Strong background in cloud computing (AWS/GCP), infrastructure as code (Terraform), containerization, and orchestration (Kubernetes).
Solid understanding of modern software development practices such as test-driven development (TDD), systems thinking, and CI/CD automation.

Observability & Reliability

Experience deploying and managing observability tools such as Prometheus, Grafana, and OpenTelemetry to ensure high reliability and performance in production ML systems.
Expertise in scaling and optimizing distributed systems for large-scale, multi-node computations.

Collaboration & Leadership

Proven ability to collaborate effectively with cross-functional teams, supporting data scientists and ML engineers in deploying and scaling ML models, including LLMs and generative AI systems.
Demonstrated capability in leading technical projects, mentoring junior and mid-level engineers, and fostering a culture of collaboration, innovation, and operational excellence.

Professional Attributes

Pragmatic approach: delivering efficient, real-world solutions while maintaining technical excellence.
Strong attention to detail with a focus on precision and robustness in system design and implementation.
Accountability for deliverables, adherence to company standards, and commitment to ethical guidelines.

What we offer:

Senior Machine Learning Engineer
