Senior Platform/MLOps Engineer
TLDR
Build the scalable systems that underpin manufacturing technology, with a focus on AI/ML infrastructure, and collaborate with teams to deploy GPU workloads in Kubernetes for industrial solutions.
- Design, implement, and maintain reliable, scalable, and secure infrastructure, applications, and tooling, with a focus on our ML/AI pipelines and workloads
- Write clean, maintainable code and perform peer code reviews
- Write clear and concise documentation and engage in cross-team communication and knowledge sharing
- Work with other team members to investigate design approaches, prototype new technology, and evaluate technical feasibility
- Pair with adjacent teams to understand how your frameworks and infrastructure are actually used in the field, continuously improving them and leveraging recent advances to improve developer velocity
- 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE)
- B.S. or M.S. degree (or equivalent) in Computer Science, Engineering, or a related field
- Proficiency in at least one modern programming language (Python, JavaScript, C#, Go, etc.)
- Demonstrated application of industry best practices in MLOps
- Proficiency with CI/CD tools and GitOps workflows
- Familiarity with running GPU workloads in Kubernetes
- Strong knowledge of Kubernetes (self-hosted and managed) and modern k8s paradigms (e.g. CNCF)
- Proficiency with Infrastructure as Code tools (Terraform, etc.) and configuration management tools (Ansible, etc.)
- Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry)
- Experience in air-gapped or extremely strict security environments
- Experience communicating with users, technical leaders, and management to collect requirements, describe system designs, and architect software systems that meet your stakeholders' needs
- Knowledge and demonstrated application of software engineering best practices relating to the SDLC, including code reviews, SCM, CI/CD, testing, and operations
- Demonstrated ability to mentor and grow other team members
Bright Machines is an innovative manufacturer specializing in AI-powered robotics to streamline the assembly of data center infrastructure. We cater to the needs of hyperscalers and OEMs, leveraging advanced technology to efficiently build vital hardware that fulfills the growing demand for computational power.
- Founded: 2018
- Employees: 201-500
- Industry: Internet Software & Services
- Total raised: $180M