Design, architect, and develop scalable inference infrastructure for models and services capable of handling a large number of simultaneous requests
Apply engineering best practices: automation, code reviews, integration tests, performance/load tests, and CI/CD
Collaborate with cross-functional teams (product, engineering, research) to solve complex engineering challenges
Requirements:
Operational experience on a production system that hosts LLMs
Experience with Google Cloud Platform (GCP)
Experience with building, deploying, and maintaining Kubernetes production clusters
Experience with deploying infrastructure as code (Terraform, Google Deployment Manager, etc.)
Strong experience with Python and/or one of Java, Kotlin, Rust, or Go
Strong experience operating on large volumes of data in the cloud (e.g., vector search, object storage, key-value stores, relational databases)
Experience with software engineering and CI/CD best practices, and with deploying AI models and services in production