Machine Learning Engineer (Platform)

Establish scalable data processing pipelines and optimize machine learning model training for an AI platform aimed at improving cancer therapy personalization.
About Us: Artera is an AI startup that develops medical artificial intelligence tests to personalize therapy for cancer patients. Artera is on a mission to personalize medical decisions for patients and physicians on a global scale.

As a Machine Learning Engineer at Artera, you’ll work on the AI Platform team with a focus on establishing scalable and efficient pipelines for data processing and model training. You’ll work closely with AI model developers, fellow machine learning engineers, and our platform engineering team. You’ll ensure that Artera’s model developers can rely on highly efficient, large-scale training regimes and deploy optimized models to production environments.

Essential Responsibilities:
  • Evolve and manage Artera’s ML compute infrastructure.
  • Build and evolve the core libraries used by AI scientists to develop, launch, and monitor AI products.
  • Architect our model development and inference pipelines to accelerate model development speed and developer efficiency.
  • Work with model developers to optimize GPU and CPU efficiency and data throughput of large-scale foundation models and downstream model training runs.
  • Optimize Artera’s ability to store and process terabytes of digital pathology data efficiently for use in large-scale training regimes.
  • Ensure that Artera’s observability infrastructure gives a clear picture of where performance can be further optimized across our model landscape.

Experience Requirements:
  • 4+ years of industry software engineering experience
  • 3+ years of industry experience using ML orchestration frameworks such as Flyte, Ray, Kubeflow, Metaflow, MLflow, Dagster, Argo Workflows, or Prefect
  • 3+ years of industry experience using one of PyTorch, TensorFlow, or JAX in Python
  • 2+ years of industry experience building with AWS, Docker, and Kubernetes
  • 1+ years of industry experience optimizing large-scale, high data-throughput, distributed machine learning training pipelines

Desired:
  • Experience using Terraform and SQLAlchemy
  • Experience in multi-node and multi-GPU training
  • Experience deploying and maintaining infrastructure for machine learning training and production inference
  • Familiarity with TorchScript, ONNX Runtime, DeepSpeed, AWS Neuron, or similar approaches to inference optimization

Equal Employee Opportunity: At Artera, we value bringing together individuals from diverse backgrounds to develop new and innovative solutions for patients and physicians. As an equal opportunity employer, we do not discriminate on the basis of race, color, religion, national origin, age, sex (including pregnancy), physical or mental disability, medical condition, genetic information, gender identity or expression, sexual orientation, marital status, protected veteran status, or any other legally protected characteristic.

Salary: $140,000 – $220,000 per year