Staff Software Engineer, Compute

TLDR

Build a core Python/Rust platform for scaling operations to 100x current traffic, leveraging AI to enhance system reliability and performance.

You are an experienced software engineer who thrives on building large-scale computing platforms. You have deep expertise in large-scale distributed systems that handle high complexity, heavy traffic, and large volumes of data. You know how to achieve reliability and scale with minimal operational load.

Key responsibilities

  • Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, etc.
  • Produce forward-looking designs for platform evolution as we scale to 100x current traffic and provide low latency worldwide
  • Leverage AI extensively to automate the mundane parts of building complex but reliable systems
  • Profile and tune low-level CPU and memory performance

Requirements

  • 5+ years of experience building distributed compute and orchestration platforms in Python or Rust
  • Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning
  • Deep understanding of computational complexity and memory allocation
  • Track record of designing systems that scale under real production load
  • Experience building and using observability to drive performance and reliability decisions
  • Excellent communication and ability to drive technical decisions across teams
  • Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have

  • Experience with AI/ML inference or training infrastructure
  • Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)
  • Background in building multi-tenant compute platforms
  • Understanding of networking fundamentals and performance characteristics
  • Familiarity with GPU workload characteristics and scheduling constraints

Compensation

  • $180,000-$250,000 plus equity and benefits

Location

  • San Francisco, CA

What we offer at fal

  • Interesting and challenging work
  • A lot of learning and growth opportunities
  • We are currently hiring in downtown San Francisco.
  • We offer visa sponsorship and will help you relocate to San Francisco.
  • Health, dental, and vision insurance (US)
  • Regular team events and offsites

Fal builds a generative media platform that empowers developers to create and scale multimodal AI applications effortlessly, providing ready-to-use APIs and intuitive interfaces. Focused on delivering robust infrastructure for the generative AI era, Fal combines expertise in distributed systems with custom compute environments to ensure high performance and reliability.
