Staff Software Engineer, Compute

TLDR

Build and improve large-scale computing platforms using Python and Rust, with a focus on scalability, reliability, and leveraging AI for automation.

You are an experienced software engineer who thrives on building large-scale computing platforms. You have deep expertise in large-scale distributed systems that handle high complexity, heavy traffic, and large volumes of data. You know how to achieve reliability and scale with minimal operational load.

Key responsibilities

  • Build our core Python/Rust platform: request routing, AI workload orchestration, scheduling, GPU autoscaling, large-scale file storage, queueing, and more
  • Produce forward-looking designs for platform evolution as we scale to 100x current traffic and need to provide low latency worldwide
  • Leverage AI extensively to automate the mundane parts of building complex but reliable systems
  • Profile and tune low-level CPU and memory performance

Requirements

  • 5+ years of experience building distributed compute and orchestration platforms in Python or Rust
  • Strong understanding of distributed systems fundamentals: consensus, scheduling, fault tolerance, capacity planning
  • Deep understanding of computational complexity and memory allocation
  • Track record of designing systems that scale under real production load
  • Experience building and using observability to drive performance and reliability decisions
  • Excellent communication and ability to drive technical decisions across teams
  • Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have

  • Experience with AI/ML inference or training infrastructure
  • Experience with high-performance systems programming (async runtimes, zero-copy, memory-safe concurrency)
  • Background in building multi-tenant compute platforms
  • Understanding of networking fundamentals and performance characteristics
  • Familiarity with GPU workload characteristics and scheduling constraints

Location

  • Turkey

What we offer at fal

  • Interesting and challenging work
  • Many learning and growth opportunities
  • Regular team events and offsites

Fal builds a generative media platform that empowers developers to create and scale multimodal AI applications effortlessly, providing ready-to-use APIs and intuitive interfaces. Focused on delivering robust infrastructure for the generative AI era, Fal combines expertise in distributed systems with custom compute environments to ensure high performance and reliability.
