Develop APIs for high-performance AI inference systems, improve their reliability, and drive innovation in LLM inference optimization while working with cutting-edge technologies.
We are looking for an AI Inference Engineer to join our growing team. Our current stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will work on large-scale deployment of machine learning models for real-time inference.
Responsibilities
Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout our inference stack (a minimal latency-measurement sketch follows this list)
Improve the reliability and observability of our systems and respond to system outages
Explore novel research and implement LLM inference optimizations
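As a rough illustration of the benchmarking work above, here is a minimal sketch of per-request latency measurement. The `generate` callable is a hypothetical stand-in for whatever inference entry point the stack actually exposes; it is not an API from this posting.

```python
import time
import statistics

def benchmark_latency(generate, prompt, n_runs=50):
    """Measure request latency percentiles for a text-generation callable.

    `generate` is a hypothetical placeholder; swap in the real
    client or model call when profiling an actual stack.
    """
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate(prompt)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies) * 1000,
        "p99_ms": latencies[min(n_runs - 1, int(0.99 * n_runs))] * 1000,
    }
```

In practice, tail latency (p99) matters as much as the median for real-time serving, which is why the sketch reports both.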
Qualifications
Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques such as continuous batching and quantization (a quantization sketch follows this list)
Understanding of GPU architectures or experience with GPU kernel programming using CUDA
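To make the quantization item concrete, here is a minimal sketch using PyTorch's built-in dynamic quantization on a toy model. Production LLM serving typically uses more involved schemes (e.g. weight-only int8/int4 quantization for GPU inference), so treat this as illustrative only.

```python
import torch
import torch.nn as nn

# A toy stand-in model; any module containing nn.Linear layers
# works the same way with dynamic quantization.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

# Dynamic quantization stores weights as int8 and dequantizes on the
# fly, shrinking the memory footprint and often speeding up CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 4096)
print(quantized(x).shape)  # torch.Size([1, 4096])
```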
Perplexity builds an answer engine that uses large language models to redefine how people search for and interact with information online. Focused on improving the browsing experience, the company is at the forefront of AI-driven knowledge tools, making it easier for people to find relevant answers quickly.