Mountain View, United States

Job Overview

The AI Inference team, within Contextual AI’s platform organization, designs, builds, and operates Gen AI and LLM inference systems at scale. The team pioneers system innovation to optimize latency, throughput, and cost for all of Contextual AI’s models powered by RAG 2.0 technology.

What you’ll do:

As a Member of Technical Staff at Contextual AI, you will:
• Design, develop, test, and deploy high-performance inference solutions for workloads including, but not limited to, state-of-the-art Gen AI model architectures, RAG 2.0, knowledge retrieval models, and language encoders.
• Optimize end-to-end inference latency, throughput, and cost, ensuring the most efficient use of our inference cluster.
• Drive system architecture, spearhead best practices, and mentor junior engineers.
• Improve the reliability, scalability, and observability of our distributed inference infrastructure.
• Read papers and consult with scientists to gain insights into emerging techniques, integrating them into our roadmap.
• Design and experiment with new algorithms, benchmarking the latency and accuracy of your implementations.

What we’re seeking:

• M.Sc. or PhD in Computer Science, Engineering, Statistics, Mathematics, or a related field.
• 5+ years of non-internship professional software development experience, including experience in leading design or architecture of new and existing systems.
• Experience as a mentor, tech lead, or engineering team lead.
• Proficiency in Python, PyTorch, multi-threaded asynchronous C++/Go, and performance optimization.
• Experience with GPU programming and the GPU inference stack: TensorRT-LLM, Triton, CUDA, and CUPTI.
• Proficiency in the TensorFlow and/or PyTorch frameworks.
• Experience with Linux kernel system calls or the POSIX API (process control, communication, and device management).
• A problem-solving mindset, owning tasks end-to-end and acquiring the necessary knowledge to get the job done.
• A good intuition for when off-the-shelf solutions are sufficient and the ability to build tools to accelerate your workflow when they aren’t.
• The ability to move quickly in an environment where things are sometimes loosely defined and may have competing priorities or deadlines.

Location: Mountain View, CA

Salary Range for California-Based Applicants: $140,000 - $300,000 + equity + benefits (actual compensation will be determined based on experience, location, and other factors permitted by law).

Equal Opportunity

Contextual AI is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.

Contextual AI is hiring a Member of Technical Staff (AI Inference)