Palo Alto, United States

Full-Time

Mistral AI is hiring an expert in serving and training large language models at high speed on GPUs. The role is based in San Francisco.

The role will involve

-Writing low-level code to take full advantage of high-end GPUs (H100) and max out their capacity

-Rethinking various parts of the generative model architecture to make them more suitable for efficient inference

-Integrating efficient low-level code into a high-level MLOps framework

The successful candidate will have

-Strong technical competence in writing custom CUDA kernels and pushing GPUs to their limits, and deep expertise in the distributed computation infrastructure of current-generation GPU clusters

-A broad understanding of the field of generative AI, and knowledge of or interest in fine-tuning and using language models for applications
