AI Inference Engineer

Burlingame, United States
full-time

AI overview

Bridge AI/LLM models and unique platforms by optimizing deployment, profiling performance, and developing tools to enhance inference in cutting-edge neural processing applications.

Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can only accelerate a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

Role:

The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platforms. The AI Inference Engineer will [1] port AI models to the Quadric platform; [2] optimize model deployment for efficient inference; [3] profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture, and AI toolchains/frameworks.

Responsibilities:

  • Quantize, prune and convert models for deployment
  • Port models to the Quadric platform using the Quadric toolchain
  • Optimize inference deployment for latency and speed
  • Benchmark and profile model performance and accuracy
  • Develop tools to scale and speed up deployment
  • Make improvements to the SDK and runtime
  • Provide technical support and documentation to customers and the developer community

Requirements:

  • Bachelor’s or Master’s in Computer Science and/or Electrical Engineering
  • 5+ years of experience in AI/LLM model inference and deployment frameworks/tools
  • Experience with model quantization (PTQ, QAT) and tools
  • Experience with model accuracy measures
  • Experience with model inference performance profiling
  • Experience with at least one of the following frameworks: ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, llama.cpp
  • Proficiency in C/C++ and Python
  • Strong problem-solving, debugging, and communication skills

Benefits

  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Life Insurance (Basic, Voluntary & AD&D)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Maternity, Paternity)
  • Short Term & Long Term Disability
  • Training & Development
  • Work From Home
  • Free Food & Snacks
  • Stock Option Plan

Quadric is building the next generation of Computing Architecture for the Edge.

Our team is as thoughtfully architected as our product; in fact, the two go hand-in-hand. We are looking for technical ninjas who are ready for the adventure of a lifetime. What do we mean by ninjas? We mean people with deep domain expertise who are driven by the desire to do something BIG in the company of good people.

Our team is built upon mutual respect for what everyone brings to our end-to-end system. Without each part, there would be no whole. As such, our team is collaborative and focused.

What We Value: Integrity, Humility, Happiness

What We Expect: Initiative, Collaboration, Completion

Our Goal: For employees to look back on this chapter of building the company with amazing memories, remembering it as a time that was challenging and exciting as we worked together to build something extraordinary.

LCA Notice
