AI Engineer - LLM Quantization Specialist

Jakarta, Indonesia
full-time

AI overview

Join an international team to develop and optimize large language models, focusing on quantization and model compression to drive efficient AI solutions.

Who We Are

At CloudFactory, we are a mission-driven team passionate about unlocking the disruptive potential of AI for the world. By combining advanced technology with a global network of talented experts, we make unusable data usable and inference reliable and trustworthy, driving real-world business value at scale. 

More than just a workplace, we’re a global community founded on strong relationships and the belief that meaningful work transforms lives. Our commitment to earning, learning, and serving fuels everything we do, as we strive to connect one million people to meaningful work and build leaders worth following.

Our Culture

At CloudFactory, we believe in building a workplace where everyone feels empowered, valued, and inspired to bring their authentic selves to work. We are:

  • Mission-Driven: We focus on creating economic and social impact.
  • People-Centric: We care deeply about our team’s growth, well-being, and sense of belonging.
  • Innovative: We embrace change and find better ways to do things, together.
  • Globally Connected: We foster collaboration between diverse cultures and perspectives.

If you’re ready to earn, learn, serve, and be part of a vibrant global community, CloudFactory is your place!

Position Overview

This is a full-time role based in Jakarta, Indonesia. You’ll begin with an on-site phase for the first six months to collaborate closely with our client team, then transition to a hybrid schedule (three days on-site each week). The initial contract is for one year, with the possibility of extension.

About the Role

We are seeking an experienced AI Engineer to join our international AI research and engineering team. Based in Indonesia, you will collaborate closely with our Berlin-based core AI group to develop, optimize, test and deploy large language models (LLMs) at scale.

Your primary focus will be on quantization, model compression, and performance optimization to ensure efficient inference.

This is an exciting opportunity to be part of a global AI innovation hub, working on next-generation model efficiency and serving a global user base.

Key Responsibilities

  • Develop and implement quantization and pruning strategies for large language models (LLMs) to improve runtime efficiency and reduce memory footprint.
  • Collaborate with the AI Research team on model architecture, fine-tuning, and deployment of multilingual and multimodal models.
  • Evaluate and benchmark quantized models across hardware platforms (GPU, TPU, CPU, edge accelerators).
  • Contribute to the design and maintenance of model optimization pipelines (training, evaluation, conversion, inference).
  • Stay current with cutting-edge research on model compression, distillation, and efficient inference frameworks.
  • Support continuous integration of optimized models into production and internal tools.
  • Document methodologies and share insights across global teams to promote technical excellence and reproducibility.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, Electrical Engineering, or related field.
  • 5+ years of professional experience in AI/ML engineering, with a focus on deep learning model optimization.
  • Hands-on experience with quantization techniques (e.g., PTQ, QAT, INT8/FP16 quantization) using frameworks like PyTorch, TensorFlow, or ONNX Runtime.
  • Solid understanding of LLM architectures (e.g., Transformer-based models such as GPT, LLaMA, Mistral, Falcon).
  • Strong programming skills in Python, plus proficiency with CUDA, NumPy, and PyTorch internals.
  • Experience with distributed training/inference and deployment on cloud or edge infrastructure.
  • Excellent communication skills and comfort working in a remote, cross-functional, international environment.

Preferred Qualifications

  • Experience with quantization-aware training (QAT) and post-training quantization (PTQ).
  • Familiarity with Hugging Face Transformers, DeepSpeed, or TensorRT.
  • Contributions to open-source ML optimization libraries or toolkits.
  • Knowledge of low-level performance profiling and benchmarking (e.g., NVIDIA Nsight, PyTorch Profiler).
  • Prior experience collaborating with global AI research teams across time zones.

CloudFactory is a global leader in combining people and technology to provide a cloud workforce solution for machine learning and core business data processing. Our managed teams have experience with hundreds of AI projects and can process data with high accuracy using virtually any tool. As an impact sourcing service provider (ISSP), CloudFactory creates economic and leadership opportunities for talented people in developing nations. Trusted by 170+ companies, we enrich data for 11 of the world’s top autonomous vehicle companies and process millions of tasks a day for innovators including Microsoft, Hummingbird, Ibotta, Luminar and nuTonomy. We’re on four continents, with offices in the U.K., U.S., Nepal and Kenya.

You will enjoy CloudFactory if creating meaningful work for 1 million people in the developing world excites you. You will also enjoy it if you value building relationships, can be described as both humble and courageous in the same sentence, and are passionate about pooling individual talents to win as one unified team. You have developed your own engine for personal growth, and you help others grow by giving both constructive and encouraging feedback. You love to do the crazy hard work upfront to make things simple for others, and your approach is often thinking big, starting small and then scaling fast. If any of this resonates, it is likely you will enjoy and thrive at CloudFactory like nowhere else on earth!

Join us and make a difference in the world!

After submitting your application, all of our communication will be via email, so please check your inbox and spam folders regularly. CloudFactory will at no stage of this process ask candidates to make payments or pay fees of any kind.
