AI Engineer


At Toku, we create bespoke cloud communications and customer engagement solutions that reimagine customer experiences for enterprises. We provide an end-to-end approach that helps businesses overcome the complexity of digital transformation and deliver mission-critical CX through cloud communication solutions. Toku combines local strategic consulting expertise, purpose-built technology, regional in-country infrastructure, connectivity, and global reach to serve the diverse needs of enterprises operating at scale. Headquartered in Singapore, Toku supports customers across APAC and beyond, with a growing footprint in global markets.

 

In this role, you will focus on enabling AI systems to run reliably, efficiently, and at scale in production. You will manage the platforms, pipelines, and infrastructure that allow applied AI engineers to deploy, monitor, and scale models across cloud environments. Success in this role depends on strong MLOps expertise, comfort with cloud-native AI workloads, and close collaboration with infrastructure and engineering teams.


What you will be doing

 

  • AI platform & MLOps ownership: Design, improve, and operate MLOps pipelines for training, deploying, and managing ML models in production.

  • Model deployment pipelines: Build and maintain CI/CD-style workflows for model packaging, versioning, and deployment across environments.

  • Cloud infrastructure for AI: Operate and optimise AWS-based infrastructure for AI workloads, including compute, storage, and networking components.

  • GPU scaling & performance: Manage GPU-enabled workloads, addressing scalability, reliability, and cost-efficiency for high-load AI applications.

  • Monitoring & reliability: Implement monitoring and alerting for deployed models, focusing on system health, performance, and operational stability.

  • Tooling & standardisation: Own and evolve shared tooling such as MLflow, Docker-based workflows, and deployment frameworks to improve developer productivity.

  • Collaboration with infra teams: Work closely with infrastructure, SRE, and engineering teams to align AI platform practices with broader system standards.

  • Production support: Support live AI services by diagnosing deployment, scaling, and infrastructure-related issues impacting AI features.

  • Lifecycle management: Ensure reproducibility, traceability, and governance across the full ML lifecycle, from experimentation to production.

 

We’d love to hear from you if you have

 

  • MLOps expertise: Hands-on experience building and operating MLOps pipelines for production ML systems.

  • Cloud-native AI experience: Strong experience with AWS services used for AI workloads, including EC2, ECS, and SageMaker.

  • Containerisation & orchestration: Practical experience with Docker and container-based deployment of ML workloads.

  • ML tooling: Experience with MLflow or similar tools for experiment tracking, model versioning, and lifecycle management.

  • Scalability & performance: Experience managing GPU-based workloads and addressing performance and cost challenges at scale.

  • Infrastructure mindset: Strong understanding of cloud infrastructure concepts as they apply to ML systems.

  • Python for ML systems: Ability to work with Python-based ML codebases to support deployment and lifecycle needs.

  • AI awareness: Working familiarity with LLMs, NLP models, and applied ML concepts sufficient to support deployment and monitoring (without owning core model development).

  • Production experience: Proven experience supporting live, production ML systems with real customer impact.

  • Collaboration skills: Ability to work cross-functionally with applied AI engineers, backend engineers, and infra teams.

 

What you will get

 

  • Training and Development

  • Discretionary Yearly Bonus & Salary Review

  • Healthcare Coverage based on location

  • 20 days Paid Annual Leave (excluding Bank holidays)

 

Toku has been recognised as a LinkedIn Top Startup and by the Financial Times as one of APAC’s Top 500 High Growth Companies. If you’re looking to be part of a company on a strong growth trajectory while working on meaningful, real-world challenges, we’d love to hear from you.
