Senior AI/ML Engineer (WFH) - #34523

Cebu City, Philippines
Full-time

Role overview

Design and implement benchmarks to evaluate and optimize AI capabilities for enterprise applications, influencing SaaS AI performance through innovative evaluation methodologies.
  • Design and implement agent evaluation pipelines that benchmark AI capabilities across real-world enterprise use cases
  • Build domain-specific benchmarks for product support, engineering ops, GTM insights, and other verticals relevant to modern SaaS
  • Develop performance benchmarks that measure and optimize latency, safety, cost-efficiency, and user-perceived quality
  • Create search- and retrieval-oriented benchmarks, including multilingual query handling, annotation-aware scoring, and context relevance
  • Partner with AI and infra teams to instrument models and agents with detailed telemetry for outcome-based evaluation
  • Drive human-in-the-loop and programmatic testing methodologies for fuzzy metrics like helpfulness, intent alignment, and resolution effectiveness
  • Contribute to the company’s open evaluation tooling and benchmarking frameworks, shaping how the broader ecosystem thinks about SaaS AI performance
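To make the performance-benchmark responsibility above concrete, here is a minimal sketch of what a latency benchmark harness might look like. The function name `benchmark_latency` and its parameters are illustrative only, not part of the role's actual tooling; the pattern (warm-up calls, per-request timing, percentile reporting) is the standard approach.

```python
import statistics
import time

def benchmark_latency(fn, inputs, warmup=2):
    """Time fn on each input (after a few warm-up calls) and report
    p50/p95 latency in milliseconds. Illustrative harness only."""
    # Warm-up calls so caches/JIT effects don't skew the first samples
    for x in inputs[:warmup]:
        fn(x)
    samples = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Simple nearest-rank percentiles over the sorted samples
    p50 = samples[len(samples) // 2]
    p95 = samples[min(len(samples) - 1, int(len(samples) * 0.95))]
    return {"p50_ms": p50, "p95_ms": p95}

# Toy usage: benchmark a trivial function on 20 dummy inputs
result = benchmark_latency(lambda x: x * 2, list(range(20)))
print(result)
```

In a real agent-evaluation pipeline, `fn` would be a call into the model-serving stack, and the same harness shape extends naturally to cost and token-throughput measurements.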

Requirements

  • 3 to 7 years of experience in systems, infra, or performance engineering roles with strong ownership of metrics and benchmarking
  • Fluency in Python and comfort working across full-stack and backend services
  • Experience building or using LLMs, vector-based search, or agentic frameworks in production environments
  • Familiarity with LLM model serving infrastructure (e.g., vLLM, Triton, Ray, or custom Kubernetes-based deployments), including observability, autoscaling, and token streaming
  • Experience working with model tuning workflows, including prompt engineering, fine-tuning (e.g., LoRA, DPO), or evaluation loops for post-training optimization
  • Deep appreciation for measuring what matters, whether it's latency under load, degradation in retrieval precision, or regression in AI output quality
  • Familiarity with evaluation techniques in NLP, information retrieval, or human-centered AI (e.g., RAGAS, Recall@K, BLEU)
  • Strong product and user intuition — you care about what the benchmark represents, not just what it measures
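As a quick illustration of one of the retrieval metrics named above, Recall@K measures the fraction of relevant documents that appear in the top-K retrieved results. The sketch below uses made-up document IDs purely as a toy example.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top-k retrieved results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])  # only the first k results count
    return len(top_k & set(relevant)) / len(relevant)

# Toy example: 4 relevant docs, 3 of which appear in the top 5 results
retrieved = ["d1", "d9", "d2", "d7", "d3", "d4"]
relevant = {"d1", "d2", "d3", "d4"}
print(recall_at_k(retrieved, relevant, k=5))  # → 0.75
```

Unlike precision-oriented metrics, Recall@K rewards coverage of the relevant set, which is why it is a common headline number for RAG retrieval benchmarks.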

Advantageous Skills:

  • Experience contributing to academic or open-source benchmarking projects

Manila Recruitment is a top recruitment agency in the Philippines, offering hiring solutions for executive search, IT, developers, managers, and specialized roles. With a database of over 250,000 candidates, we provide innovative headhunting services.
