Backend AI & Data Pipeline Engineer

TLDR

Design and maintain scalable, event-driven data processing pipelines for an intelligent course and job matching platform, built on semantic embeddings, a knowledge graph, and cost-controlled AWS serverless architecture.

About the role

We are looking for a Backend AI & Data Pipeline Engineer to own the end-to-end data processing infrastructure that powers Yuzee's intelligent course and job matching platform. You will design and maintain scalable, event-driven pipelines that process tens of thousands of daily records, generate semantic embeddings, and feed a growing knowledge graph used for personalised career pathway recommendations.

What you'll do

  • Design and maintain three distinct processing pipelines — scheduled job ingestion, event-driven course processing, and a periodic knowledge graph builder — each with independent trigger logic and cost controls
  • Generate and manage semantic embeddings via Amazon Bedrock (Titan v2), index them in MongoDB Atlas Vector Search, and calibrate similarity thresholds to ensure match accuracy
  • Build and maintain a knowledge graph linking jobs, courses, skills, and industries using FP-Growth association rules and archetype-to-SOC code mapping
  • Build and improve a two-stage discovery and matching API on AWS Lambda — vector retrieval first, then deep eligibility scoring with LLM re-ranking
  • Right-size Fargate Spot instances and design resumable processing loops that tolerate interruption, keeping infrastructure costs under control as data volume scales
  • Maintain and improve daily job scrapers across multiple sources and build institution data scrapers with robust HTML cleaning pipelines
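The two-stage discovery and matching pattern described above can be sketched roughly as follows. This is an illustrative toy, not Yuzee's actual implementation: the function names, 3-dimensional "embeddings", and the `eligibility` metadata field are all hypothetical, and a simple score lookup stands in for the LLM re-ranking step.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def two_stage_match(query_vec, candidates, retrieve_k=3, threshold=0.0, score_fn=None):
    """Stage 1: cheap vector retrieval. Stage 2: deep eligibility scoring.

    `candidates` is a list of (id, embedding, metadata) tuples.
    `score_fn(metadata)` stands in for the expensive LLM re-ranking step.
    """
    # Stage 1: rank all candidates by similarity, keep top-k above the threshold.
    ranked = sorted(
        ((cosine(query_vec, emb), cid, meta) for cid, emb, meta in candidates),
        key=lambda t: t[0],
        reverse=True,
    )
    shortlist = [(cid, meta) for sim, cid, meta in ranked[:retrieve_k] if sim >= threshold]

    # Stage 2: apply the expensive scorer only to the shortlisted candidates.
    score_fn = score_fn or (lambda meta: meta.get("eligibility", 0.0))
    return sorted(shortlist, key=lambda item: score_fn(item[1]), reverse=True)

# Toy data: short vectors standing in for Titan v2 embeddings.
courses = [
    ("data-eng", [0.9, 0.1, 0.0], {"eligibility": 0.7}),
    ("ml-ops",   [0.8, 0.2, 0.1], {"eligibility": 0.9}),
    ("history",  [0.0, 0.1, 0.9], {"eligibility": 0.2}),
]
print(two_stage_match([1.0, 0.0, 0.0], courses, retrieve_k=2))
```

The key cost property is that the expensive second stage only ever sees `retrieve_k` items, no matter how large the candidate pool grows.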

What we're looking for

  • 1+ years of backend engineering experience focused on data pipelines, ML infrastructure, or search systems
  • Hands-on experience with AWS serverless and container services — Lambda, ECS Fargate, EventBridge, and Step Functions
  • Strong Python skills — Pandas, async processing, bulk database operations, and text cleaning
  • Familiarity with vector databases and semantic similarity search; MongoDB Atlas Vector Search experience is a strong plus
  • Cost-conscious infrastructure mindset — you think in per-record compute costs, free tiers, Spot resilience, and right-sizing
  • Ability to document and communicate complex architecture clearly to both technical and non-technical stakeholders

Nice to have

  • Experience with knowledge graphs or association rule mining (FP-Growth, Apriori)
  • Experience using LLMs for re-ranking or eligibility assessment on top of vector retrieval results
  • Background in edtech, jobtech, or recommendation/matching systems
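For a feel of what association rule mining contributes here, the sketch below counts skill pairs that co-occur across job postings. It is a deliberately brute-force stand-in for FP-Growth (which computes the same frequent itemsets efficiently at scale); the data and the `min_support` value are made up for illustration.

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support=2):
    """Return skill pairs co-occurring in at least `min_support` transactions.

    Brute-force frequent-itemset counting; FP-Growth reaches the same
    result without enumerating every pair in every transaction.
    """
    counts = Counter()
    for skills in transactions:
        # Sort so ("aws", "python") and ("python", "aws") count as one pair.
        for pair in combinations(sorted(set(skills)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical postings, each reduced to its set of extracted skills.
jobs = [
    {"python", "sql", "aws"},
    {"python", "aws"},
    {"sql", "excel"},
]
print(frequent_pairs(jobs, min_support=2))  # {('aws', 'python'): 2}
```

Pairs surviving the support threshold become candidate edges in a skills graph, which is the kind of structure the knowledge graph builder links jobs and courses through.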

A relevant degree or equivalent proven experience is accepted.

Benefits 

  • You can work from home for the whole internship period
  • A reference letter can be requested upon completion of the internship
  • Some flexibility in working hours aside from the usual 9am to 6pm (e.g. 8am to 5pm or 7:30am to 4:30pm)
  • The possibility of retention for part-time or full-time work after the internship, based on your performance, even if you are not based in Malaysia


Seeka Technology builds an A.I.-powered platform designed to connect students and job seekers with the right educational and career opportunities. Targeting individuals from kindergarten through university, along with vocational and language training centers, Seeka helps simplify the process of finding and applying to relevant programs and jobs. What sets us apart is our commitment to creating seamless matches between talented individuals and the institutions or businesses that need them.
