Infrrd

Senior AI Systems Engineer

Bengaluru, India

Full-Time

On-site

TL;DR

Design and build automated evaluation pipelines and scalable extraction systems that improve document extraction accuracy at scale, replacing brute-force prompting with agentic tooling and feedback loops.

Hello there! Infrrd here — it’s pronounced In-fur-d.

We’re an Enterprise AI company that uses AI and Machine Learning to help global organisations automate data extraction from complex documents, such as invoices, contracts, and insurance claims. Our customers include some of the world’s leading enterprises in mortgage, insurance, and manufacturing, and we’ve been profitable and independent since 2016.

Job Purpose

To build the automated systems that measure, diagnose, and improve document extraction and classification accuracy at scale. This role eliminates the manual bottleneck in the accuracy improvement cycle — replacing brute-force prompt iteration with agentic evaluation pipelines, automated feedback loops, and intelligent internal tooling. The engineer in this role makes the entire team faster without proportionally increasing headcount, and enables systematic accuracy improvement as a repeatable engineering capability rather than an ad-hoc effort.

Job Duties and Responsibilities

Design and build agentic evaluation pipelines: error detection → root cause → hypothesis generation → prompt variant testing → A/B measurement → production promotion, with minimal human intervention.
Own the accuracy measurement infrastructure — automate error analysis, data quality pipelines, and batch evaluation frameworks across document types and customer configurations.
Build and evolve internal accuracy tooling from manual utilities into automated improvement platforms — classification and extraction correction loops, NTP rule generation, performance reporting.
Take prototype methodologies and productionize them into reliable, scalable systems the team can operate independently.
Build LLM-based extraction and classification pipelines using few-shot and RAG strategies for complex, real-world document types.
Design and maintain A/B testing infrastructure for prompt and model changes — no untested changes go to production.
Create live dashboards tracking extraction accuracy, NTP rates, and false positive rates across document types and customer configurations.
Optimize LLM costs while maintaining quality: prompt compression, output token minimization, model selection and migration strategies.
Write production-grade data pipelines with error handling, retries, logging, and monitoring.
Collaborate with the platform engineering and applied research teams on system architecture and on translating research methodology into production systems.
Mentor 1–2 junior engineers; build tooling and documentation they can operate independently.

Required Qualifications

BE / MTech in Computer Science, AI/ML, Computational Data Science (CDS), Computer Science & Automation (CSA), or related discipline.

Experience Range

8–10 years total; minimum 4–6 years building production LLM or AI systems; minimum 4–6 years in evaluation, quality measurement, or accuracy improvement work.

"Must-have" Skills

Production-grade Python — clean, tested, maintainable systems; not just scripts (pytest, FastAPI or Flask)
Hands-on LLM API experience (OpenAI, Anthropic, Gemini, AWS Bedrock or equivalent) with systematic, measurement-driven prompt engineering — methodology over instinct
Agentic pipeline design — multi-step reasoning, tool use, orchestration frameworks (LangChain, LlamaIndex or equivalent), automated evaluation and feedback loops
Evaluation framework design for LLM systems — precision/recall/F1, confusion matrices, A/B testing, per-class error analysis
Analytical depth sufficient to design meaningful accuracy metrics and interpret why a model fails on a specific document or field type
MongoDB or equivalent NoSQL database: queries, aggregations, indexing
pandas / NumPy for data processing and batch analysis
Git, code reviews, CI/CD basics (GitHub Actions or Jenkins)
Clear written communication — able to explain model behaviour and accuracy findings to non-technical stakeholders

"Nice-to-have" Skills

Document AI: PDF parsing, layout-aware extraction, OCR, structured form extraction
RAG pipeline design and vector search (Pinecone, Weaviate, or similar)
Classification systems with large label spaces (50+ classes)
Async Python (asyncio, aiohttp) for pipeline throughput
Embedding models and semantic similarity for document matching
Prior experience working alongside a Research or Applied Science team as the engineering counterpart

Working Knowledge (Tools)

Python, FastAPI / Flask, MongoDB, Git, GitHub Actions / Jenkins, LLM APIs (OpenAI / Anthropic / Gemini or equivalent), LangChain / LlamaIndex, pandas / NumPy, pytest, Docker

General Knowledge

NLP concepts, LLM prompt engineering patterns, REST APIs, RAG pipelines, vector databases, JSON data structures

Thorough Knowledge

Agentic workflow design and orchestration, LLM evaluation metrics (F1 / Precision / Recall, per-class analysis, confusion matrices), production Python systems (error handling, retries, logging, monitoring), NoSQL aggregations, systematic A/B testing for model changes, prompt optimization methodology

By submitting your application, you agree that your personal information and resume may be collected, processed, and stored by us for recruitment purposes, including consideration for future roles.

Infrrd

Infrrd builds an Intelligent Document Processing platform that leverages AI and Machine Learning to automate data extraction from complex documents. This solution is designed for enterprises looking to streamline their operations by reducing the manual effort involved in processing unstructured data.

Employees: 201-500 employees
Industry: IT Services
