Senior Software Developer, Distributed Systems & Data Platform
TLDR
Play a key role in designing and operating the data infrastructure that powers Caseware’s AI systems, with a focus on building scalable pipelines and improving system reliability.
- Design and implement reliable, scalable data ingestion and integration pipelines for structured, semi-structured, unstructured, and multi-modal data (e.g., databases, documents, APIs, events), ensuring data is AI-ready, governed, secure, and observable.
- Build and scale retrieval infrastructure, including vector storage, embedding pipelines, hybrid search, and graph-based knowledge representations, while optimizing data modeling for retrieval quality.
- Develop and operate agent memory systems and pipelines for AI system signals (tracing, feedback, evaluation, and usage data) to support observability and continuous improvement.
- Apply data quality, validation, monitoring, and testing frameworks in production pipelines, ensuring governance, access control, lineage, and security standards are met, including safe handling of sensitive data in AI retrieval and generation workflows.
- Monitor, troubleshoot, and optimize AI data pipelines and retrieval workflows for reliability, performance, and cost, with strong observability and resilient processing patterns.
- Design and support evaluation workflows for AI systems, enabling offline testing, benchmarking, and continuous improvement of retrieval and agent performance over time.
- Lead pragmatic platform evolution by defining clear contracts between AI services and data systems, reducing coupling, and improving developer experience.
- Strong software engineering fundamentals, including designing maintainable, testable systems and owning features end-to-end.
- Production experience with distributed systems, including async workflows, failure modes, retries, and eventual consistency.
- Hands-on experience building and operating data pipelines for AI systems, such as embeddings pipelines, retrieval workflows, or feedback data processing.
- Experience working with AI-related data infrastructure, including vector databases, search systems, or graph-based storage.
- Experience with retrieval systems (RAG), embedding pipelines, or hybrid search (vector + keyword).
- Experience with agent frameworks, agent memory systems, or orchestration of tool-using AI systems.
- Experience designing pipelines for observability data, including traces, logs, metrics, or user feedback loops for AI evaluation.
- Experience operating production systems, including monitoring, incident response, and continuous improvement.
- Cloud experience on AWS building production systems, including storage, messaging, and orchestration.
- Experience with infrastructure as code, with CDK preferred and CloudFormation or Terraform acceptable.
- Strong collaboration and communication skills, with the ability to mentor and raise engineering maturity through reviews and design discussions.
- Strong English language communication and collaboration skills.
- Backend & Platform: TypeScript, NestJS, Python
- Cloud & Infrastructure: AWS EKS, AWS Lambda, AWS Bedrock, AWS AgentCore
- Search & Retrieval: AWS OpenSearch Serverless, AWS S3 Vectors, AWS Knowledge Bases
- Document & Data Processing: AWS Textract, DynamoDB, S3
- AI Evaluation & Observability: LangFuse, LangSmith (or equivalent)
- AI-assisted development tools: GitHub Copilot, AWS Kiro
- Developer Tooling: GitHub, GitHub Actions, Nx Monorepo
- Collaboration: Jira, Confluence, Microsoft Teams, Outlook
Caseware is evolving Caseware Cloud to deliver intelligent, data-driven experiences—powering analytics, automation, and AI/agentic capabilities on top of a modern data platform.
This role is for someone who can bridge transactional backend systems and data-intensive distributed workflows. You’ll work on systems that combine:
- APIs and domain services (microservices, relational modeling, service boundaries)
- Asynchronous workflows (messaging, retries, idempotency, replay safety)
- Distributed/batch data processing (Spark-based processing and lake patterns)
- Cloud platform primitives (AWS orchestration and managed services)
- AI-ready retrieval workflows (embedding + vector retrieval pipelines)
- Improved reliability and operability of ingestion + async workflows (clearer idempotency/replay patterns, fewer recurring incidents).
- Cleaner boundaries between orchestration/control-plane concerns and data-processing execution concerns.
- Better observability across APIs, queues, workflows, and distributed jobs.
- Clearer data contracts and more predictable schema evolution practices.
- Tangible improvements in developer experience (local run, testing, reduced “environment-only” hacks).
Benefits
Health Insurance
Prepaid Medicine
Home Office Stipend
Upgrade vacation starting at 5 years of service
Paid Time Off
5 Personal Time Off days per year
Caseware develops sophisticated software solutions tailored for accounting firms, corporations, and government entities. With a global reach and over 30 years of expertise, we empower users to optimize audits and financial reporting, transforming complex data into actionable insights.