Staff Data Engineer

New York, U.S.

$200,000 – $325,000 per year

TLDR

Build and operate systems for managing data across 80+ trial sites, enhancing access to novel therapies through technology and AI-driven insights.

Iterative Health is a healthcare technology and services company powering the acceleration of clinical research to transform patient outcomes. The Iterative Health Site Network is a premier network of 100+ clinical research sites across the US and Europe, accelerating the path to market for novel therapies. By combining deep expertise in clinical trials with cutting-edge AI, we empower research teams and study sponsors to expand and expedite access to novel therapeutics for patients in need.

About the Role

Accelerating clinical research is one of the defining challenges in healthcare. Promising therapies exist that patients can't access because the operational infrastructure to run clinical trials efficiently doesn't exist yet. We're building it. That means designing technology systems that bring order to a fragmented landscape of clinical data sources, automating the operational work that slows trials down, and turning real-world clinical data into a foundation for predictive intelligence.

We're building a uniquely valuable data asset: real-world patient and research data flowing across 80+ trial sites, spanning dozens of EHRs and clinical systems, focused on patient populations that are chronically underserved by existing clinical research infrastructure. Your job is to build the pipelines, data models, and AI infrastructure that make this asset real, from ingestion and normalization through to the systems that power predictions on top of it. You'll own data quality and observability as foundational engineering problems. You'll also have a direct hand in shaping how this data drives our AI strategy: what we model, what we predict, and what becomes possible.

This is an opportunity for someone who wants to be part of a small, fast-moving engineering team at a formative stage. You'll shape what gets built, how decisions get made, and what the team becomes.

Responsibilities

Own the data layer and architecture: the models, schemas, and infrastructure decisions that everything downstream depends on
Build and operate the pipelines and transformations that move data from ingestion through normalization, enrichment, and into the formats that support analytics, ML training, and production model serving
Own data quality and observability: build the systems that make data issues visible and correctable before they compound
Partner with ML and engineering teams to identify what's modelable, define training data requirements, and build the data foundations for new predictive capabilities
Define how clinical and operational data is governed across the system
Evaluate and select the tools and technologies that make up the data stack, with a clear point of view on build vs. buy
Help shape the engineering culture of a small, growing team: how technical decisions get made, how problems get debated, what rigor looks like in practice

What We’re Looking For

Required Qualifications

10+ years of experience in data engineering or related roles, with significant time spent building data systems
Experience with healthcare data (HL7, FHIR, claims, EHR extracts) strongly preferred, or with other complex, regulated data domains
Deep experience modeling and integrating data from multiple heterogeneous sources with inconsistent schemas and quality
Experience applying AI and LLMs to data engineering problems: extraction, normalization, classification, entity resolution
Strong understanding of how data infrastructure supports ML workflows from feature engineering to training data pipelines to model serving
Fluent in SQL and at least one modern programming language (Python, Java, Scala, Go), with experience across modern data infrastructure: distributed processing, streaming, cloud-native storage, orchestration, and transformation frameworks
Have built data systems from early stages, making foundational decisions with incomplete information
Naturally raise the quality of the engineering around you through code review, design guidance, and honest technical conversation

Preferred Qualifications

Experience building data infrastructure that directly supports ML model training and evaluation
Familiarity with clinical trial operations, EDC systems, or life sciences data
SOC 2, HIPAA, or similar compliance experience baked into engineering practice
A track record of building or improving data systems that others had given up on making reliable

At Iterative Health, we’re actively working towards creating an environment that is representative of the diversity of patients our technology serves. We are focused on building an equitable and inclusive culture and, by extension, hiring process. If you require any accommodations to make the application process or interviewing experience more accessible to you, please contact [email protected].

Iterative Health

Iterative Health leverages AI and machine learning to enhance gastrointestinal (GI) care through optimized clinical trials and advanced disease assessment tools. We serve clinical research sites across the US and Europe, empowering them with innovative technology to accelerate the delivery of novel therapies to patients in need. Our deep expertise in clinical trials and our extensive site network uniquely position us to transform patient outcomes in GI and hepatology.

Founded: 2017
Employees: 51-200
Industry: Health Care Providers & Services
Total raised: $180M
