HighLevel
Lead Data Engineer
TLDR
Own the event ingestion and identity layer while ensuring data reliability and building foundational datasets for downstream modeling.
About HighLevel:
HighLevel is an AI-powered business operating system that gives agencies, entrepreneurs and SMBs the infrastructure to build, automate and scale. Today, HighLevel supports SMBs across 150+ countries, fueling community-driven growth rooted in real customer outcomes.
To date, businesses operating on HighLevel have generated over $7 billion in ecosystem value, demonstrating the impact of shared infrastructure at scale. By centralizing conversations, automation and intelligence into one system, we help businesses move faster, reduce complexity and execute efficiently.
Behind the platform, HighLevel powers more than 4 billion API hits and 2.5 billion message events daily. With 250 terabytes of distributed data, 250+ microservices and over 1 million domain names supported, our architecture is built for performance, resilience and long-term scalability.
Our People
With over 2,000 team members across 10+ countries, HighLevel operates as a global, remote-first organization built for speed and ownership. We value initiative, clarity and execution, creating space for ambitious people to build systems that support millions of businesses worldwide. Here, innovation thrives, ideas are celebrated and people come first, no matter where they call home.
Our Impact
Every month, HighLevel enables more than 1.5 billion messages, 200 million leads and 20 million conversations for the more than 1 million businesses we support. Behind those numbers are real people building independence, expanding opportunity and creating measurable impact. We’re proud to be a part of that.
Learn more about us on our YouTube Channel or Blog Posts
About the Role:
We are looking for a Lead Data Engineer to own the event ingestion and identity layer that connects product instrumentation to downstream analytical systems.
This role focuses on the operational reliability and correctness of event and identity data as it moves through the data platform. You will design and operate pipelines, schema validation, and replay workflows that ensure product events remain consistent and safe to use for analytics and customer-facing reporting.
You will work closely with product engineering teams on instrumentation patterns, with the CDP team on event contracts and definitions, and with platform teams to ensure event infrastructure and analytical systems scale reliably. This role builds the foundational event and identity datasets required for reliable downstream modeling. Behavioral models, canonical entities, and business analytics datasets are owned by the analytics engineering team.
Responsibilities:
Define event schemas, required fields, and compatibility rules in collaboration with the CDP team
Implement automated validation and contract enforcement to prevent breaking schema changes
Maintain versioning and compatibility guarantees for event producers and downstream consumers
Build and maintain pipelines that ingest, validate, and process high-volume product events
Ensure event streams are deduplicated, ordered correctly, and safe for downstream consumption
Partner with platform teams to ensure ingestion pipelines scale with product growth
Define and maintain identity stitching logic across anonymous and authenticated users
Handle identity merges, splits, and corrections while preserving tenant boundaries
Ensure identity resolution remains explainable, deterministic, and safe for downstream datasets
Design workflows that allow event datasets and identity graphs to be replayed or rebuilt safely
Build tooling for historical corrections, schema evolution, and dataset reprocessing
Ensure downstream models can be rebuilt without manual intervention when definitions evolve
Provide guidance and tooling that help product teams emit events consistently
Maintain validation checks and schema enforcement that catch instrumentation issues early
Collaborate with engineering teams to evolve instrumentation safely over time
Ensure deletion and suppression requests propagate correctly through event and identity pipelines
Partner with governance and security teams to support policy requirements
Define requirements and interfaces for event infrastructure and downstream analytical systems
Work with platform teams to ensure pipelines remain reliable, scalable, and observable
Requirements:
7+ years of experience in data engineering, platform engineering, or product data roles
Strong experience building and operating event ingestion or streaming pipelines
Experience implementing schema validation, data contracts, or event governance frameworks
Strong SQL and Python, with experience building data processing or validation tooling
Familiarity with identity resolution, entity resolution, or customer identity systems
Experience operating analytical data systems or large-scale event datasets
EEO Statement:
The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government record-keeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.
We encourage you to review our Privacy Policy before submitting your application.
#LI-Remote #LI-NJ1
HighLevel is an all-in-one white-label sales and marketing platform that empowers marketing agencies, entrepreneurs, and businesses to enhance their digital presence and drive growth. With a suite of robust tools designed to capture, nurture, and convert leads, HighLevel supports a diverse community of over 2 million clients across various industries.
- Founded: 2018
- Employees: 201-500
- Industry: Internet Software & Services
- Total raised: $60M