Position Overview
We're looking for a data engineer to help us turn raw driving data into the well-structured, queryable, and trustworthy datasets that power our autonomy stack. You'll work across the full lifecycle of our data, from ingestion to the point it gets pulled into a training run, and your work will directly shape how quickly the rest of the team can iterate.
Key Responsibilities
Build and maintain pipelines that ingest, validate, and process multi-modal sensor logs from our vehicles
Design schemas and data models that make our driving data discoverable and queryable for ML training, evaluation, and debugging
Turn raw driving data into the derived signals, annotations, and aggregates that downstream teams consume
Write tooling that the broader team relies on day-to-day: data loaders, query interfaces, dataset assembly utilities
Collaborate closely with ML, vehicle software, curation, and fleet operations to make sure data flows smoothly from collection through to model training
Contribute to the design of our data stack, making decisions that scale with the team and the fleet
Minimum Qualifications
BS, MS, or PhD in Computer Science, Engineering, Robotics, or a related field — or equivalent industry experience
Strong proficiency in Python, including writing maintainable code that other engineers will read and extend
Solid database fundamentals and an intuition for designing schemas that hold up as requirements evolve
Working understanding of how ML training pipelines consume data, and an eye for designing upstream systems that serve them well
Comfortable working in large codebases and modern build/dev environments (Bazel, monorepos, dev containers, or similar)
Curious, flexible, and pragmatic — able to pick up unfamiliar tools and reason from first principles rather than relying on prior recipes
Eligible to work in the United States
Preferred Qualifications
Experience working with data in an autonomous vehicle, robotics, or similar context
Familiarity with Foxglove, Rerun, or similar visualization/data-platform tooling
Experience designing or maintaining data catalogs, metadata stores, or feature stores
Background in handling high-volume multi-modal data (video, point clouds, time-series) at terabyte-plus scale
Cloud data engineering experience (GCP or AWS — object storage, serverless triggers, batch processing)
Comfort operating as an early team member — high ownership, low ego, fast iteration
Compensation
This role is eligible for base salary, benefits, and equity compensation. Salary ranges are determined by role, level, and location. Within the range, individual pay is determined by additional factors, including qualifications, skills, experience, and location.
Additional Information
As part of the interview process, we may use Artificial Intelligence (AI) tools to compare your qualifications and experience to the job description. A human reviews all AI output and makes the final hiring decision. Humble Robotics does not rely on the output to make any employment decisions. Some applicants may have a legal right to opt out of the use of AI as part of our interview process. Contact **[email protected]** to exercise this right or if you have further questions about the use of AI tools in our hiring process.
Humble Robotics is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, national origin, gender, age, religion, disability, sexual orientation, veteran status, marital status, or any other characteristic protected by law. Humble Robotics will consider qualified applicants with arrest and conviction records in a manner consistent with local ordinances.