Data Integration Engineer

Ensure data quality and reliability across the ETL pipeline, collaborate with machine learning teams, and become a domain expert in connected vehicle data.

At Viaduct, we use patented AI to discover hidden patterns in complex time-series data so that manufacturers and operators of connected equipment can deliver transformative business results from their data, fast. Our platform delivers solutions across the equipment lifecycle, from manufacturing productivity and quality to service operations and fleet management.

Who You Are

You are a thoughtful engineer. You understand the complexities of distributed systems and how to triage and solve the issues that arise with them. Scalability is top of mind whenever you design a system or write code. You believe building a better ETL system requires close collaboration with the machine learning and data science teams.

About the Role

As a data engineer at Viaduct, an analytics and ML platform company, your work is critical to our success. You are responsible for ensuring data quality and reliability in every part of our ETL pipeline, from ingestion to client integrations.

Responsibilities

  • Creating and supporting batch, incremental, and real-time ETL pipelines
  • Standardizing ingestion, validation, and cleaning processes across clients
  • Automating data validation to increase data quality
  • Managing and evolving schemas in all parts of our pipeline
  • Monitoring, tuning, and optimizing Spark jobs
  • Becoming a domain expert in connected vehicle data

About You

  • 4+ years as a data engineer
  • Experience as a tech lead or mentor
  • Proficiency in Python/Scala/Java/C++ and SQL
  • 3+ years of experience with Spark or equivalent technologies
  • 2+ years of experience with a workflow scheduler (Airflow, Prefect, Argo, etc.)
  • 2+ years of experience with distributed file systems (HDFS, S3, etc.)
  • Familiarity with tools in the open-source data ecosystem (Apache, CNCF, etc.)
  • Experience with incremental or real-time processing (Delta Lake, Apache Hudi, Kafka Streams, Spark Streaming, etc.)

Security and Privacy Responsibilities

  • Follow our policy and procedure documents related to security and privacy
  • Follow the security and privacy guidelines in the Employee Handbook
  • Participate in new hire and annual training for security and privacy
  • Treat data security and privacy as one of your primary job responsibilities
  • Report any security incidents you discover as you would bugs
  • Get approval from the Security Team before adding new third-party software to our codebase
  • Explicitly consider security implications when doing PR reviews

Bonus

  • Experience with Kubernetes
  • Experience working with ML teams
  • Contributions to open-source projects
  • Experience in the Automotive industry or a love of cars
  • Prior work in small, agile teams

 
