About Us
HG Insights is the global leader in technology intelligence, delivering actionable, AI-driven insights through advanced data science and scalable big data solutions. Our platform informs go-to-market decisions and influences how businesses allocate millions of dollars in marketing and sales budgets.
What You’ll Do
- Design, build, and optimize large-scale distributed data pipelines that process billions of events flowing to and from multiple data sources.
- Architect and scale enterprise-grade big-data systems, including data lakes, ETL/ELT workflows, and syndication platforms for customer-facing products.
- Orchestrate pipelines and workflows with Airflow.
- Write optimized data to analytical databases such as ClickHouse, DuckDB, and Redshift.
- Ensure data quality, consistency, and reliability across the pipeline.
- Monitor pipeline performance and troubleshoot data issues.
- Collaborate with product teams to develop features across databases and backend services.
- Implement cutting-edge solutions for data ingestion, transformation, and analytics.
- Drive system reliability through automation, CI/CD pipelines (Docker, Kubernetes, Terraform), and infrastructure-as-code practices.
What You’ll Be Responsible For
- Developing the data side of our platform, ensuring scalability, performance, and cost-efficiency across distributed systems.
- Collaborating in agile workflows (daily stand-ups, sprint planning) to deliver features rapidly while maintaining system stability.
- Ensuring security and compliance across data workflows, including access controls, encryption, and governance policies.
What You’ll Need
- BS/MS/PhD in Computer Science or a related field, with 7+ years of experience building production-grade big data systems.
- Strong SQL skills and solid data modeling fundamentals, with hands-on experience building ETL/ELT pipelines.
- Proficiency in Python for data processing and integrations.
- Familiarity with dbt, Airflow, Databricks, and modern analytics engineering practices.
Nice-to-Haves
- Knowledge of data governance frameworks and compliance standards (GDPR, CCPA).
- Contributions to open-source big data projects or published technical blogs/papers.
- DevOps proficiency in monitoring tools (Prometheus, Grafana) and serverless architectures.