Data Engineer

Design and maintain data pipelines using modern AWS technologies for a cloud-based data platform while collaborating closely with data scientists and engineers.
About Us

At Qloo, we harness large-scale behavioral and catalog data to power recommendations and insights across entertainment, dining, travel, retail, and more. Our platform is built on a modern AWS data stack and supports analytics, APIs, and machine-learning models used by leading brands. We are looking for an experienced Data Engineer to help evolve and scale this platform.

Role Overview

As a Data Engineer at Qloo, you will design, build, and operate the pipelines that move data from external vendors, internal systems, and public sources into our S3-based data lake and downstream services. You’ll work across AWS Glue, EMR (Spark), Athena/Hive, and Airflow (MWAA) to ensure that our data is accurate, well-modeled, and efficiently accessible for analytics, indexing, and machine-learning workloads. You should be comfortable owning end-to-end data flows, from ingestion and transformation to quality checks, monitoring, and performance tuning.

Responsibilities

- Design, develop, and maintain batch data pipelines using Python, Spark (EMR), and AWS Glue, loading data from S3, RDS, and external sources into Hive/Athena tables (a PySpark sketch of this kind of job appears after the posting).
- Model datasets in our S3/Hive data lake to support analytics (Hex), API use cases, Elasticsearch indexes, and ML models.
- Implement and operate workflows in Airflow (MWAA), including dependency management, scheduling, retries, and alerting via Slack (see the DAG sketch below).
- Build robust data quality and validation checks (schema validation, freshness/volume checks, anomaly detection) and ensure issues are surfaced quickly with monitoring and alerts (see the validation sketch below).
- Optimize jobs for cost and performance (partitioning, file formats, join strategies, proper use of EMR/Glue resources).
- Collaborate closely with data scientists, ML engineers, and application engineers to understand data requirements and design schemas and pipelines that serve multiple use cases.
- Contribute to internal tooling and shared libraries that make working with our data platform faster, safer, and more consistent.
- Document pipelines, datasets, and best practices so the broader team can easily understand and work with our data.

Qualifications

- Bachelor’s degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
- Experience with Python and distributed data processing using Spark (PySpark) on EMR or a similar environment.
- Hands-on experience with core AWS data services, ideally including:
  - S3 (data lake, partitioning, lifecycle management)
  - AWS Glue (jobs, crawlers, catalogs)
  - EMR or other managed Spark platforms
  - Athena/Hive and SQL for querying large datasets
  - Relational databases such as RDS (PostgreSQL/MySQL or similar)
- Experience building and operating workflows in Airflow (MWAA experience is a plus).
- Strong SQL skills and familiarity with data modeling concepts for analytics and APIs.
- Solid understanding of data quality practices (testing, validation frameworks, monitoring/observability).
- Comfortable working in a collaborative environment, managing multiple projects, and owning systems end-to-end.

We Offer

- Competitive salary and benefits package, including health insurance, retirement plan, and paid time off.
- The opportunity to shape a modern cloud-based data platform that powers real products and ML experiences.
- A collaborative, low-ego work environment where your ideas are valued and your contributions are visible.
- Flexible work arrangements (remote and hybrid options) and a healthy respect for work-life balance.
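
For candidates who want a concrete picture of the work, here is a minimal sketch of the kind of batch job the first responsibility describes: reading raw data from S3 with PySpark on EMR and writing partitioned Parquet back to the lake for Athena/Hive. The bucket paths, column names, and date are hypothetical placeholders, not Qloo’s actual layout.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("vendor-ingest").getOrCreate()

    # Read one day of raw vendor data from S3 (path is illustrative).
    raw = spark.read.json("s3://example-raw-bucket/vendor_events/2024-01-01/")

    # Light transformation: derive a partition column and drop malformed rows.
    clean = (
        raw.withColumn("event_date", F.to_date("event_ts"))
        .dropna(subset=["event_id", "event_ts"])
    )

    # Write partitioned Parquet so Athena/Hive can prune partitions at query time.
    (clean.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-lake-bucket/curated/vendor_events/"))

Partitioned columnar output is also what the cost/performance bullet points at: Athena scans only the partitions and columns a query touches, which directly reduces both runtime and cost.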
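
The Airflow (MWAA) responsibility covers scheduling, retries, and Slack alerting. The DAG below is a minimal sketch under those assumptions, written for a recent Airflow 2.x release (older versions use schedule_interval instead of schedule); the webhook URL, DAG id, and task body are hypothetical, and the alert posts to a plain Slack incoming webhook rather than a specific Airflow provider.

    from datetime import datetime, timedelta

    import requests
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/EXAMPLE"  # placeholder

    def notify_slack(context):
        # Post a short failure message to Slack when any task fails.
        ti = context["task_instance"]
        requests.post(
            SLACK_WEBHOOK_URL,
            json={"text": f"Task {ti.task_id} failed in DAG {ti.dag_id}"},
            timeout=10,
        )

    def ingest_vendor_data():
        # Placeholder for the real work (e.g., submitting a Glue job or EMR step).
        pass

    with DAG(
        dag_id="vendor_ingest_daily",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # run once per day
        catchup=False,
        default_args={
            "retries": 2,  # retry transient failures automatically
            "retry_delay": timedelta(minutes=5),
            "on_failure_callback": notify_slack,  # surface failures in Slack
        },
    ):
        PythonOperator(task_id="ingest_vendor_data",
                       python_callable=ingest_vendor_data)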
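
The data quality responsibility mentions schema validation and freshness/volume checks. The sketch below shows what such checks can look like in PySpark; the table path, column names, and thresholds are illustrative. In practice these would run as their own Airflow task so a failed assertion triggers the same retry and alerting path as the pipeline itself.

    from datetime import date

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()

    df = spark.read.parquet("s3://example-lake-bucket/curated/vendor_events/")

    # Schema check: required columns must exist.
    required = {"event_id", "event_ts", "event_date"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Schema check failed, missing columns: {missing}")

    # Volume check: today's partition should not be suspiciously small.
    today = date.today().isoformat()
    count = df.filter(F.col("event_date") == today).count()
    if count < 1000:  # illustrative threshold
        raise ValueError(f"Volume check failed: only {count} rows for {today}")

    # Freshness check: the newest event timestamp should be recent.
    latest = df.agg(F.max("event_ts")).first()[0]
    print(f"Latest event timestamp: {latest}")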
