Data Engineer
TLDR
Own the end-to-end lifecycle of data, from ingesting raw event streams to building reliable datasets that power analytics and personalization at WebEngage.
- We are built for scale.
- We are built for complexity.
- We are built for outcomes.
- A powerful Customer Data Platform (CDP)
- Real-time behavioral segmentation and intelligence
- Omnichannel journey orchestration
- AI-driven personalization and recommendations
- Deep analytics, experimentation, and revenue attribution
- WebEngage BLACK: our AI-native layer that brings Agentic capabilities to engagement.
- Design, build, and maintain production-grade ETL/ELT pipelines that ingest data from APIs, databases, event streams (Kafka/Pub-Sub), and flat files into the central data warehouse.
- Implement idempotent, incremental load patterns with built-in retry logic, dead-letter queues, and SLA-based alerting to ensure zero-data-loss pipelines.
- Own pipeline observability — set up data freshness checks, row-count validations, schema drift detection, and anomaly alerts using tools like Great Expectations or dbt tests.
- Translate business requirements into clean dimensional models (star/snowflake schemas) and maintain a well-documented data catalogue.
- Design slowly changing dimensions (SCD Type 1/2), bridge tables, and fact tables optimised for analytical query patterns.
- Enforce partitioning, clustering, and materialised view strategies to keep warehouse costs under control while maintaining sub-second query performance.
- Write clean, modular, well-tested Python and SQL code. Follow DRY principles, use version control (Git), and participate in peer code reviews.
- Build reusable transformation frameworks using dbt or equivalent tooling, with proper documentation and testing at every layer (staging → intermediate → mart).
- Containerise data services with Docker and automate deployments via CI/CD pipelines (GitHub
- Build interactive dashboards and analytical tools using Streamlit, enabling stakeholders to explore metrics, run ad-hoc analyses, and make data-driven decisions without engineering dependency.
- Design and maintain BI layers — semantic models, KPI definitions, and pre-aggregated mart tables that serve as the single source of truth for reporting across teams.
- Translate raw data into compelling visual narratives using libraries like Plotly, Matplotlib, or Altair; present findings to both technical and non-technical audiences.
- Partner with product managers, analysts, and data scientists to understand data needs and proactively identify gaps in current data coverage.
- Document data lineage, transformation logic, SLAs, and known limitations in a shared knowledge base to enable self-service analytics.
- Contribute to internal engineering guilds, knowledge-sharing sessions, and post-incident reviews for pipeline failures.
- Strong SQL skills with expertise in complex queries, performance optimization, and cost-efficient design on cloud data warehouses (BigQuery, Redshift).
- Strong Python scripting for data ingestion, transformation, and validation, with hands-on experience in Pandas, SQLAlchemy, APIs, and automation.
- ETL/ELT: End-to-end ownership of data pipelines, including ingestion, transformation, and loading, with an understanding of incremental loads and backfills.
- Data Modelling: Ability to design dimensional and transactional data models (star/snowflake, SCDs) and translate business needs into optimized table structures.
- Airflow, dbt, Docker, CI/CD
- GCP/AWS, data warehousing concepts
- BI tools / Streamlit / visualization
- Bachelor’s degree in Computer Science, Engineering, Mathematics, Statistics, or a related quantitative field (or equivalent practical experience).
- 1–3 years of professional experience in data engineering, analytics engineering, or a backend role with significant data pipeline work.
- Strong understanding of data warehouse architecture — know when to use wide denormalised tables vs. normalised models, and the trade-offs of each.
- Familiarity with version control workflows (Git branching strategies, pull requests, code reviews) and agile development practices.
- A data quality mindset — you instinctively validate assumptions, add assertions to pipelines, and treat silent data failures as critical incidents.
- We take transparency very seriously. Along with a full view of team goals, get a top-level view across the board with our monthly & quarterly town hall meetings.
- A highly inclusive work culture that promotes a relaxed, creative and productive environment.
- Enjoy autonomy, open communication, and growth opportunities while maintaining a perfect work-life balance.
- Learning is a way of life. Unlock your full potential, backed by cutting-edge tools and mentorship (MacBook for Engagers!).
- Get best-in-class medical insurance (with COVID care facilities), programs for taking care of your mental health, and a contemporary leave policy (beyond sick leave).
WebEngage is a customer data platform and marketing automation suite designed to streamline user engagement and retention for consumer tech enterprises and SMBs. Our product empowers brands to execute hyper-personalized engagement campaigns across various channels, unifying and analyzing customer data to drive revenue from both existing and anonymous users.
- Founded: 2011
- Employees: 51-200
- Industry: Internet Software & Services