Data Engineer

Hyderabad, India
full-time

AI overview

Lead the development and maintenance of resilient data pipelines and systems for a Media Mix Optimization platform, ensuring high-quality data for analytics and insights.

We are implementing a Media Mix Optimization (MMO) platform designed to analyze and optimize marketing investments across multiple channels. This initiative requires a robust on-premises data infrastructure to support distributed computing, large-scale data ingestion, and advanced analytics. The Data Engineer will be responsible for building and maintaining resilient pipelines and data systems that feed into MMO models, ensuring data quality, governance, and availability for Data Science and BI teams. The environment integrates HDFS for distributed storage, Apache NiFi for orchestration, Hive and PySpark for distributed processing, and Postgres for structured data management.

This role is central to enabling seamless integration of massive datasets from disparate sources (media, campaign, transaction, customer interaction, etc.), standardizing data, and providing reliable foundations for advanced econometric modeling and insights.

Responsibilities:

 

Data Pipeline Development & Orchestration
o Design, build, and optimize scalable data pipelines in Apache NiFi to automate ingestion, cleansing, and enrichment from structured, semi-structured, and unstructured sources.
o Ensure pipelines meet low-latency and high-throughput requirements for distributed processing.

Data Storage & Processing
o Architect and manage datasets on HDFS to support high-volume, fault-tolerant storage.
o Develop distributed processing workflows in PySpark and Hive to handle large-scale transformations, aggregations, and joins across petabyte-level datasets.
o Implement partitioning, bucketing, and indexing strategies to optimize query performance.

Database Engineering & Management
o Maintain and tune Postgres databases for high availability, integrity, and performance.
o Write advanced SQL queries for ETL, analysis, and integration with downstream BI/analytics systems.

Collaboration & Integration
o Partner with Data Scientists to deliver clean, reliable datasets for model training and MMO analysis.
o Work with BI engineers to ensure data pipelines align with reporting and visualization requirements.

Monitoring & Reliability Engineering
o Implement monitoring, logging, and alerting frameworks to track data pipeline health.
o Troubleshoot and resolve issues in ingestion, transformations, and distributed jobs.

Data Governance & Compliance
o Enforce standards for data quality, lineage, and security across systems.
o Ensure compliance with internal governance and external regulations.

Documentation & Knowledge Transfer
o Develop and maintain comprehensive technical documentation for pipelines, data models, and workflows.
o Provide knowledge sharing and onboarding support for cross-functional teams.

 

Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field (Master’s preferred).

  • Proven experience as a Data Engineer with expertise in HDFS, Apache NiFi, Hive, PySpark, Postgres, Python, and SQL.

  • Strong background in ETL/ELT design, distributed processing, and relational database management.

  • Experience with on-premises big data ecosystems supporting distributed computing.

  • Solid debugging, optimization, and performance tuning skills.

  • Ability to work in agile environments, collaborating with multi-disciplinary teams.

  • Strong communication skills for cross-functional technical discussions.

Preferred Qualifications:

  • Familiarity with data governance frameworks, lineage tracking, and data cataloging tools.

  • Knowledge of security standards, encryption, and access control in on-premises environments.

  • Prior experience with Media Mix Modeling (MMM/MMO) or marketing analytics projects.

  • Exposure to workflow schedulers (Airflow, Oozie, or similar).

  • Proficiency in developing automation scripts and frameworks in Python for CI/CD of data pipelines.

BLEND360 is an award-winning, new breed Data Science Consultancy focused on powering exceptional results for our Fortune 500/1000 clients and other major organizations. We are a growing company, born at the intersection of advanced analytics, data, and technology.

Who we are:

People are everything here at BLEND360. We are inspired by advancing our clients’ most critical initiatives, products, and projects by matching our clients with the right talent. BLEND360 has been among the Inc. 5000 fastest-growing companies 8 years in a row, and we’re very proud of our world-class NPS score. Our success is a direct result of our passion for advancing the careers of the talented people we work with every day.

When you work at BLEND360, you will:

  • Collaborate with a smart, passionate group of people who are invested in your success.
  • Partner with an impressive list of clients who value BLEND360’s services and the world-class experience we deliver with every engagement.
  • Thrive with a company and leadership team who are committed to growth.

Salary
$20 per hour