Data Engineer – AI & Digital Platforms

Must-Have Skills

Key Responsibilities

Design and develop scalable data pipelines across Hadoop (Hive, Impala, Spark, Kafka, Iceberg) and Teradata environments.
Build ingestion and transformation frameworks using Java, Spark, Python, and Shell scripts.
Develop full-stack applications and internal tools using Python, Shell scripting, and modern web frameworks (Flask, React).
Create APIs and microservices to expose data and ML models securely to downstream systems and user interfaces.
Collaborate with data scientists to operationalize ML models using Cloudera Machine Learning (CML).
Build and deploy GenAI/LLM-powered applications for intelligent data interaction, summarization, and automation.
Implement enterprise-grade security controls including RBAC, LDAP, Kerberos, Apache Ranger, and row-level access.
Tune and optimize data applications for performance across Hadoop and Teradata, ensuring efficient resource utilization.
Support sandbox environments for prototyping, enabling users to build ML models, dashboards, and data pipelines.

Required Skills & Experience

Data Engineering
Strong experience with the Hadoop ecosystem (Hive, Impala, Spark, Kafka, Iceberg, Ranger, Atlas), Teradata, and data pipeline orchestration.
Experience with MPP query engines (e.g., Trino, Presto).
Proven ability in development and performance tuning of large-scale data applications.

Full Stack Development

Proficiency in Python, Shell scripting, REST APIs, and web frameworks (Flask, React).

Machine Learning & AI

Hands-on experience with ML platforms (CML), Spark MLlib, and Python ML libraries (scikit-learn, XGBoost).
Experience in operationalizing ML models at enterprise scale.

GenAI/LLM Applications

Familiarity with building applications using large language models (OpenAI, Hugging Face, LangChain).
Ability to build agent workflows and support users in creating agent-based solutions.

Security & Governance

Experience with enterprise data security (LDAP, Kerberos, RBAC), data masking, and access control.

Performance Tuning

Strong expertise in optimizing data applications and queries in Hadoop and Teradata environments.

Tools & Platforms

Cloudera Data Platform (CDP), Informatica, QlikSense, Apache Oozie, Git, CI/CD pipelines.

Soft Skills

Staff Engineer (Data Engineer – AI & Digital Platforms)