Job Summary:
We are looking for a skilled Data Engineer with 3-4 years of experience to join our team. The ideal candidate should have expertise in Python, AWS, Core Java, and the Hadoop ecosystem, with strong experience in data processing, data warehousing, and query optimization.
The role centers on designing, developing, and optimizing scalable data pipelines that integrate data from multiple sources into data warehouses and data marts.
Key Responsibilities:
- Develop and maintain scalable data pipelines to efficiently process and integrate data from multiple sources into data warehouses and data marts.
- Collaborate with data scientists, analysts, and stakeholders to understand data requirements and support analytical and business needs.
- Optimize data workflows by implementing automation and best practices to improve performance and efficiency.
- Troubleshoot and resolve data pipeline issues, ensuring data integrity, accuracy, and reliability.
- Implement and maintain data quality checks, validation processes, and monitoring solutions.
- Support the design and optimization of data models and database structures for performance and scalability.
- Assist in migrating and optimizing data processing systems, ensuring cost-effective and efficient infrastructure usage.
- Document data processes, workflows, and technical solutions for knowledge sharing and continuous improvement.
Required Skills & Qualifications:
- Proficiency in Python, AWS, the Hadoop ecosystem, Core Java, and PySpark.
- Strong SQL and NoSQL expertise, including writing and optimizing complex queries.
- Experience working with databases such as MySQL, PostgreSQL, and Oracle.
- Good understanding of data warehouse concepts.
- Knowledge of Scala (Good to have).
- Experience with open table formats such as Apache Iceberg and Delta Lake (Good to have).
- Familiarity with BI tools such as Tableau, Looker, and Apache Superset (Good to have).
- Ability to work in a fast-paced, data-driven environment.
- Strong problem-solving and analytical skills.