What you’ll be doing:
- Building Scalable Data Pipelines: Designing and developing high-quality, scalable ETL pipelines for processing big data using AWS analytical services, leveraging both no-code tools and reusable Python libraries to ensure efficiency and reusability.
- Collaborating & Aligning with Project Goals: Working closely with cross-functional teams, including senior data engineers, engineering managers, and business analysts, to understand project objectives and deliver robust data solutions, following Agile/Scrum principles to drive consistent progress.
- Data Discovery & Root Cause Analysis: Performing data discovery and analysis to uncover data anomalies, while identifying and resolving data quality issues through root cause analysis. Making informed recommendations for data quality improvement and remediation.
- Automating Deployments: Managing the automated deployment of code and ETL workflows within cloud infrastructure (AWS preferred) using tools such as GitHub Actions, AWS CodePipeline, or other modern CI/CD systems to streamline processes and reduce manual intervention.
- Effective Time Management: Demonstrating strong organizational and time management skills, prioritizing tasks effectively, and ensuring the timely delivery of key project milestones.
- Documentation & Data Mapping: Developing and maintaining comprehensive data catalogues, including data mapping and documentation, to ensure data governance, transparency, and accessibility for all stakeholders.
- Learning & Contributing to Best Practices: Continuously improving your skills by learning and implementing data engineering best practices, staying updated on industry trends, and contributing to team knowledge-sharing and codebase optimization.
What we’re looking for:
- Experience: 2 to 5 years of experience in data engineering or related analytical roles, with a minimum of 2 years working with cloud and big data technologies. AWS experience is highly preferred, and familiarity with Google BigQuery and Google Analytics 4 is a plus.
- Data Expertise: Strong analytical skills in handling and processing structured and semi-structured datasets, with hands-on experience in designing and implementing scalable data engineering solutions on AWS.
- Cloud Technologies: Proficiency in building data pipelines and working with data warehousing solutions on AWS (Redshift, S3, Glue, Lambda, etc.). Experience with alternative cloud platforms (e.g., Google Cloud, Azure) is a bonus.
- Programming Skills: Strong programming proficiency in Python, with additional experience in Java/Scala being a plus. Ability to write efficient, reusable, and scalable code to process large datasets.
- Data Warehousing: Proven experience with modern data warehousing tools like AWS Redshift, Snowflake, or equivalent platforms, with a focus on performance optimization and query tuning.
- Version Control & Automation: Hands-on experience with version control systems like GitHub, GitLab, or Bitbucket, and with CI/CD pipelines using tools like GitHub Actions, AWS CodePipeline, or Jenkins to ensure smooth, automated deployments.
- Data Governance & Security: Knowledge of data governance practices, compliance standards, and security protocols in a cloud environment.
- Optional Skills: Experience with business intelligence (BI) tools such as Tableau, Power BI, or QuickSight, and exposure to data visualization techniques will be an advantage.
- Collaboration & Problem Solving: Ability to work in a cross-functional team, collaborating closely with data scientists, analysts, and product managers to deliver high-impact data solutions. Strong problem-solving skills and adaptability to changing business requirements.
- Communication: Excellent communication skills are required.