Summary
We are looking for a Senior Data Engineer to join our Data Platform group. In this position, you will work in a small, dynamic team to build data infrastructure and manage the overall data pipeline. You will be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and the collection of data from cross-functional teams.
Responsibilities
- Design, develop, and maintain a generic ingestion framework capable of processing various types of data (structured, semi-structured, unstructured) from customer sources.
- Implement and optimize ETL (Extract, Transform, Load) pipelines to ensure data integrity, quality, and reliability as data flows into a centralized datastore such as Elasticsearch.
- Ensure the ingestion framework is scalable, secure, and efficient, and capable of handling large volumes of data in both real-time and batch processes.
- Continuously monitor and enhance the data ingestion process to improve performance, reduce latency, and handle new data sources and formats.
- Develop automated testing and monitoring tools to ensure the framework operates smoothly and can quickly adapt to changes in data sources or requirements.
- Provide documentation, support, and training to other team members and stakeholders on using the ingestion framework.
- Implement large-scale near real-time streaming data processing pipelines.
- Design, support, and continuously enhance the project code base, continuous integration pipeline, and related tooling.
- Build analytics tools that utilize the data pipeline to provide actionable insights into key business performance metrics.
- Perform proofs of concept (POCs), evaluate new technologies, and continuously improve the overall architecture.
Qualifications
- Experience building and optimizing big data pipelines, architectures, and data sets.
- Strong proficiency in Elasticsearch, including its architecture and efficient querying of data.
- Strong analytical skills for working with unstructured datasets.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Working knowledge of message queuing, stream processing, and highly scalable big data systems.
- One or more years of experience contributing to the architecture and design (architecture, design patterns, reliability, and scaling) of new and existing systems.
- 4 to 6 years of experience in a Data Engineer role, with a Bachelor's or Master's degree (preferred) in Computer Science, Information Systems, or an equivalent field. Candidates should have experience with the following technologies and tools:
- Experience working with big data processing systems such as Hadoop, Spark, Spark Streaming, or Flink.
- Experience with SQL-based systems such as Snowflake or Redshift.
- Direct, hands-on experience with two or more of these integration technologies: Java/Python, React, Golang, SQL, NoSQL (MongoDB), RESTful APIs.
- Well versed in Agile development, APIs, microservices, containerization, etc.
- Experience with CI/CD pipelines running on GitHub, Jenkins, Docker, and EKS.
- Knowledge of at least one distributed datastore such as MongoDB, DynamoDB, or HBase.
- Experience with batch scheduling frameworks such as Airflow (preferred), Luigi, or Azkaban is a plus.
- Experience with AWS cloud services: EC2, S3, DynamoDB, and Elasticsearch.
We believe that coming together as a community, in person, is important for innovation, connection, and fostering a sense of belonging. Our roles have the right balance of remote and in-office working to enable flexibility for managing your life along with ensuring a real connection with your colleagues and the broader IFS community. #LI-Hybrid