- Design, develop, and maintain a generic ingestion framework capable of processing various types of data (structured, semi-structured, unstructured) from customer sources.
- Implement and optimize ETL (Extract, Transform, Load) pipelines to ensure data integrity, quality, and reliability as data flows into a centralized datastore such as Elasticsearch.
- Ensure the ingestion framework is scalable, secure, efficient, and capable of handling large volumes of data in real-time or batch processes.
- Continuously monitor and enhance the data ingestion process to improve performance, reduce latency, and handle new data sources and formats.
- Develop automated testing and monitoring tools to ensure the framework operates smoothly and can quickly adapt to changes in data sources or requirements.
- Provide documentation, support, and training to other team members and stakeholders on using the ingestion framework.
- Implement large-scale near real-time streaming data processing pipelines.
- Design, support, and continuously enhance the project code base, continuous integration pipeline, etc.
- Build analytics tools that utilize the data pipeline to provide actionable insights into key business performance metrics.
- Perform POCs, evaluate different technologies, and continuously improve the overall architecture.
- Experience building and optimizing big data pipelines, architectures, and data sets.
- Strong proficiency in Elasticsearch, including its architecture and efficient querying of data.
- Strong analytical skills for working with unstructured datasets.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ systems.
- One or more years of experience contributing to the architecture and design (design patterns, reliability, and scaling) of new and existing systems.
- Candidates must have 4 to 6 years of experience in a Data Engineer role, with a Bachelor's or Master's (preferred) degree in Computer Science, Information Systems, or an equivalent field. Candidates should have knowledge of the following technologies/tools:
- Experience working on Big Data processing systems like Hadoop, Spark, Spark Streaming, or Flink Streaming.
- Experience with SQL systems like Snowflake or Redshift.
- Direct, hands-on experience in two or more of these integration technologies: Java/Python, React, Golang, SQL, NoSQL (MongoDB), RESTful APIs.
- Well-versed in Agile, APIs, microservices, containerization, etc.
- Experience with CI/CD pipelines running on GitHub, Jenkins, Docker, and EKS.
- Knowledge of at least one distributed datastore such as MongoDB, DynamoDB, or HBase.
- Experience using batch scheduling frameworks such as Airflow (preferred), Luigi, or Azkaban is a plus.
- Experience with AWS cloud services: EC2, S3, DynamoDB, Elasticsearch.
We believe that coming together as a community, in person, is important for innovation, connection, and fostering a sense of belonging. Our roles have the right balance of remote and in-office working to enable flexibility for managing your life along with ensuring a real connection with your colleagues and the broader IFS community. #LI-Hybrid