The Data team at VoxelCloud (Westwood, Los Angeles, CA) manages and maintains the large-scale medical and healthcare data at the core of all our R&D activities. Reporting to the Data Team Lead, the Data Engineer intern will participate in the acquisition and manipulation of massive multi-modal datasets (medical images, text/EMR, etc.) on cloud storage. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up. The Data Engineer intern will support our software developers and machine learning engineers on product and research initiatives and will create an optimal data delivery pipeline that is consistent across ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products. The right candidate will be excited by the prospect of optimizing or even re-designing our company’s data architecture to support our next generation of products and data initiatives.
Responsibilities:
- Create and maintain optimal data pipelines to support machine learning research and development.
- Identify, design, and implement internal process improvements: automating data QA, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS/AliCloud big data technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into product utilization and operational efficiency.
- Keep our data separated and secure across national boundaries, both locally and in cloud storage.
Qualifications:
- Proficiency with at least one object-oriented or functional scripting language: Python, Java, C++, Scala, etc.
- Working knowledge of SQL and experience with relational databases, including query authoring and familiarity with a variety of database systems (e.g., Postgres).
- Experience building and optimizing ‘big data’ pipelines, architectures, and data sets.
- Solid understanding of information retrieval, statistics, and machine learning. Experience with computer vision and NLP is a plus.
- 1+ years of experience with big data and related technologies (e.g., DFS) preferred; experience with high-performance, scalable distributed systems.
- Experience with AWS cloud services (EC2, EMR, RDS, Redshift) preferred.
- Skilled at automating tasks, but willing to get hands dirty for quality control.
- Detail-oriented, well-organized, and self-motivated, with a continuous drive to learn, explore, and take on challenges; strong communication skills and a team player.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- MS or BA/BS degree in computer science, statistics, or a related field.
We Offer:
- An outstanding start-up culture
- Transparent, collaborative work environment
- Competitive compensation
- Excellent medical, dental, and vision coverage
- 401(k), paid vacation, and paid holidays
All your information will be kept confidential according to EEO guidelines.