In this role, the candidate will need to understand the characteristics of deep learning (DL) workloads and have the hands-on ability to measure and analyze performance data, then use it to project and estimate the power and performance of the latest DL workloads.
Responsibilities
- The ideal candidate will have both a software and a hardware background, enabling them to perform sensitivity analysis on hardware knobs and to understand how to measure and improve the performance of DL workloads.
- The candidate should have experience working with simulators and benchmarking DL models.
- The ideal candidate should have at least 5 years of experience analyzing and improving the performance of DL workloads running on accelerators.
- Programming and debugging code written in Python/C++/CUDA/HIP/OpenCL is required, as is the ability to model key kernels and work with the hardware teams to measure their power and performance on RTL and performance simulators.
- Knowledge of performance and power modeling is a plus.
- Solid understanding of the fundamentals of computer architecture, memory hierarchy, caches and fabrics is a prerequisite for the role.
Requirements
- Excellent problem-solving, written and verbal communication, and organizational skills; highly self-motivated.
- Ability to work well in a team and be productive under aggressive schedules.
Education and Experience
- PhD or Master’s degree in Computer Engineering or Computer Science, with 5+ years of experience working on DL models.
- Coursework in computer architecture, parallel computing, compilers, and digital design is required.