Anirban Ghose

Posts

Memory Optimizations for Deep Learning Workloads on Hardware Accelerators


This project explores memory optimizations for speeding up the training of large deep neural networks (DNNs) using oneAPI’s DPC++ toolchain. It targets general-purpose heterogeneous architectures comprising multicore CPUs, integrated GPUs, and hardware accelerators with discrete memory spaces, such as discrete GPUs and FPGAs. The project is currently in the concept stage.