Methods for Accelerating Machine Learning in High Performance Computing
Robert Lim
Committee: Allen Malony (chair), Boyana Norris, Dejing Dou
Area Exam (Jan 2019)
Keywords: performance optimization, neural networks, large scale training

Driven by massive data corpora and advances in the programmability of accelerator architectures, such as GPUs and FPGAs, machine learning (ML) has delivered remarkable, human-like accuracy in tasks such as image recognition, machine translation, and speech processing. Although ML has improved accuracy on selected human tasks, the time to train models can range from hours to weeks. Accelerating model training is therefore an important research challenge facing the ML field. This work reports on the current state of ML model training, from both algorithmic and systems perspectives, by investigating performance optimization techniques on heterogeneous computing systems. Opportunities for performance optimization, based on parallelism and locality, are reported, shedding light on techniques that accelerate the learning process, with the goal of achieving on-the-fly learning in heterogeneous computing systems.