A. Malony, S. Biersdorff, W. Spear, S. Mayanglambam. "An Experimental Approach to Performance Measurement of Heterogeneous Parallel Applications using CUDA." Presented at the International Conference on Supercomputing, Tsukuba, Japan, 2010.

Keywords: Performance tools, GPGPU, profiling, tracing

Heterogeneous parallel systems using GPU devices for application acceleration have garnered significant attention in the supercomputing community. However, to realize the full potential of GPU computing, application developers will require tools to measure and analyze accelerator performance with respect to the parallel execution as a whole. A performance measurement technology for the NVIDIA CUDA platform has been developed and integrated with the TAU parallel performance system. The design of the TAUcuda package is based on an experimental NVIDIA CUDA driver and its associated runtime and device libraries. In any environment where the experimental CUDA driver is installed, TAUcuda can provide detailed performance information about the execution of GPU kernels and their interactions with the parallel program, without any modification to the program source or executable code. The paper describes the TAUcuda technology and how it is integrated with the TAU measurement framework to provide integrated performance views. Various examples of TAUcuda use are presented, including CUDA SDK examples, a GPU version of the Linpack benchmark, and NAMD, a scalable molecular dynamics application.


