Allen D. Malony, Sameer Shende, and Robert Bell
Department of Computer and Information Science
University of Oregon
Eugene, Oregon 97405, USA
Parallel performance tools offer insights into the execution behavior of an application and are a valuable component in the cycle of application development, deployment, and optimization. However, most tools do not work well with large-scale parallel applications where the performance data generated comes from upwards of thousands of processes. As parallel computer systems increase in size, the scaling of performance observation infrastructure becomes an important concern. In this paper, we discuss the problem of scaling and perfomance observation, and the ramifications of adding online support. A general online performance system architecture is presented. Recent work on the TAU performance system to enable large-scale performance observation and analysis is discussed. The paper concludes with plans for future work.