With the nascent use of Java for high-performance parallel and distributed computing comes the requirement that application developers and system managers have performance measurement and analysis tools. This requirement is not new: performance is a dominant concern and the need for tools is fundamental. The Java language environment and its use for high-performance computing, however, push the state of performance technology in new respects. First, the Java Virtual Machine (JVM) presents a sophisticated shared memory execution platform that is multi-threaded, supports the mapping of user-level threads to system threads, allows just-in-time (JIT) compilation and dynamic loading of code modules, and interfaces with distributed systems middleware. The combination of these features is new. Second, the Java Native Interface (JNI) opens up the Java environment, making inter-language execution possible. This allows access to high-performance application and communication libraries, but it makes it harder to track multi-level inter-language performance events across different execution contexts and to integrate those events in local and global performance views. Lastly, because the Java language system is portable, the facilities, tools, and interfaces that support performance measurement and analysis for Java need to be portable as well.
In this paper, we share our experiences developing a prototype performance measurement and analysis system for Java. The system is built upon our robust TAU (Tuning and Analysis Utilities) performance framework for scalable parallel and distributed computing. TAU has been designed to support performance analysis for a general model of parallel computation. It provides portable measurement interfaces and services, flexible instrumentation, the ability to observe multiple software layers and levels of execution, and certain provisions for mixed-language programming. However, in all of these areas, TAU had to be extended in new ways to accommodate Java software features and the hybrid execution model it imposes. This experience has been valuable in that we believe such characteristics will be more the norm in the future, and the techniques we developed will contribute to the repertoire of methods applied to these new performance technology challenges.
In Section 2, we briefly describe the TAU framework and the general computation model it supports. We decided to focus our attention on a (cluster-oriented) style of high-performance computing that uses Java multi-threading for shared memory parallel computing on a symmetric multiprocessing (SMP) node and MPI message passing for communications between distributed nodes. Although this is not a comprehensive coverage of HPC Java environments, we feel this style of multi-level parallel Java programming is representative of current trends. In Section 3, we describe how the TAU framework has been adapted for this model. In Section 4, we show examples of performance analysis for a parallel Java application, highlighting the ability to capture performance information across execution levels and at different levels of parallelism. Sections 5 and 6 discuss recent features that enable more refined performance measurements. Section 7 addresses the issue of instrumentation overhead and quantifies the costs of TAU measurements. Conclusions and thoughts for future directions are given in Section 8.