As more applications for parallel and distributed systems are developed using portable hierarchical software frameworks, layered runtime modules, and multi-language software components, the requirements for integrated, portable performance analysis will grow more complex. In particular, it becomes a challenge to observe performance events that occur throughout the software hierarchy and across language components, and then to relate those events to high-level execution abstractions and associated performance views.
Some of the challenges performance technologists face became apparent in our work with Java and its use in an MPI-based parallel execution environment. The extensions we made to the TAU system for unifying JVM and native execution performance measurement, managing multi-level multi-threading, utilizing different instrumentation mechanisms for Java and MPI, and providing source-level instrumentation together demonstrate TAU's robust capabilities. In the future, however, we expect that new techniques for Java code parallelization will introduce new requirements for integrated performance instrumentation.