Keywords: Online performance measurement, cluster monitoring, TAU, Supermon
Online application performance monitoring tracks performance characteristics during execution rather than post-mortem. This opens up possibilities that are otherwise unavailable, such as real-time visualization and application performance steering, which can be useful for long-running applications. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low overhead while still providing a useful performance reporting capability. Two fundamental components of such a performance monitor are the measurement system and the transport system. We adapt and combine two existing, mature systems, TAU and Supermon, to address this problem. TAU performs the measurement, while Supermon collects the distributed measurement state. Our experiments show that this novel approach leads to very low-overhead application monitoring, as well as other benefits unavailable when using a transport such as NFS.
Modified: Tue Jun 5 10:31:50 US/Pacific 2007
Created: Mon Jun 4 17:45:30 US/Pacific 2007