To demonstrate the use of the OpenMP performance tool interface with TAU, we applied it to a two-dimensional Stommel ocean current application from the San Diego Supercomputer Center. The application code models wind-driven circulation in a homogeneous rectangular ocean with a flat bottom under the influence of surface winds, linearized bottom friction, and Coriolis force. A 5-point stencil is used to solve the governing partial differential equation on a grid of points. Table 4 shows the source code for one of the more compute-intensive for blocks, before and after instrumentation with OPARI. Linking with the TAU-specific pomp library and a user-configured TAU measurement package allows performance data for OpenMP and MPI events to be captured and displayed.
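As a rough illustration of the kind of loop being instrumented, the following is a minimal sketch of a 5-point stencil sweep parallelized with an OpenMP for construct. It is not the Stommel source code from Table 4: the function name jacobi_sweep, the array names, and the plain averaging coefficients are assumptions chosen for brevity, and the real model uses coefficients derived from its friction, wind, and Coriolis terms.

```c
#include <assert.h>

/* Hypothetical helper (not from the Stommel code): one Jacobi-style sweep
 * of a 5-point stencil over the interior of an nx-by-ny row-major grid.
 * Boundary entries of psi_new are left untouched. The 0.25 averaging
 * weights are illustrative only. */
void jacobi_sweep(int nx, int ny, const double *psi, double *psi_new)
{
    assert(nx >= 3 && ny >= 3);

    /* Rows are divided among threads; the pragma is simply ignored
     * when the file is compiled without OpenMP support. */
    #pragma omp parallel for
    for (int i = 1; i < nx - 1; i++) {
        for (int j = 1; j < ny - 1; j++) {
            psi_new[i * ny + j] = 0.25 * (psi[(i - 1) * ny + j]    /* north */
                                        + psi[(i + 1) * ny + j]    /* south */
                                        + psi[i * ny + (j - 1)]    /* west  */
                                        + psi[i * ny + (j + 1)]);  /* east  */
        }
    }
}
```

A loop of this shape is what a source-level instrumentor would bracket with measurement calls, so that each parallel for block appears as its own region in a profile.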
Figure 4: TAU Performance Profile of an OpenMP/MPI 2D Stommel Model of Ocean Circulation Instrumented with OPARI
Figure 4 presents profiling data for the Stommel application. Shown is a region-based performance view in which individual parallel loops are distinguished. The for block shown in Table 4 is highlighted in the ``n,c,t 0,0,0 profile'' display (representing node 0, context 0, and thread 0) and accounts for a significant percentage of the execution time. The execution time for this block across all threads is shown in the ``for profile'' display. Clearly, there is a work imbalance between the two threads within each process, but the distribution is consistent across nodes (i.e., processes). Notice how the MPI performance data is integrated with the OpenMP data in the display. TAU can also be used to obtain construct-based performance data.
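One simple way to put a number on the kind of imbalance visible in such a display is the ratio of the slowest thread's time to the mean time across threads. The sketch below assumes the per-thread times for a region have already been extracted from a profile; the function name load_imbalance is hypothetical, and this metric is an illustration, not something TAU is stated to compute.

```c
#include <assert.h>

/* Hypothetical helper: load imbalance of one region, defined here as
 * (maximum per-thread time) / (mean per-thread time).
 * A value of 1.0 means perfectly balanced work; larger values mean
 * the slowest thread does proportionally more than the average. */
double load_imbalance(const double *thread_time, int nthreads)
{
    assert(thread_time != 0 && nthreads > 0);

    double max = thread_time[0];
    double sum = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (thread_time[t] > max) {
            max = thread_time[t];
        }
        sum += thread_time[t];
    }

    assert(sum > 0.0);  /* avoid dividing by a zero mean */
    return max / (sum / nthreads);
}
```

With two threads, the ratio ranges from 1.0 (equal times) up to 2.0 (one thread does all the work), which makes it easy to compare the same loop across processes.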
When the Stommel application is linked with a trace-configured performance library, OpenMP and MPI events can be displayed using the Vampir visualization tool. Figure 5 displays an event timeline showing the overlap of OpenMP and MPI events.
Figure 5: TAU Performance Trace of Stommel Application