These are brief notes about each release. The features in each release are described in detail in the tau [dash] announcements [at] nic.uoregon.edu mailing list archives.
28 Apr 2019.
- New support for AMD ROCm ROCTracer
- Improved OTF2 support for OpenSHMEM
- Added support for Kokkos push/pop region profiling interface
- Added support for Dwarf for TAU_EBS_RESOLUTION=function (needs libelf-devel).
- Added initial support for AMD hipcc and hcc compilers.
- Added support for chrome trace viewer output from tau_trace2json -chrome
- Added support for Cray XC50 with ARM64 platform (-arch=craycnl).
- Added support for Score-P 5.0 (-scorep=download)
10 Nov 2018.
- New support for AMD ROCm
- Added support for OpenSHMEM with GPUs and threads
- Improved support for CUPTI and pthread
- TAU_EBS_RESOLUTION (file/function/line) extension for event-based sampling (tau_exec -ebs)
- Support for OpenMP OMPT TR6 using library replacement
- Updated support for OpenCL and pthread
13 Nov 2017. See announcement.
- New JOGL2 support in paraprof and perfexplorer's 3D profile browsers. This update includes 64 bit support under all platforms (including Mac OS X). Updated support for 3D topology displays for Cray XC platform.
- Support for Mac OS X with PDT, Event Based Sampling (EBS), MPI, and Compiler-based instrumentation.
- Updated binary rewriters for MAQAO (tau_rewrite), and PEBIL (tau_pebil_rewrite) in PDT v3.25.
- The first release of a new pycoolr GUI for online performance evaluation using BEACON (configure -beacon=download) and SOS Flow (configure -sos=download). See examples/sos.
- Updated support for DyninstAPI v9.3.2.
- Added support for LIKWID to access low-level performance counter information in TAU (configure -likwid=download; export TAU_METRICS=TIME,LIKWID_
- Updated MPI support for SGI MPT. Updated support for MPI_T and MPI collective operations in TAU's MPI wrapper.
- Added preliminary support for NVIDIA CUDA 9.0 and NVLink.
- Added support for Caliper in TAU.
- TAU supports native generation of OTF2 traces for use with Vampir (configure -otf=download; export TAU_TRACE=1; export TAU_TRACE_FORMAT=otf2). This works with callsite profiling (export TAU_CALLSITE=1) with MPI and OpenSHMEM to show where an individual call was invoked in the source code.
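The OTF2 tracing settings named in the last item can be collected into a small shell sketch. Only the three environment variables come from the note; the launch line (`mpirun -np 4 tau_exec ./app`) is an illustrative assumption, and `./app` and the rank count are placeholders.

```shell
#!/bin/sh
# Environment for native OTF2 tracing with callsite information,
# using only the variables named in the release note.
export TAU_TRACE=1              # turn on tracing
export TAU_TRACE_FORMAT=otf2    # select OTF2 as the trace format
export TAU_CALLSITE=1           # record where each call was invoked

# Hypothetical launch line: ./app and the rank count are placeholders.
LAUNCH="mpirun -np 4 tau_exec ./app"

if command -v tau_exec >/dev/null 2>&1; then
    echo "tau_exec found; run: $LAUNCH"
else
    echo "tau_exec not on PATH; would run: $LAUNCH"
fi
```

The sketch only prints the launch line, so it can be tried on a machine without TAU installed.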
25 Aug 2017
- Support for ADIOS profiling interface.
- Support for TAU_CALLSITE=1 and TAU_CALLSITE_OFFSET to specify the level of unwinding.
- Support for using both OpenSHMEM and MPI simultaneously.
- Support for Cray PMI (Process Management Interface) for ParaProf's 3D Topology display.
- Added -DISABLE_MEMORY_MANAGER configuration option.
- Added support for Clang/LLVM under -arch=craycnl (PrgEnv-llvm at ALCF).
- Support for Intel PIN
- Support for runtime selective instrumentation file (export TAU_SELECT_FILE=select.tau)
- Initial support for SOSflow
- Support for selective instrumentation (exclude/include) file/routine lists for GCC (export TAU_OPTIONS=-optTauSelectFile=select.tau).
- Support for Score-P 3.1
- Support for a new TAU plugin architecture
- Support for CoArray Fortran (tau_caf.sh)
- Support for Kokkos profiling API while using tau_exec
- Support for capturing user specified metadata (export TAU_METADATA="
- Support for workflows (paraprof extensions for TAU_APPLICATION_NAME, tau_coalesce)
- Support for JSON format (tau_trace2json, tau_prof2json.py)
- Reduced memory footprint of callpath profiling
- Added initial support for Clang C/C++ compilers under Power 8 Linux
- Support for OpenMP Tools API (OMPT)
- Support for power profiling
- Support for special marker events for capturing spikes
- Lower OpenMP runtime overhead
- Offline address lookup
- Support for tracking MPI in multi-threaded MPC programs
- Improved heap memory usage, CUPTI, ParaProf, and PerfExplorer. See announcement.
- EDG v4.4 based parsers in PDT 3.19. Updated MAQAO and PEBIL.
- Support for user-level threads in MPC.
- OpenMP on-the-fly region registration.
- Metadata and CUBE enhancements in ParaProf and PerfExplorer.
- LLVM for IBM BG/Q.
- Improved Intel Xeon Phi Co-processor (MIC) support for PAPI. See announcement.
- Introduced Runtime Bounds Checking (RBC).
- Optimized event throttling in multi-threaded executions.
- CUPTI kernel instrumentation tracking.
- Enhancements and support for TAUdb in ParaProf and PerfExplorer.
- Support for Intel Xeon Phi Co-processor (MIC).
- Support for static binary rewriting using DyninstAPI. See announcement.
- Introduced TAUdb (formerly PerfDMF) database framework.
- Introduced tau_pebil_rewrite, a binary rewriter based on PEBIL (PDT 3.18.1).
- Support for C++ and Fortran in MAQAO (PDT 3.18.1) in tau_rewrite, a binary rewriter.
- Support for GPI.
- Support for Opari2 1.0.6.
- Improved support for TAU_SUMMARY=1.
- Support for topology displays for Fujitsu FX10, Cray XE, and IBM BG/Q.
- Support for both native and offload modes for Intel Xeon Phi (MIC).
- Support for CUDA device to device profiling.
- Updated support for communication matrix displays for one-sided calls.
- Enhanced ParaProf 3D window configuration for Mac OS X, AMD64 architectures. See announcement.
- Support for tracking UPC runtime in Cray, BUPC, GUPC: -optTrackUPCR
- Support for MPC
- Support for CLANG LLVM compiler
- Port to Fujitsu FX10/K computer (-arch=sparc64fx)
- Support for -c++=mpicxx -cc=mpicc -fortran=mpif90 in configure
- Support for TAU_LITE=1 runtime parameter
- PerfDMF supports alternate mean definition using perfdmf_loadtrial -z
- Cray topology visualization support
- CUPTI and PAPI v5.0 support
- 64 bit HPC Linux LiveDVD featuring TAU.
- SC'12 events. See announcement.
- Port to ARM Linux, Intel MIC platforms
- CUDA 5.0 support
- UPC Runtime instrumentation (-optTrackUPCR) for Cray CCE & BUPC
- Opari2 1.0.3 and -optPreProcess (tau_macro.sh)
- tau2otf for OTF2
- Improvements in MPI wrapper
- ParaProf 3D for IBM BG/Q, ARM Linux
- PDT 3.18 release
- KTAU 3.0 release. See announcement.
- Port to IBM BlueGene/Q
- ParaProf topology window
- CUDA 4.1 support
- Support for tracking device memory in CUDA
- Support for tracking queue wait time in OpenCL
- Opari2 based instrumentation of OpenMP programs
- Support for debugging callstacks (TAU_TRACK_SIGNALS=1). See announcement.
- tau_rewrite tool based on MAQAO
- OpenSHMEM Profiling
- Score-P Atomic/Context Events
- OpenMP 3.0 instrumentation with Opari2
- NVIDIA CUPTI v4.1 supported in TAU
- H2 database in PerfDMF
- Debugging support: TAU_TRACK_SIGNALS
- Mingw compiler support for Windows
- UPC source level instrumentation using Rose parser to support Cray CCE compiler
- PDT 3.17 with Rose and MAQAO binary instrumentor
- New HPC Linux: VirtualBox appliance (OVA) with TAU
- SHMEM profiling for Cray and SGI.
- Event based sampling (flat profiles).
- NVIDIA OpenCL and AMD OpenCL support
- ParaProf 3D topology display
- Improved support for profiling GPGPU applications
- Linker Based Instrumentation
- Tracking POSIX IO calls using linker-based instrumentation
- Tracking IO parameters
- New platforms
- Improved support for profiling GPGPU applications
- Profiling accelerator primitives with the PGI compiler
- ParaProf enhancements
- Memory leak detection for Fortran
Loads, stores and leaks can be detected automatically using source-level instrumentation. tau_instrumentor accepts a new keyword "memory [file= ] routine= " in the instrument section (BEGIN_INSTRUMENT_SECTION/END_INSTRUMENT_SECTION) of the selective instrumentation file. See examples/memoryleakdetect/f90.
- Enhancements to Eclipse PTP plugin. Scrolling is supported for options in the TAU analysis tab.
- Enhancements and bug fixes for PerfExplorer
10 June 2017
15 Nov 2013
27 May 2013
8 February 2013
9 November 2012
18 September 2012
12 July 2012
26 Mar 2012
10 Nov 2011
18 Aug 2011
13 May 2011
22 Mar 2011
11 Nov 2010. See announcement.
9 July 2010. See announcement.
3 Mar 2010. See announcement.
16 Nov 2009. See announcement.
18 Sep 2009. See announcement.
15 May 2009. See announcement.
22 Jan 2009
TAU can now interface with PGI's runtime library and extract performance information associated with kernels that execute on GPGPUs. TAU tracks the interactions with the GPGPU as seen from the host and generates the performance data. This data includes the name of the routine, file, and line number, as well as block and grid sizes and individual variable names. This feature works with PGI 8.0.3+ compilers that support the #acc region/end region directives. These source annotations may be placed around loops to automatically generate GPGPU code that executes on CUDA-enabled NVIDIA cards. Users do not need to write any GPGPU-specific code explicitly. Instead, they use a compiler flag (-ta=nvidia) to generate this code using a special add-on package with the PGI compiler.
This release improves support for Charm++ and NAMD. We have a wiki page that describes how to build and use TAU with NAMD.
16 Nov 2008
We add support for the PGI and IBM compilers for compiler-based instrumentation. Now, you may set:
% setenv TAU_OPTIONS '-optCompInst -optVerbose....' (see tau_cc.sh, tau_f90.sh, tau_cxx.sh)
% setenv TAU_MAKEFILE taudir/arch/lib/Makefile.tau-[options]
% tau_f90.sh app.f90; tau_f90.sh app.o -o app
to enable this feature. We have tested this on Cray XT3/4/5 systems with PGI compilers, x86_64 Linux systems and IBM pSeries Linux, BG/P, AIX Power5 and 6 systems. With this new feature, we have completed support for GNU, IBM, PGI, Intel, and Pathscale compilers. The above -optCompInst flag will work uniformly across all platforms and languages (Fortran/C/C++).
This feature works at the routine level and may be used to replace PDT for inserting instrumentation. PDT is still relevant for more detailed instrumentation at the fine-grained loop, memory allocation, and I/O tracking levels. We have updated the GNU compiler instrumentation module in TAU to support instrumentation of routines that reside in shared objects that are loaded at runtime. TAU can now exclude files from compiler-based instrumentation by specifying these in the exclude list in TAU's selective instrumentation file (specified using -optTauSelectFile=file.tau).
The GNU compiler support for shared objects requires a BFD package installed with -fPIC (position independent code). When the default package (in /usr/lib) is not compiled this way, you may either specify -DISABLESHARED while configuring TAU or use -bfd=download, which will download binutils-2.18, compile it with -fPIC, and use it to create libTAU.so. This does not affect the use of TAU for static linking (used by default).
30 Sep 2008
TAU features compiler based instrumentation for Intel, GNU and PathScale compilers, a new python API for memory tracking, fixes for IBM BG/P configuration, and support for CQoS analysis and drawing charts from script files in PerfExplorer.
12 Aug 2008
TAU features a generic source code instrumentor in tau_instrumentor, paraprof enhancements including creation of a selective instrumentation file, and support for other file formats, using default values for TAU_THROTTLE (1), COUNTER1, storing weka files in ~/.ParaProf directory, GNU PDT parser, context events in POSIX I/O interposition library and a new lightweight TAU_PROFILER API.
TAU mentioned in HPC wire article
21 March 2008
The TAU performance system® was one of several performance evaluation tools mentioned in an HPC wire article about the Petascale Productivity from Open, Integrated Tools (POINT) project. Quote:
"The POINT project will improve and support a parallel performance environment that integrates the widely-used TAU, PAPI, KOJAK, and PerfSuite technologies as core components. Each tool will be enhanced to better support user needs and evolving scalable HPC technology, and to interoperate as part of a performance engineering system to be used routinely in the performance evaluation and optimization of domain science and engineering (S&E) applications running on HPC systems of extreme scale."
More information about the POINT project can be found at their website.
21 March 2008
TAU v2.17.1 has these new features:
Tracking MPI-I/O, Perfexplorer 2 with atomic events, jython interface, refactoring TAU and support for TAU_PROFILE_FORMAT environment variable, PAPI-C non-cpu native events, Eclipse/PTP plugin update, Scalasca 1.x support, GCC 4.3.x, IBM BG/P -BGPTIMERS and metadata, and updates for Apple OS X.
TAU v2.17 and PDT 3.12 released
9 November 2007
TAU v2.17 has these new features:
tau_wrap, a wrapper generator for external libraries, port to IBM BG/P (-arch=bgp), SiCortex, Cray CNL, and Windows Cluster 2003 (including MPI support). Improvements to the Eclipse plugin, paraprof, and perfexplorer. Added a new Posix I/O wrapper (-iowrapper) for tracking the volume and bandwidth of I/O. Added support for atomic and context events in the OTF traces generated by VampirTrace using TAU.
TAU v2.16.6 released
21 Sep 2007
TAU v2.16.6 has these new features:
static/dynamic phase/timer instrumentation constructs are now supported in the TAU instrumentation specification file, Cray XT4 compute node linux (-arch=craycnl), Eclipse/PTP plugin for external performance tools, tauex updates for MPI shared object loading, signal handlers for dumping performance data (SIGUSR1) and toggling instrumentation (SIGUSR2), support for OMPP profiles in paraprof, support for Intel 10.x Fortran/C/C++, NAGWare Fortran and g95 Fortran compilers.
TAU v2.16.5 released
31 May 2007
TAU v2.16.5 has these new features:
profile snapshots, I/O tracking in Fortran, configuration and support of multiple PerfDMF databases within ParaProf, support for Lahey 64 bit compiler under Linux, SiCortex 64 and 32 bit architectures, and support for gfortran based parser in PDT 3.11.1 for Mips Linux architecture.
TAU v2.16.4 released
1 May 2007
TAU v2.16.4 has these new features:
Clock synchronization in trace files, metadata fields in ppk files, perfexplorer custom charts with XML metadata fields, TAU portal scripts to upload data, support for persistent communication events in traces, KTAU OS level shared counter coupling, Eclipse/PTP updates for accessing TAU options and build configurations.
TAU v2.16.3 and PDT 3.11 released
27 March 2007
TAU v2.16.3 has these new features:
Eclipse PTP plugin update, memory leak detection enhancements, high level API, Python instrumentation, Paraprof's support for cube3 profiles, perfexplorer comparative displays and Jython interpreter support, PAPI enhancements (papithread, papi domains under x86 linux), tauex, and pure java implementation of tau2slog2.
TAU v2.16.2 and PDT 3.10 released
1 March 2007
TAU v2.16.2 has these new features:
TAU v2.16.1 and PDT 3.10 released
13 February 2007
TAU v2.15.5 released
30 June 2006
TAU v2.15.5 has these new features:
TAU portal at https://tau.nic.uoregon.edu, automatic memory leak detection for C/C++(malloc/free), Perfexplorer enhancements (normal probability plots, event data, distribution info of events), tau2otf supports compressed and multi-threaded OTF traces, tau_instrumentor, ParaProf and pprof enhancements.
TAU v2.15.4 released
8 June 2006
TAU v2.15.4 has these new features:
tau_poe tool for instrumenting AIX binaries at runtime, improvements in tau_instrumentor to support gotos in loops, support for tracking memory allocations and deallocations and associating these with the program callstack using TAU's malloc/free wrapper, improvements in tau_ompcheck tool, Derby support in PerfDMF, and enhancements to ParaProf and PerfExplorer.
TAU v2.15.3 released
27 April 2006
TAU v2.15.3 has these new features:
support for automatic outer loop level instrumentation in Fortran, support for PDT's gfortran parser, tau_ompcheck for correcting OpenMP directives in Fortran, Derby and DB2 support in PerfDMF, enhancements to Paraprof for phase based profiling, automatic instrumentation of pthread programs, Java trace writer API library, Cray XT3 extensions, and an upgradetau utility for installing TAU.
TAU v2.15.2 released
21 February 2006
TAU v2.15.2 has these new features:
support for automatic outer loop level instrumentation in C and C++ using PDT, Eclipse PTP environment, python 2.4 instrumentation, Jython support in Paraprof, port to FreeBSD and updates to tau_instrumentor.
OTF for IBM BG/L released
30 December 2005
TAU v2.15.1 released
22 December 2005
TAU v2.15.1 has these new features:
phaseconvert: Added a new utility to convert callpath profiles to phase based profiles given a set of phases. This supports not only TAU profiles, but also cube profiles and any other callpath profile that perfdmf supports.
tau2profile: Added a new utility to convert TAU trace files to profiles. Traces contain timestamped events while profiles contain aggregate summaries of performance metrics. This utility supports PAPI counter data as well, so TAU trace files with multiplecounter data are mapped to profiles with multiple metrics. It supports generation of profile series and interval profiles as well.
Enhancements to Paraprof
Better support for Intel compilers for linking C and Fortran codes.
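The tau2profile note above distinguishes traces (timestamped events) from profiles (aggregate summaries). As a rough illustration of that reduction, and not of tau2profile itself, the sketch below collapses a toy trace into per-routine call counts and inclusive times. The trace format, the routine names, and the no-recursion assumption are all invented for this example.

```shell
#!/bin/sh
# Toy trace: "<timestamp_us> <enter|exit> <routine>". This format is
# invented for illustration and is NOT TAU's trace format.
TRACE=$(mktemp)
cat > "$TRACE" <<'EOF'
0 enter main
10 enter solve
40 exit solve
50 enter solve
70 exit solve
100 exit main
EOF

# Collapse the timestamped events into one summary line per routine:
# "<routine> <calls> <inclusive_us>". Assumes routines are not recursive.
PROFILE=$(awk '
    $2 == "enter" { start[$3] = $1 }
    $2 == "exit"  { calls[$3]++; incl[$3] += $1 - start[$3] }
    END { for (r in calls) print r, calls[r], incl[r] }
' "$TRACE" | sort)

echo "$PROFILE"
rm -f "$TRACE"
```

The individual timestamps are gone after the reduction, which is why a profile is much smaller than the trace it came from.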
TAU v2.15 released
17 November 2005
We've added new paraprof phase and comparative displays, and support for Eclipse CDT and FDT in TAU. TAU now supports the Open Trace Format (OTF). Updates to the PerfExplorer Performance Data Mining tool have been made. Event profiling can now be throttled during runtime. Added support for ORC Open64 compiler and nested OpenMP calls. Traces are now multi-platform and can be generated on one platform and merged/converted on another. Added support for Cray XT3 (-arch=xt3, see wiki), and SHMEM wrappers. Added support for Solaris on x86_64 Opteron. Updated support for PAPI on IBM BGL and Cray XT3.
TAU v2.14.7 released
11 August 2005
We've added new tools for performance data mining and knowledge discovery [PerfExplorer], command line invocation of TAU, TAU Eclipse Java plugin, and updated our documentation.
TAU v2.14.6, PDT v3.4 and VTF3 v1.34 released
30 June 2005
We've added support for large trace files (> 2GB), GPSHMEM, and now we distribute JumpShot4 and SLOG2 SDK as part of TAU. TAU_COMPILER and tau_instrumentor are enhanced to better support automatic instrumentation of Fortran 90/95 codes using PDT v3.4.
TAU v2.14.5 released
8 June 2005
We've added support for importing CUBE (Kojak) profiles in paraprof. TAU has a new -MPITRACE option that produces trace files with events that are ancestors of MPI calls. These traces can be converted to the Epilog format (from Kojak) for use with the expert tool. The TAU_COMPILER instrumentation tool has been updated to support OpenMP instrumentation with Kojak's Opari instrumentor. Paraprof has a new thread statistics table window with support for expanding a callgraph by clicking on a node. You can sort on a particular column by clicking on its heading.
TAU v2.14.4 released
18 May 2005
We've added support for memory headroom calculation. Paraprof has a packed profile data format, reverse callpath views, and search capabilities. TAU has a new context user defined event where application specific events can be mapped to the program's callstack. TAU traces can now be converted to the Epilog trace format using tau2elg tool.
TAU v2.14.3 released
20 Apr 2005
We've added support for 3D profile displays in Paraprof. TAU now supports the JumpShot4 trace visualizer with the SLOG2 trace converter.
TAU v2.14.1 released
20 Jan 2005
We've added support for phase based profiling, dynamic timers, a tool to convert vtf3 trace files to TAU profiles, and several enhancements to Paraprof. Paraprof now has an option to show the complete callgraph (click-able to identify the callpath, with zoom in/out capabilities, options to select node colors and sizes). Paraprof has a new scalable histogram display which shows the number of threads of a routine in each bin (between max and min values, with the ability to change the number of bins). TAU features better support for multi-threaded executions, and support for PathScale compilers (C, C++, Fortran 95) for Opteron Linux platform. PDT v3.3.1 is also released with support for PathScale compilers.
TAU v2.14 released
TAU now supports Oracle, PostgreSQL and MySQL databases in PerfDMF.
TAU v2.13.7 released
TAU v2.12.9 released
TAU v2.12.9 introduces the new paraprof profile browser [ Europar03 ], DyninstAPI 4.0 support for rewriting binary images, file level selective instrumentation support, gprof style parallel callpath views for callpath profiles in paraprof, user specified depth in callpath profiles, Python API improvements, Opari updates for OpenMP instrumentation and EPILOG trace file format support from the KOJAK (FZJ) project.
TAU v2.12.5 released
TAU v2.12.5 supports Python bindings and automatic instrumentation of Python code.
Call Path profiling
TAU supports call path profiling. This allows a user to explore the time spent along a specific call path. Currently, the latest release (TAU v2.11.17) supports a two-level call path. See Call Path Profiling for further details. TAU also supports PETSc in this release.
New tool: tau_reduce
Frequently executing light-weight routines may distort the performance data by introducing unnecessary overhead. To weed out these routines, a new tool tau_reduce has been introduced in TAU. It reads the profile output and a rules file that specifies when a routine should not be instrumented, and produces a selective instrumentation file that lists routines that should be excluded from instrumentation. This information can be fed to tau_instrumentor based on PDT or tau_run based on DyninstAPI to reduce the instrumentation overhead for subsequent runs. See examples/reduce and utils/TAU_REDUCE.README for more information.
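The idea behind tau_reduce can be sketched in a few lines of shell. This is a toy re-implementation, not the tool: the profile format, the routine names, and the threshold rule are invented for illustration, and tau_reduce's own rules-file syntax is not shown. Only the BEGIN_EXCLUDE_LIST/END_EXCLUDE_LIST wrapper follows TAU's selective instrumentation file, where real entries are full routine signatures rather than the bare names used here.

```shell
#!/bin/sh
# Toy profile: "<routine> <calls> <total_usec>". Invented format and names.
PROFILE=$(mktemp)
cat > "$PROFILE" <<'EOF'
main 1 5000000
get_index 2000000 1500000
solve 100 4000000
EOF

# Hypothetical rule: exclude routines called more than 1,000,000 times
# that average under 2 microseconds per call.
SELECT=$(mktemp)
{
    echo "BEGIN_EXCLUDE_LIST"
    awk '$2 > 1000000 && ($3 / $2) < 2 { print $1 }' "$PROFILE"
    echo "END_EXCLUDE_LIST"
} > "$SELECT"

cat "$SELECT"
```

Here only get_index matches the rule (2,000,000 calls at 0.75 microseconds each), so it is the only routine written to the exclude list.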
Support for EPILOG and EXPERT
TAU can generate EPILOG binary traces which can be analyzed using the EXPERT tool. See [ KOJAK ]. TAU also supports Hitachi SR8000, NEC SX and IA-64 Linux platforms. Under IA-64, Intel C/C++/F90 compilers are supported.
Runtime access to performance data
TAU v2.11.14 also supports runtime access to performance data that allows an application to query its performance metrics. TAU also features selective dumping of profile data and incremental dumping of data at runtime. TAU supports integrated performance analysis in the Uintah software. See [ ISHPC'02 paper ].
TAU supports selective instrumentation of source code (using PDT) and object code (using DyninstAPI). A selective instrumentation file can specify a list of routines that are to be instrumented or to be excluded from instrumentation.
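A minimal sketch of such a selective instrumentation file follows. The routine signature and the file pattern are hypothetical placeholders; the list keywords are the ones TAU's selective instrumentation files use, and the TAU_OPTIONS setting is the one given in the 25 Aug 2017 notes.

```shell
#!/bin/sh
# Write a selective instrumentation file into a scratch directory.
SELECT_DIR=$(mktemp -d)
SELECT_FILE="$SELECT_DIR/select.tau"

# The routine signature and file pattern below are hypothetical.
cat > "$SELECT_FILE" <<'EOF'
BEGIN_EXCLUDE_LIST
void swap(int *, int *)
END_EXCLUDE_LIST
BEGIN_FILE_EXCLUDE_LIST
util_*.c
END_FILE_EXCLUDE_LIST
EOF

# Point TAU's compiler scripts at the file.
export TAU_OPTIONS="-optTauSelectFile=$SELECT_FILE"
echo "$TAU_OPTIONS"
```

An include list (BEGIN_INCLUDE_LIST/END_INCLUDE_LIST) can be used instead of the exclude list when it is shorter to name the routines that should be instrumented.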
Support for multiple counters
TAU can now support profiling with more than one quantity (such as wall-clock time, hardware performance counters). Different options can be selected by setting COUNTER[1-25] environment variables to indicate the counters to be profiled. TAU also supports PAPI v2.1 in this release. See -MULTIPLECOUNTERS configuration option.
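As a sketch of the COUNTER[1-25] mechanism, the following sets two counters before a run. The specific values are assumptions chosen for illustration: GET_TIME_OF_DAY as a wall-clock source and PAPI_FP_INS as a PAPI preset event. Which names are accepted depends on how TAU and PAPI were configured on a given system.

```shell
#!/bin/sh
# Each COUNTER<n> variable names one quantity to profile.
# The values are illustrative assumptions, not taken from the release note.
export COUNTER1=GET_TIME_OF_DAY   # wall-clock time
export COUNTER2=PAPI_FP_INS       # a PAPI preset event (floating-point instructions)

# Show what a TAU build configured with -MULTIPLECOUNTERS would see.
env | grep '^COUNTER[0-9]*=' | sort
```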
TAU supports dynamic creation of profile groups. This allows users to enable and disable groups at runtime, as well as associate groups with files during instrumentation using tau_instrumentor. Support for profile groups is demonstrated in SAMRAI (LLNL).
TAU supports F90 instrumentation using PDT.
Access to x86 timers under Linux
TAU supports access to low-overhead timers under Linux using the -LINUXTIMERS configuration option.
jracy released in TAU v2.10
TAU has a new profile browser (jracy) implemented in Java. Sample images of jracy can be seen in EVH1 Profiles.
TAU works with UPS.
XPARE (eXPeriment Alerting and REporting) is a system for performance experimentation that is integrated in a weekly testing harness for the Uintah / C-SAFE software development effort. With this system we can produce detailed weekly reports of Uintah / C-SAFE performance and alert code developers of performance problems as they arise.
TAU v 2.9.19 Released
TAU v 2.9.19 features support for OpenMP directive rewriting (Opari) based instrumentation for OpenMP programs. See LACSI 2001 paper.
TAU v 2.9.12 Released
TAU v 2.9.12 features support for several thread packages (SGI sproc, pthread, Java, Windows, OpenMP, Tulip, SMARTS) and for a runtime profile snapshot (TAU_DB_DUMP) facility in addition to extensions to its performance data mapping API. See the download section for instructions on downloading TAU.
TAU v 2.9 Released
TAU v2.9 features support for mixed model programming, support for PAPI, PCL for hardware performance counters and new ports (to IA-64). See the Download page for more information.
TAU supports Hybrid Execution Models
TAU supports PAPI and OpenMP with MPI (OpenMPI)
TAU v 2.8.11 Released
TAU v2.8x implements the performance mapping API that allows performance data to be correlated between different layers in a multi-layered software. It features support for Fortran 90 and MPI Profiling Interface. It supports access to hardware performance counters using PCL and PAPI on several platforms including Cray T3E, SGI, UltraSparc, IBM Power3, Intel Pentium+
Profiling User Events in PaRP
TAU now implements profiling of user-defined events. These can be used to track memory statistics or any application-specific statistics maintained on a per-thread basis. Click here for more information on its use in the PaRP project.
Vampir and Smarts
TAU can generate event traces for Vampir for Smarts user level threads. This can be a valuable tool in evaluating efficient thread scheduling policies in SMARTS. Click here for more information.
TAU integrated with Pooma II
TAU Profiling package now supports pthreads using -pthread configure option. Version 2.3 released on Aug. 10, 1998 also supports user defined events. C programs can now be profiled using TAU using the same API as C++.
TAU IL Converter
TAU IL converter and program database for analysis tools uses an EDG front end to parse a C++ program and converts the intermediate language to a format that can be used by TAU tools. For more info see the documentation section.
The TAU Portable package can now generate traces that can be viewed using VAMPIR. For details, see the Tutorial Tracing for VAMPIR.