TAU - Tuning and Analysis Utilities

PRL

Chapter 1. Installation

TAU (Tuning and Analysis Utilities) is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C++, C, Java and Python. The model that TAU uses to profile parallel, multi-threaded programs maintains performance data for each thread, context, and node in use by an application. The profiling instrumentation needed to implement the model captures data for functions, methods, basic blocks, and statement execution at these levels. The TAU profiling instrumentation supports all C++ language features, including templates and namespaces, and is available through an API at the library or application level. The API also provides selection of profiling groups for organizing and controlling instrumentation. Instrumentation can be inserted in the source code automatically using an instrumentor tool based on the Program Database Toolkit (PDT), dynamically using DyninstAPI, at runtime in the Java virtual machine, or manually using the instrumentation API. TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms, so the user can quickly identify sources of performance bottlenecks in the application. In addition, TAU can generate event traces that can be displayed with the Vampir or Paraver trace visualization tools. This chapter discusses installation of the TAU portable profiling package.
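
To make the per-node/context/thread model more concrete, the following Python sketch shows one way profile data with inclusive and exclusive times could be kept for each location. It is an illustration only, not TAU's implementation: the names FunctionProfile, ThreadProfile, start and stop are hypothetical, and timestamps are passed in explicitly so that the example is deterministic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    calls: int = 0
    inclusive: float = 0.0  # time from entry to exit, including callees
    exclusive: float = 0.0  # inclusive time minus time spent in callees


class ThreadProfile:
    """Profile data for one (node, context, thread) location (hypothetical model)."""

    def __init__(self):
        self.functions = defaultdict(FunctionProfile)
        self._stack = []  # one [name, start_time, time_in_callees] entry per active call

    def start(self, name, now):
        self._stack.append([name, now, 0.0])

    def stop(self, now):
        name, start, callee_time = self._stack.pop()
        elapsed = now - start
        prof = self.functions[name]
        prof.calls += 1
        prof.inclusive += elapsed
        prof.exclusive += elapsed - callee_time
        if self._stack:
            # Charge this call to the caller's callee time so the caller's
            # exclusive time does not include it.
            self._stack[-1][2] += elapsed


# One independent profile per (node, context, thread) triple.
profiles = defaultdict(ThreadProfile)

# Timestamps are explicit here; real code would read a clock such as time.perf_counter().
p = profiles[(0, 0, 0)]
p.start("main", 0.0)
p.start("compute", 1.0)
p.stop(4.0)  # compute: 3.0 s inclusive and exclusive
p.stop(5.0)  # main: 5.0 s inclusive, 2.0 s exclusive
for name, prof in p.functions.items():
    print(name, prof)
```

This simple version does not treat recursive calls specially, so a routine that calls itself would have its inclusive time counted more than once.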

1.1. Installing TAU

After uncompressing and un-tarring TAU, the user needs to configure, compile and install the package. This is done by invoking:

% ./configure
% make install
    

TAU is configured by running the configure script with appropriate options that select the profiling and tracing components that are used to build the TAU library. The `configure' shell script attempts to guess correct values for various system-dependent variables used during compilation, and creates the Makefile(s) (one in each subdirectory of the source directory).

The following command-line options are available to configure:

1.1.1. Available configuration options

  • -prefix=<directory>

    Specifies the destination directory where the header, library and binary files are copied. By default, these are copied to subdirectories <arch>/bin and <arch>/lib in the TAU root directory.

  • -arch=<architecture>

    Specifies the architecture. If the user does not specify this option, configure detects the architecture automatically. For IBM BG/L, the user should specify bgl as the architecture. For SGI, the user can specify one of sgi32, sgin32 or sgi64 for 32-bit, n32 or 64-bit compilation modes, respectively. The files are installed in the <architecture>/bin and <architecture>/lib directories.

  • -c++=<C++ compiler>

    Specifies the name of the C++ compiler. Supported C++ compilers include KCC (from KAI/Intel), CC (SGI, Sun), g++ (from GNU), FCC (from Fujitsu), xlC (from IBM), guidec++ (from KAI/Intel), cxx (Tru64), aCC (from HP), c++ (from Apple), icpc and ecpc (from Intel), and pgCC (from PGI).

  • -cc=<C Compiler>

    Specifies the name of the C compiler. Supported C compilers include cc, gcc (from GNU), pgcc (from PGI), fcc (from Fujitsu), xlc (from IBM), KCC (from KAI/Intel), and icc and ecc (from Intel).

  • -pdt_c++=<C++ Compiler>

    Specifies a different C++ compiler for PDT (tau_instrumentor). This is typically used when the library is compiled with one C++ compiler (specified with -c++) and tau_instrumentor is compiled with a different one (<pdt_c++>). For example,

    -c++=pgCC -cc=pgcc -pdt_c++=KCC -openmp ...

    uses PGI's OpenMP compilers for TAU's library and KCC for tau_instrumentor.

    -arch=bgl -pdt=/usr/pdtoolkit-3.4 -pdt_c++=xlC -mpi

    configures TAU with PDT and MPI for IBM BG/L and specifies the use of the front-end xlC compiler for building tau_instrumentor.

  • -fortran=<Fortran Compiler>

    Specifies the name of the Fortran90 compiler. Valid options are: gnu, sgi, ibm, ibm64, intel, cray, pgi, absoft, fujitsu, sun, kai, nec, hitachi, compaq, nagware, g95 and hp.

  • -tag=<Unique Name>

    Specifies a tag in the name of the stub Makefile and TAU makefiles to uniquely identify the installation. This is useful when more than one MPI library may be used with different versions of compilers. For example,

    % configure -c++=icpc -cc=icc -tag=intel71-vmi \
                -mpiinc=/vmi2/mpich/include

  • -pthread

    Specifies pthread as the thread package to be used. In the default mode, no thread package is used.

  • -charm=<directory>

    Specifies charm++ (converse) threads as the thread package to be used.

  • -tulipthread=<directory> -smarts

    Specifies SMARTS (Shared Memory Asynchronous Runtime System) as the threads package to be used. <directory> gives the location of the SMARTS root directory.

  • -openmp

    Specifies OpenMP as the threads package to be used.

  • -opari=<dir>

    Specifies the location of the Opari OpenMP directive rewriting tool. Using the Opari source-to-source instrumentor in conjunction with TAU exposes OpenMP events for instrumentation. See the examples/opari directory. Note: There are two versions of Opari: the standalone version (opari-pomp-1.1.tar.gz) and the newer version in the opari/ directory of the KOJAK distribution (kojak-<ver>.tar.gz). Please upgrade to the KOJAK version (especially if you are using IBM xlf90) and specify -opari=<kojak-dir>/opari while configuring TAU.

  • -opari_region

    Report performance data for only OpenMP regions and not constructs. By default, both regions and constructs are profiled with Opari.

  • -opari_construct

    Report performance data for only OpenMP constructs and not regions. By default, both regions and constructs are profiled with Opari.

  • -pdt=<directory>

    Specifies the location of the installed PDT (Program Database Toolkit) root directory. PDT is used to build tau_instrumentor, a C++, C and F90 instrumentation program that automatically inserts TAU annotations in the source code. If PDT is configured with a subdirectory option (-compdir=<opt>), then TAU can be configured with the same option by specifying

    -pdt=<dir> -pdtcompdir=<opt>

  • -pdtarch=<architecture>

    Specifies the architecture used to build PDT. By default, the TAU architecture is used.

  • -pcl=<directory>

    Specifies the location of the installed PCL (Performance Counter Library) root directory. PCL provides a common interface to access hardware performance counters on modern microprocessors. The library supports Sun UltraSparc I/II, PowerPC 604e under AIX, MIPS R10000/12000 under IRIX, Compaq Alpha 21164, 21264 under Tru64Unix and Cray Unicos (T3E) and the Intel Pentium family of microprocessors under Linux. This option specifies the use of hardware performance counters for profiling (instead of time). To measure floating point instructions, set the environment variable PCL_EVENT to PCL_FP_INSTR (for example). See the section "Using Hardware Performance Counters" in Chapter 4 for details regarding its usage.

    NOTE: To profile multiple PCL counters, specify the -MULTIPLECOUNTERS option as well, and use the COUNTER1, COUNTER2, ..., COUNTER25 environment variables instead of PCL_EVENT to specify the counters to profile.

  • -papi=<directory>

    Specifies the location of the installed PAPI (Performance Data Standard and API) root directory. PAPI provides a common interface to access hardware performance counters and timers on modern microprocessors. Most modern CPUs provide on-chip hardware performance counters that can record events such as the number of instructions issued, floating point operations performed, and the number of primary and secondary data and instruction cache misses. To measure floating point instructions, set the environment variable PAPI_EVENT to PAPI_FP_INS (for example). By default, this option specifies the use of hardware performance counters for profiling (instead of time). When used in conjunction with -PAPIWALLCLOCK or -PAPIVIRTUAL, it specifies the use of wallclock or virtual process timers, respectively. See the section "Using Hardware Performance Counters" in Chapter 4 for details regarding its usage.

    NOTE: To profile multiple PAPI counters, specify the -MULTIPLECOUNTERS option as well, and use the COUNTER1, COUNTER2, ..., COUNTER25 environment variables instead of PAPI_EVENT to specify the counters to profile.

  • examples/papithreads - Same as the papi example, except that it uses threads to highlight how hardware performance counters may be used in a multi-threaded application. To use it with PAPI, configure TAU with -papi=<dir> -pthread. The examples/autoinstrument directory shows the use of the Program Database Toolkit (PDT) for automating the insertion of TAU macros in the source code. It requires configuring TAU with the -pdt=<dir> option; its Makefile is modified to illustrate the use of a source-to-source translator (tau_instrumentor).

  • -PAPIWALLCLOCK

    When used in conjunction with the -papi=<dir> option, this option allows TAU to use high resolution, low overhead CPU timers for wallclock time based measurements. This can reduce the TAU overhead for accessing wallclock time for profile and trace measurements. (See NOTE below.)

  • -PAPIVIRTUAL

    When used in conjunction with the -papi=<dir> option, this option allows TAU to use the process virtual time (time spent in the "user" mode) for profile measurements, instead of the default wallclock time. (See NOTE below.)

  • -CPUTIME

    Specifies the use of user + system time (collectively, CPU time) for profile measurements, instead of the default wallclock time. With multi-threaded programs, this option may be used only under Linux, which provides bound threads. On other platforms, this option may be used for profiling single-threaded programs only.

  • -MULTIPLECOUNTERS

    Allows TAU to track more than one quantity (multiple hardware counters, CPU time, wallclock time, etc.). Configure with other options such as -papi=<dir>, -pcl=<dir>, -LINUXTIMERS, -SGITIMERS, -CPUTIME, -PAPIVIRTUAL, etc. See Section "Using Multiple Hardware Counters" in Chapter 4 for detailed instructions on setting the environment variables COUNTER<1-25> for this option. If -MULTIPLECOUNTERS is used with the -TRACE option, tracing employs the COUNTER1 environment variable for wallclock time.

    NOTE: The default measurement option in TAU is to use the wallclock time, which is the total time a program takes to execute, including the time when it is waiting for resources. It is the time measured from a real-time clock. The process virtual time (-PAPIVIRTUAL) is the time spent when the process is actually running. It does not include the time spent when the process is swapped out waiting for CPU or other resources, and it does not include the time spent on behalf of the operating system (for executing a system call, for instance). It is the time spent in the "user" mode. CPU time (-CPUTIME), on the other hand, includes both the time the process is running (process virtual time) and the time the system is providing services for it (such as executing a system call). It is the sum of the process virtual (user) time and the system time (see the getrusage() man page).

    NOTE: If the -TRACE and -MULTIPLECOUNTERS options are both set, the environment variable COUNTER1 must be set to GET_TIME_OF_DAY.

  • -jdk=<directory>

    Specifies the location of the installed Java 2 Development Kit (JDK 1.2+) root directory. TAU can profile or trace Java applications without any modifications to the source code, byte-code or the Java virtual machine. See README.JAVA for instructions on using TAU with Java 2 applications. This option should only be used for configuring TAU to use JVMPI for profiling and tracing of Java applications. It should not be used for configuring paraprof, which uses the Java found in the user's path.

  • -dyninst=<dir>

    Specifies the directory where the DynInst dynamic instrumentation package is installed. Using DynInst, a user can invoke tau_run to instrument an executable program at runtime or prior to execution by rewriting it.

  • -vampirtrace=<directory>

    Specifies the location of the VampirTrace package. With this option, TAU generates traces in the Open Trace Format (OTF). More information is available from the Technische Universität Dresden.

  • -vtf=<directory>

    Specifies the location of the VTF3 trace generation package. TAU's binary traces can be converted to the VTF3 format using tau2vtf, a tool that links with the VTF3 library. The VTF3 format is read by the Intel Trace Analyzer (formerly known as Vampir), a commercial trace visualization tool developed by TU Dresden, Germany.

  • -slog2=<directory>

    Specifies the location of the SLOG2 SDK trace generation package. TAU's binary traces can be converted to the SLOG2 format using tau2slog2, a tool that uses the SLOG2 SDK. The SLOG2 format is read by the Jumpshot4 trace visualization software, a freely available trace visualizer from Argonne National Laboratory. [Ref: http://www-unix.mcs.anl.gov/perfvis/download/index.htm#slog2sdk]

  • -slog2

    Specifies the use of the SLOG2 trace generation package and the Jumpshot trace visualizer that are bundled with TAU. Jumpshot v4 and SLOG2 v1.2.5delta are included in the TAU distribution. When the -slog2 flag is specified, the tau2slog2 and jumpshot tools are copied to the <tau>/<arch>/bin directory. It is important to have a working javac and java (preferably v1.4+) in your path. On Linux systems, where /usr/bin/java may be a placeholder, you will need to modify your path accordingly.

  • -mpiinc=<dir>

    Specifies the directory where MPI header files reside (such as mpi.h and mpif.h). This option also generates the TAU MPI wrapper library that instruments MPI routines using the MPI Profiling Interface. See the examples/NPB2.3/config/make.def file for its usage with Fortran and MPI programs.

  • -mpilib=<dir>

    Specifies the directory where MPI library files reside. This option should be used in conjunction with the -mpiinc=<dir> option to generate the TAU MPI wrapper library.

  • -mpilibrary=<lib>

    Specifies the use of a different MPI library. By default, TAU uses -lmpi or -lmpich as the MPI library. This option allows the user to specify another library. For example, -mpilibrary=-lmpi_r specifies a thread-safe MPI library.

  • -shmeminc=<dir>

    Specifies the directory where shmem.h resides and enables the TAU SHMEM interface.

  • -shmemlib=<dir>

    Specifies the directory where libsma.a resides and enables the TAU SHMEM interface.

  • -shmemlibrary=<lib>

    By default, TAU uses -lsma as the shmem/pshmem library. This option allows the user to specify a different shmem library.

  • -nocomm

    Allows the user to turn off tracking of messages (synchronous/asynchronous) in TAU's MPI wrapper interposition library. Entry and exit events for MPI routines are still tracked. Affects both profiling and tracing.

  • -epilog=<dir>

    Specifies the directory where the EPILOG tracing package is installed. This option should be used in conjunction with the -TRACE option to generate binary EPILOG traces (instead of binary TAU traces). EPILOG traces can then be used with other tools such as EXPERT. EPILOG comes with its own implementation of the MPI wrapper library and the POMP library used with Opari. Using this option overrides TAU's libraries for MPI and OpenMP.

  • -epiloglib=<dir>

    Specifies the directory where the EPILOG library resides. For example, if the library is in /usr/local/epilog/fe/lib/, use the options -epilog=/usr/local/epilog -epiloglib=/usr/local/epilog/fe/lib.

  • -epilogbin=<dir>

    Specifies the directory where the EPILOG binaries reside.

  • -epiloginc=<dir>

    Specifies the directory where the EPILOG header files reside.

  • -MPITRACE

    Specifies the tracing option and generates event traces for MPI calls and routines that are ancestors of MPI calls in the callstack. This option is useful for generating traces that are converted to the EPILOG trace format. KOJAK's Expert automatic diagnosis tool needs traces with events that call MPI routines. Do not use this option with the -TRACE option.

  • -pythoninc=<dir>

    Specifies the location of the Python include directory. This is the directory where the Python.h header file is located. This option enables Python bindings to be generated. The user should set the environment variable PYTHONPATH to <TAUROOT>/<ARCH>/lib/bindings-<options> to use a specific version of the TAU Python bindings. By importing the pytau package, a user can manually instrument the source code and use the TAU API. Alternatively, by importing tau and using tau.run('<func>'), TAU can automatically generate instrumentation. See the examples/python directory for further information.

  • -pythonlib=<dir>

    Specifies the location of the Python lib directory. This is the directory where the *.py and *.pyc files (and the config directory) are located. This option is mandatory on IBM systems when Python bindings are used. On other systems, this option may be omitted, but -pythoninc=<dir> must be specified.

  • -PROFILE

    This is the default option; it specifies that summary profile files be generated at the end of execution. Profiling generates aggregate statistics (such as the total time spent in routines and statements), which can be analyzed with the profile browsers pprof and paraprof. Wallclock time is used for profiling program entities.

  • -PROFILECALLPATH

    This option generates call path profiles, which show the time spent in a routine when it is called by another routine in the calling path. "a => b" stands for the time spent in routine "b" when it is invoked by routine "a". This option is an extension of -PROFILE, the default profiling option. By setting the TAU_CALLPATH_DEPTH environment variable, the user can vary the depth of the callpath. See examples/calltree for further information.

  • -PROFILEPHASE

    This option generates phase based profiles. It requires special instrumentation to mark phases in an application (I/O, computation, etc.). Phases can be static or dynamic (different phases for each loop iteration, for instance). See examples/phase/README for further information.

  • -PROFILESTATS

    Specifies the calculation of additional statistics, such as the standard deviation of the exclusive time/counts spent in each profiled block. This option is an extension of -PROFILE, the default profiling option.

  • -DEPTHLIMIT

    Allows users to enable instrumentation at runtime based on the depth of a calling routine on a callstack. The depth is specified using the environment variable TAU_DEPTH_LIMIT. When its value is 1, instrumentation in the top-level routine, such as main (in C/C++) or program (in F90), is activated. When it is 2, only main and the routines invoked directly by main are recorded. When a routine appears at depths of 2 and 10 and the limit is set to 5, the routine is recorded when its depth is 2 and ignored when its depth is 10 on the calling stack. This can be used with -PROFILECALLPATH to generate a tree of height <h> from the main routine by setting both TAU_CALLPATH_DEPTH and TAU_DEPTH_LIMIT to <h>.

  • -PROFILEMEMORY

    Specifies tracking heap memory utilization for each instrumented function. When any function entry takes place, a sample of the heap memory used is taken. This data is stored as user-defined event data in profiles/traces.

  • -PROFILEHEADROOM

    Specifies tracking memory available in the heap (as opposed to memory utilization tracking in -PROFILEMEMORY). When any function entry takes place, a sample of the memory available (headroom to grow) is taken. This data is stored as user-defined event data in profiles/traces. Please refer to the examples/headroom/README file for a full explanation of these headroom options and the C++/C/F90 API for evaluating the headroom.

  • -COMPENSATE

    Specifies online compensation of performance perturbation. When this option is used, TAU computes its overhead and subtracts it from the profiles. It can only be used when profiling is chosen. This option works with MULTIPLECOUNTERS as well, but while it is relevant for removing perturbation with wallclock time, it cannot accurately account for perturbation with hardware performance counts (e.g., L1 data cache misses). See TAU Publication [Europar04] for further information on this option.

  • -PROFILECOUNTERS

    Specifies use of hardware performance counters for profiling under IRIX using the SGI R10000 perfex counter access interface. The use of this option is deprecated in favor of the -pcl=<dir> and -papi=<dir> options described above.

  • -SGITIMERS

    Specifies use of the free running nanosecond resolution on-chip timer on the R10000+. This timer has a lower overhead than the default timer on SGI, and is recommended for SGIs (similar to the -papi=<dir> -PAPIWALLCLOCK options).

  • -CRAYTIMERS

    Specifies use of the free running nanosecond resolution on-chip timer on the CRAY X1 cpu (accessed by the rtc() syscall). This timer has a significantly lower overhead than the default timer on the X1, and is recommended for profiling. Since this timer is not synchronized across different cpus, this option should not be used with the -TRACE option for tracing a multi-cpu application, where a globally synchronized realtime clock is required.

  • -LINUXTIMERS

    Specifies the use of the free-running, nanosecond-resolution time stamp counter (TSC) on Pentium III+ and Itanium family processors under Linux. This timer has a lower overhead than the default timer and is recommended. When generating trace data with these timers, it is recommended that the user set the environment variable TAU_SYNCHRONIZE_CLOCKS to true so that TAU can synchronize the timers.

  • -TRACE

    Generates event-trace logs, rather than summary profiles. Traces show when and where an event occurred, in terms of the location in the source code and the process that executed it. Traces can be merged and converted using the tau_merge and tau_convert utilities, respectively, and visualized using Vampir, a commercial trace visualization tool.

  • -muse

    Specifies the use of MAGNET/MUSE to extract low-level information from the kernel. To use this configuration, the Linux kernel has to be patched with MAGNET, and MUSE has to be installed on the executing machine. Also, magnetd has to be running with the appropriate handlers and filters installed. The user can specify the package by setting the environment variable TAU_MUSE_PACKAGE.

  • -noex

    Specifies that no exceptions be used while compiling the library. This is relevant for C++.

  • -useropt=<options-list>

    Specifies additional user options such as -g or -I. For multiple options, the options list should be enclosed in single quotes. For example:

    % ./configure -useropt='-g -I/usr/local/stl'

  • -help

    Lists all the available configure options and quits.
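
The three measures compared in the NOTE under -MULTIPLECOUNTERS (wallclock time, process virtual time, and CPU time) can be observed directly with getrusage(). The following Python sketch is independent of TAU and assumes a Unix system where the standard resource module is available; the helper name measure is hypothetical. It times a sleeping task, which advances the wallclock but uses almost no CPU, and a busy loop, which consumes user-mode time.

```python
import resource
import time


def measure(work):
    """Return (wallclock, virtual/user, cpu = user + system) seconds spent in work()."""
    wall0 = time.perf_counter()
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    work()
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    wall1 = time.perf_counter()
    user = r1.ru_utime - r0.ru_utime      # process virtual ("user" mode) time
    system = r1.ru_stime - r0.ru_stime    # time the OS spent on the process's behalf
    return wall1 - wall0, user, user + system


# Sleeping advances the wallclock but consumes almost no CPU time.
wall, user, cpu = measure(lambda: time.sleep(0.2))
print(f"sleep: wall={wall:.3f}s user={user:.3f}s cpu={cpu:.3f}s")

# A busy loop consumes user time, so CPU time tracks wallclock more closely.
wall, user, cpu = measure(lambda: sum(i * i for i in range(2_000_000)))
print(f"busy:  wall={wall:.3f}s user={user:.3f}s cpu={cpu:.3f}s")
```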
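
With -MULTIPLECOUNTERS, each run is driven by the COUNTER1 ... COUNTER25 environment variables. A launcher script might assemble them as in the following Python sketch. The helper counter_env is hypothetical; its checks only encode the limit of 25 counters and the rule that COUNTER1 must be GET_TIME_OF_DAY when -TRACE is also used, both stated in the option descriptions above. The metric names are taken from the configuration examples in this chapter.

```python
import os

MAX_COUNTERS = 25  # COUNTER1 .. COUNTER25


def counter_env(metrics, tracing=False, base=None):
    """Return a copy of the environment with COUNTER1..COUNTERn set, in order."""
    if not 1 <= len(metrics) <= MAX_COUNTERS:
        raise ValueError(f"expected 1 to {MAX_COUNTERS} metrics, got {len(metrics)}")
    if tracing and metrics[0] != "GET_TIME_OF_DAY":
        # With -TRACE and -MULTIPLECOUNTERS, COUNTER1 must be GET_TIME_OF_DAY.
        raise ValueError("COUNTER1 must be GET_TIME_OF_DAY when tracing")
    env = dict(os.environ if base is None else base)
    # Drop counters left over from an earlier setting so they do not leak into this run.
    for i in range(1, MAX_COUNTERS + 1):
        env.pop(f"COUNTER{i}", None)
    for i, metric in enumerate(metrics, start=1):
        env[f"COUNTER{i}"] = metric
    return env


env = counter_env(["LINUX_TIMERS", "PAPI_FP_INS", "PAPI_L1_DCM"], base={})
print(env)
# The environment could then be passed to a (hypothetical) instrumented program, e.g.
# subprocess.run(["./app"], env=env, check=True)
```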
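
The naming convention of -PROFILECALLPATH and the depth rule of -DEPTHLIMIT can be summarized in a few lines of Python. This is an illustrative model written from the option descriptions above, not TAU code. The function names are hypothetical, and treating TAU_CALLPATH_DEPTH as the number of routines kept in the path (counted from the innermost call outward) is an assumption.

```python
def callpath_name(stack, depth):
    """Label the innermost call in the style of -PROFILECALLPATH: 'a => b' means b called from a.

    Assumption for illustration: depth is the number of routines kept,
    counted from the innermost call outward.
    """
    if depth < 1:
        raise ValueError("callpath depth must be at least 1")
    return " => ".join(stack[-depth:])


def is_recorded(stack, depth_limit):
    """-DEPTHLIMIT rule: record the innermost routine only if its depth <= TAU_DEPTH_LIMIT.

    The top-level routine (main in C/C++, program in F90) is at depth 1.
    """
    return len(stack) <= depth_limit


stack = ["main", "solve", "exchange"]
print(callpath_name(stack, 2))  # solve => exchange
print(callpath_name(stack, 3))  # main => solve => exchange
print(is_recorded(["main", "f"], 5))  # True: depth 2
print(is_recorded(["main"] + ["f"] * 9, 5))  # False: depth 10
```

Setting both limits to the same value <h>, as the -DEPTHLIMIT description suggests, keeps every recorded path inside a tree of height <h> rooted at main.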
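
The extra statistic that -PROFILESTATS adds, the standard deviation of exclusive time for a profiled block (read here as across its calls), needs only a running count, sum, and sum of squares. The Python sketch below shows that bookkeeping. It is one reasonable way to compute the statistic, not a description of TAU's internals, and the class name ExclusiveStats is hypothetical.

```python
import math


class ExclusiveStats:
    """Accumulate per-call exclusive times and report mean and standard deviation."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, exclusive_time):
        self.n += 1
        self.total += exclusive_time
        self.total_sq += exclusive_time * exclusive_time

    def mean(self):
        return self.total / self.n

    def stddev(self):
        # Population standard deviation sqrt(E[x^2] - E[x]^2), clamped at 0
        # to absorb floating-point round-off.
        m = self.mean()
        return math.sqrt(max(self.total_sq / self.n - m * m, 0.0))


stats = ExclusiveStats()
for t in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # exclusive seconds per call
    stats.add(t)
print(stats.mean(), stats.stddev())  # 5.0 2.0
```

Keeping a sum of squares costs one multiply-add per call; for very large or widely varying values, Welford's online algorithm is numerically more robust.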

1.1.2. tau_setup

tau_setup is a GUI front end to the configure and installtau tools.

1.1.3. installtau script

To install multiple (typical) configurations of TAU at a site, you may use the script `installtau'. It takes options similar to those described above. For each configuration, it runs ./configure <opts> followed by make clean install, creating multiple libraries that may be requested by the users at a site. The installtau script accepts the following options:

% installtau -help

TAU Configuration Utility 
***************************************************
Usage: installtau [OPTIONS]
  where [OPTIONS] are:
-arch=<arch>  
-fortran=<compiler>  
-cc=<compiler>   
-c++=<compiler>   
-useropt=<options>  
-pdt=<pdtdir>  
-pdtcompdir=<compdir>  
-pdt_c++=<C++ Compiler>  
-papi=<papidir>  
-vtf=<vtfdir>  
-otf=<otfdir>  
-slog2=<dir> (for external slog2 dir)
-slog2 (for using slog2 bundled with TAU)
-dyninst=<dyninstdir>  
-mpiinc=<mpiincdir>  
-mpilib=<mpilibdir>  
-mpilibrary=<mpilibrary>  
-perfinc=<dir> 
-perflib=<dir> 
-perflibrary=<library> 
-mpi
-tag=<unique name> 
-nocomm
-opari=<oparidir>  
-epilog=<epilogdir>  
-epiloginc=<absolute path to epilog include dir> (<epilog>/include default) 
-epilogbin=<absolute path to epilog bin dir> (<epilog>/bin default)  
-epiloglib=<absolute path to epilog lib dir> (<epilog>/lib default)  
-prefix=<dir>  
-exec-prefix=<dir> 
******************************************************************

These options are similar to the options used by the configure script.
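
The loop that installtau performs, running configure and then make clean install once per configuration, can be pictured with the following Python sketch. It is not the installtau script itself: build_configurations is a hypothetical helper, the option sets passed to it are examples built from options documented above, and by default it only returns the commands (dry_run=True) instead of executing them.

```python
import shlex
import subprocess


def build_configurations(configs, dry_run=True):
    """Run ./configure <opts> followed by make clean install for each option set.

    configs: list of option lists, e.g. [["-pthread"], ["-mpi"]].
    Returns the commands, as strings, in the order they are (or would be) executed.
    """
    commands = []
    for opts in configs:
        commands.append(["./configure", *opts])
        commands.append(["make", "clean", "install"])
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing step by raising CalledProcessError.
            subprocess.run(cmd, check=True)
    return [shlex.join(cmd) for cmd in commands]


# Hypothetical site configurations.
for line in build_configurations([["-pthread"], ["-mpi", "-pdt=/usr/local/pdtoolkit"]]):
    print(line)
```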

1.1.4. Examples:

  • a) Profiling a multithreaded C++ program (compiled with xlC)

    % configure -pthread
    % make clean; make install
    % set path=($path <TAU DIRECTORY>/rs6000/bin)
    % cd examples/threads
    % make
    % hello

    It has two threads; the profiling data should show functions executing on each thread.

    % pprof
    

    This is the text based profile browser.

    % paraprof
    
  • b) TAU with Java

    % ./configure -c++=g++ -jdk=/usr/local/packages/jdk1.4
    % make install
    % set path=($path <taudir>/<tauarch>/bin)
    % setenv LD_LIBRARY_PATH \
      $LD_LIBRARY_PATH:<taudir>/<tauarch>/lib
    % cd examples/java/pi
    % java -XrunTAU Pi 200000
    % paraprof
    
  • c) Profiling an MPI program using the TAU MPI wrapper library

    % configure -mpi
    % make clean; make install
    % cd examples/pi
    % make
    % poe cpi -procs 4 -rmpool 2
    % pprof
    % paraprof

    Note: Using the MPI Profiling Interface, TAU also generates profile data for all MPI routines.

  • d) Profiling an application written in C++ (compiled with icpc) using automatic source code instrumentation and CPU time instead of the default wallclock time. First, download PDT (Program Database Toolkit) from http://www.cs.uoregon.edu/research/pdtoolkit and install it:

    % cd pdtoolkit-<x>
    % configure  -XLC -prefix=/usr/local/pdt
    % make clean install
    

    Next configure TAU to use PDT for automatic source code instrumentation.

    % cd tau-2.x
    % configure -c++=icpc -cc=icc -pdt=<pdtoolkit root directory> -CPUTIME
    % make clean; make install
    % cd examples/taucompiler/c++
    % make
    

    This takes klargest.cpp, an uninstrumented file, parses it with PDT, and invokes tau_instrumentor, which takes the PDT output and generates an instrumented C++ file. When linked with the TAU library and executed, the instrumented program generates performance data.

    % klargest
    % pprof
    % paraprof
    
  • e) Use CPUTIME measurements for a multi-threaded application using pthreads under LINUX.

    % configure -pthread -CPUTIME
    
  • f) Use multiple hardware performance counters

    % configure -MULTIPLECOUNTERS -papi=/usr/local/papi \
      -PAPIWALLCLOCK -PAPIVIRTUAL -LINUXTIMERS \
      -mpiinc=/usr/local/mpich/include \
      -mpilib=/usr/local/mpich/lib \
      -pdt=/usr/local/pdtoolkit -useropt=-O2
    % setenv COUNTER1 LINUX_TIMERS
    % setenv COUNTER2 PAPI_FP_INS
    % setenv COUNTER3 PAPI_L1_DCM ...
    
  • g) Use TAU with PDT and MPI on IBM BG/L

    % cd pdtoolkit-3.x
    % configure -XLC -exec-prefix=bgl; make clean install
    % cd tau-2.x
    % configure -mpi -arch=bgl -pdt=/usr/local/pdtoolkit-3.x -pdt_c++=xlC
    
  • h) Tracing an MPI program (compiled with xlC) and displaying the traces in Vampir or VNG using Open Trace Format (OTF)

    % configure -c++=xlC -cc=xlc -fortran=ibm -mpi -otf=/usr/local/otf-1.2.6 -TRACE
    % make clean; make install
    % cd examples/taucompiler/f90
    % make
    % poe ./ring -procs 128  
    % tau_treemerge.pl
    % tau2otf tau.trc tau.edf app.otf -z -n 8
    

    This creates a compressed OTF trace (-z) with 8 parallel streams (-n 8). The main OTF file is called app.otf.

    % vampir app.otf
    

    In the menu, choose Preferences -> Color Styles -> Activities and choose a distinct color for each activity.

  • i) Profiling an OpenMP F90 program using the IBM compilers

    % configure -c++=xlC -cc=xlc -fortran=ibm -mpi -opari=<dir> -pdt=<dir> -openmp
    % cd examples/taucompiler/opari_f90
    % make
    % setenv OMP_NUM_THREADS 2
    % mandel
    % pprof
    

NOTE: Also see Section "Running the Application" in Chapter 2 (Compiling) for an explanation of simple examples that are included with the TAU distribution.

1.1.5. upgradetau

This script rebuilds all TAU configurations previously built in a different TAU source directory. Give it the location of a previous version of TAU, followed by any additional configurations, and it rebuilds TAU with the same options.

1.1.6. tau_validate

This script attempts to validate a TAU installation. Its only required argument is TAU's architecture directory. It also accepts the following options:

  • -v Verbose output

  • --html Output results in HTML

  • --build Only build

  • --run Only run

Here is a simple example:


bash : ./tau_validate --html x86_64 &> results.html
tcsh : ./tau_validate --html x86_64 >& results.html