*****************************************************************************
**			TAU Portable Profiling Package			   **
**			http://www.acl.lanl.gov/tau		           **
*****************************************************************************
**    Copyright 1997-2001				   	   	   **
**    Department of Computer and Information Science, University of Oregon **
**    Advanced Computing Laboratory, Los Alamos National Laboratory        **
**    Research Center Juelich, ZAM Germany			           **
*****************************************************************************
/*******************************************************************
 *                                                                 *
 *        Tuning and Analysis Utilities Installation Procedure     *
 *                           Version 2.10                          *
 *                                                                 *
 *******************************************************************
 *    For installation help, see INSTALL.                          *
 *    For release notes, see README.                               *
 *    For JAVA instructions, see README.JAVA                       *
 *    For licensing information, see LICENSE.                      *
 *    For a tutorial on using TAU, open html/index.html in your    *
 *        web browser.                                             *
 *    For more information, including updates and new releases,    *
 *        see http://www.acl.lanl.gov/tau                          *
 *    For help, reporting bugs, and making suggestions, please     *
 *        send e-mail to tau-bugs@cs.uoregon.edu                   *
 *******************************************************************/


General Installation Procedure: 
-------------------------------
Microsoft Windows users should refer to instructions in Windows-Readme.txt. 

The following instructions are meant for Unix Users.

1.  Configure the package for your system.

After uncompressing and untarring tau, the user needs to configure, compile and
install the package. This is done by invoking:

% ./configure
% make install

TAU is configured by running the configure script with appropriate options that
select the profiling and tracing components that are used to build the TAU 
library.  The `configure' shell script attempts to guess correct values for 
various system-dependent variables used during compilation, and creates the 
Makefile(s) (one in each subdirectory of the source directory).

% ./configure -help 
TAU Configuration Utility 
***********************************************************************
Usage: configure [OPTIONS]
  where [OPTIONS] are:
-c++=<compiler>  ............................ specify the C++ compiler.
-cc=<compiler> ................................ specify the C compiler.
-useropt='<parameters>' ............... list of commandline parameters.
-pthread .................................. Use pthread thread package.
-sproc .................................. Use SGI sproc thread package.
-tulipthread=<dir> .......... Specify location of Tulip/Smarts package.
-smarts .................. Use SMARTS API for threads (use with above).
-openmp ........................................... Use OpenMP threads.
-opari=<dir>... Specify location of Opari OpenMP tool (use with above).
-pcl=<dir> ..... Specify location of PCL (Performance Counter Library).
-papi=<dir> ............... Specify location of PAPI (Performance API).
-pdt=<dir> ........ Specify location of PDT (Program Database Toolkit).
-jdk=<dir> ...... Specify location of JAVA 2 Development Kit (jdk1.2+).
-dyninst=<dir> ................... Specify location of DynInst Package.
-mpiinc=<dir> ............. Specify location of MPI include dir and use
                           the TAU MPI Profiling and Tracing Interface.
-mpilib=<dir> ............. Specify location of MPI library dir and use
                           the TAU MPI Profiling and Tracing Interface.
-TRACE ..................................... Generate TAU event traces.
-PROFILE ............ Generate profiles (summary statistics) (default).
-PROFILESTATS .................. Enable standard deviation calculation.
-PROFILECOUNTERS .... Use Hardware Performance Counters (default time).
-SGITIMERS .......... Use fast nanosecond timers on SGI R10000 systems.
-CPUTIME .......... Use usertime+system time instead of wallclock time.
-PAPIWALLCLOCK ........ Use PAPI to access wallclock time. Needs -papi.
-PAPIVIRTUAL   .......... Use PAPI for virtual (user) time calculation.
-noex .................. Use no exceptions while compiling the library.
-help ...................................... display this help message.

***********************************************************************

The following command-line options are available to configure:

-prefix=<directory>
   
   Specifies the destination directory where the header, library and binary 
   files are copied. By default, these are copied to subdirectories <arch>/bin 
   and <arch>/lib in the TAU root directory. 
   
-arch=<architecture>
   
   Specifies the architecture. If the user does not specify this option, 
   configure determines the architecture. For SGI, the user can specify one 
   of sgi32, sgin32 or sgi64 for the 32, n32 or 64 bit compilation modes 
   respectively. The files are installed in the <architecture>/bin and 
   <architecture>/lib directories.
   
-c++=<C++ compiler>
   
   Specifies the name of the C++ compiler. Supported  C++ compilers include  
   KCC (from KAI), CC,  g++ (from GNU), FCC (from Fujitsu) and pgCC (from PGI). 
   
-cc=<C Compiler>
   
   Specifies the name of the C compiler. Supported C compilers include cc, 
   gcc (from GNU), pgcc (from PGI), fcc (from Fujitsu) and KCC (from KAI).
   
-pthread
   
   Specifies pthread as the thread package to be used. In the default mode, no 
   thread package is used. 
   
-tulipthread=<directory>
   
   Specifies Tulip threads (HPC++) as the threads package to be used as well 
   as the location of the root directory where the package is installed. 
   [ Ref: http://www.acl.lanl.gov/tulip ]
   
-tulipthread=<directory> -smarts
   
   Specifies  SMARTS (Shared Memory Asynchronous Runtime System) as the 
   threads package to be used. <directory> gives the location of the SMARTS 
   root directory. [ Ref: http://www.acl.lanl.gov/smarts ]

-openmp
   Specifies OpenMP as the threads package to be used. 
   [ Ref: http://www.openmp.org ]

-opari=<dir>
   Specifies the location of the Opari OpenMP directive rewriting tool. 
   Using the Opari source-to-source instrumentor in conjunction with
   TAU exposes OpenMP events for instrumentation. 
   [ Ref: http://www.fz-juelich.de/zam/kojak/opari/ ]
   
-pdt=<directory>
   
   Specifies the location of the installed PDT (Program Database Toolkit) root 
   directory. PDT is used to build tau_instrumentor, a C++, C and F90 
   instrumentation program that automatically inserts TAU annotations in the 
   source code.  
   [ Ref: http://www.acl.lanl.gov/pdtoolkit ]
   
-pcl=<directory>
  
   Specifies the location of the installed PCL (Performance Counter Library) 
   root directory. PCL provides a common interface to access hardware 
   performance counters on modern microprocessors. The library supports 
   Sun UltraSparc I/II, PowerPC 604e under AIX, MIPS R10000/12000 under IRIX, 
   Compaq Alpha 21164, 21264 under Tru64Unix and Cray Unicos (T3E) and the 
   Intel Pentium family of microprocessors under Linux. This option specifies 
   the use of hardware performance counters for profiling (instead of time).  
   To measure floating point instructions, set the environment variable 
   PCL_EVENT to PCL_FP_INSTR (for example). Refer to the TAU User's Guide or
   PCL Documentation (pcl.h) for other event names.
   [ Ref : http://www.fz-juelich.de/zam/PCL ]

-papi=<directory>

   Specifies the location of the installed PAPI (Performance API) root 
   directory. PAPI specifies a standard application programming interface (API)
   for accessing hardware performance counters available on most modern 
   microprocessors. To measure floating point instructions, set the
   environment variable PAPI_EVENT to PAPI_FP_INS (for example). Refer to the
   TAU User's Guide or PAPI Documentation for other event names.
   [ Ref : http://icl.cs.utk.edu/projects/papi/api/ ]
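
   For Bourne-style shells (sh, bash, ksh), the event variables named above can
   be set as in the following minimal sketch. Only the variable and event names
   come from the text above; which of the two variables is consulted depends on
   whether TAU was configured with -papi or -pcl.

```sh
# Built with -papi=<dir>: count floating point instructions.
PAPI_EVENT=PAPI_FP_INS
export PAPI_EVENT

# Built with -pcl=<dir>: the corresponding PCL event.
PCL_EVENT=PCL_FP_INSTR
export PCL_EVENT

# csh/tcsh users would write instead:  setenv PAPI_EVENT PAPI_FP_INS
```

   Set the variable in the same shell session that launches the instrumented
   program, so that the program inherits the setting.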
   
-jdk=<directory>
   Specifies the location of the Java 2 development kit (jdk1.2+). See
   README.JAVA for instructions on using TAU with Java 2 applications. 

-dyninst=<directory>
   Specifies the location of the DynInst (dynamic instrumentation) package. 
   See README.DYNINST for instructions on using TAU with DynInstAPI for 
   binary runtime instrumentation (instead of manual instrumentation). 
   [ Ref: http://www.cs.umd.edu/projects/dyninstAPI/ ]

-mpiinc=<dir>
   
   Specifies the directory  where mpi header files reside (such as mpi.h and 
   mpif.h). This option also generates the TAU MPI wrapper library that 
   instruments MPI routines using the MPI Profiling Interface. See the 
   examples/NPB2.3/config/make.def file for its usage with Fortran and MPI 
   programs and examples/pi/Makefile for a C++ example that uses MPI. 
   
-mpilib=<dir>
   
   Specifies the directory where mpi library files reside. This option should 
   be used in conjunction with the -mpiinc=<dir> option to generate the TAU 
   MPI wrapper library. 
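
   A quick sanity check before configuring can catch a mistyped MPI location.
   The sketch below is not part of TAU: check_mpi_dirs is a hypothetical helper
   that looks for mpi.h in the include directory, confirms that the library
   directory exists, and prints the matching configure command for review.

```sh
# check_mpi_dirs <include-dir> <lib-dir>
# Hypothetical helper, not shipped with TAU.
check_mpi_dirs() {
  inc=$1
  lib=$2
  if [ ! -f "$inc/mpi.h" ]; then
    echo "mpi.h not found in $inc" >&2
    return 1
  fi
  if [ ! -d "$lib" ]; then
    echo "no such library directory: $lib" >&2
    return 1
  fi
  # Both options are given together, as recommended above.
  echo "./configure -mpiinc=$inc -mpilib=$lib"
}

# Example call (paths are placeholders):
#   check_mpi_dirs /usr/packages/mpich/include /usr/packages/mpich/lib
```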
   
-PROFILE 

   This is the default option; it specifies summary profile files to be 
   generated at the end of execution. Profiling generates aggregate statistics 
   (such as the total time spent in routines and statements), and can be used 
   in conjunction with the profile browser racy to analyse the performance. 
   Wallclock time is used for profiling  program entities. 
   
-PROFILESTATS
   
   Specifies the calculation of additional statistics, such as the standard 
   deviation of the exclusive time/counts spent in each profiled block. This 
   option is an extension of -PROFILE, the default profiling option.
   
-PROFILECOUNTERS
   
   Specifies use of hardware performance counters for profiling under IRIX  
   using the SGI R10000 perfex counter access interface. The use of this option 
   is deprecated in favor of the -pcl=<dir> and -papi=<dir> options described 
   above. 
   
-SGITIMERS
   
   Specifies use of the free running nano-second resolution on-chip timer on 
   the MIPS R10000. This timer has a lower overhead than the default timer on 
   SGI, and is recommended for SGIs. 

-CPUTIME
   Uses usertime + system time instead of wallclock time. It gives the CPU
   time spent in the routines.  This currently works only on LINUX systems 
   for multi-threaded programs and on all systems for single-threaded programs. 
   
-PAPIWALLCLOCK
   Uses PAPI (must specify -papi=<dir> also) to access high resolution CPU 
   timers for wallclock time. The default case uses gettimeofday() which 
   has a higher overhead than this. 

-PAPIVIRTUAL
   Uses PAPI (must specify -papi=<dir> also) to access process virtual time.
   This represents the user time for measurements. 


-TRACE
   
   Generates event-trace logs, rather than summary profiles. Traces show when 
   and where an event occurred, in terms of the location in the source code and
   the process that executed it. Traces can be merged and converted using 
   tau_merge and tau_convert utilities respectively, and  visualized using 
   Vampir, a commercial trace visualization tool. [ Ref http://www.pallas.de ]
   
-noex
   
   Specifies that no exceptions be used while compiling the library. This is 
   relevant for C++. 
   
-useropt=<options-list>
   
   Specifies additional user options such as -g or -I.  For multiple options, 
   the options list should be enclosed in single quotes.
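
   The quoting matters because the shell splits unquoted words before configure
   ever sees them. The following sketch illustrates this with a small
   hypothetical function, count_args, that reports how many arguments a command
   would receive; it does not invoke configure itself.

```sh
# Report how many arguments were passed (illustrative helper only).
count_args() { echo "$#"; }

# Quoted: two arguments, and -useropt carries both compiler flags.
count_args -c++=CC -useropt='-g -I/local/apps/STL/'

# Unquoted: the list is split, so -I/local/apps/STL/ arrives as a
# separate third argument rather than as part of -useropt.
count_args -c++=CC -useropt=-g -I/local/apps/STL/
```

   The first call prints 2 and the second prints 3.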
   
-help
   
   Lists all the available configure options and quits. 

   Examples:

   % ./configure -c++=KCC 
   Use TAU with KCC
 
   % ./configure -c++=CC -useropt='-g -I/local/apps/STL/'
   Use TAU with SGI CC and add the above user defined options to the 
   commandline.

   % ./configure -TRACE -PROFILE 
   Enable both profiling and tracing.

   % ./configure -c++=KCC -SGITIMERS -tulipthread=/home/smarts/build/smarts-1.0
     -smarts -arch=sgin32 -prefix=/usr/local/packages/tau
   Use TAU with KCC and fast nanosecond timers on SGI and use SMARTS with -n32
   options and install the files in /usr/local/packages/tau

   % ./configure -c++=KCC -cc=cc -arch=sgi64 -mpiinc=/local/apps/mpich/include
     -mpilib=/local/apps/mpich/lib/IRIX64/ch_p4 -SGITIMERS -pdt=/local/apps/pdt
   Use TAU with KCC, and cc on 64 bit SGI systems and use MPI wrapper libraries
   with SGI's low cost timers and use PDT for automated source code 
   instrumentation.

   % ./configure -c++=guidec++ -cc=guidec -papi=/usr/local/packages/papi -openmp
     -mpiinc=/usr/packages/mpich/include -mpilib=/usr/packages/mpich/lib
   Use OpenMP+MPI using KAI's Guide compiler suite and use PAPI for accessing
   hardware performance counters for measurements.

2. Compilation.

   Type `make install' to compile the package. 
   Type `make tests' to compile the example programs that are included with
   this distribution.

   Make installs the library and its stub makefile in the <prefix>/<arch>/lib 
   subdirectory and installs utilities such as pprof and racy in the
   <prefix>/<arch>/bin subdirectory.

   
   Add the $(TAU_ARCH)/bin subdirectory to the path set in your .cshrc file.
   e.g.,
   # in .cshrc file
   set path=($path /usr/local/packages/tau/sgi64/bin)
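
   Users of Bourne-style shells (sh, bash, ksh) can achieve the same effect in
   their startup file. This is a minimal sketch: the install prefix and
   architecture are the illustrative values used above, and TAU_PREFIX,
   TAU_ARCH and TAU_BIN are local names chosen for this example, not variables
   that TAU reads.

```sh
# Bourne-shell counterpart of the .cshrc line above.
TAU_PREFIX=/usr/local/packages/tau
TAU_ARCH=sgi64
TAU_BIN="$TAU_PREFIX/$TAU_ARCH/bin"

# Append the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$TAU_BIN:"*) ;;
  *) PATH="$PATH:$TAU_BIN" ;;
esac
export PATH
```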

3. Instrumentation.
   JAVA requires no special instrumentation. To use TAU with JAVA, the 
   LD_LIBRARY_PATH environment variable must have the TAU <arch>/lib directory
   in its path. See README.JAVA for instructions regarding its usage.
   For other languages such as C++, C, and Fortran 90, TAU instrumentation in 
   the form of macros or routines must be added  to the source code to 
   identify routine transitions. It can be done automatically using the C++ 
   instrumentor - tau_instrumentor,  based on the Program Database Toolkit, or 
   manually using the instrumentation API (Application Programmers Interface). 
   The API is explained in detail in the documentation available from the 
   download page at http://www.acl.lanl.gov/tau, and its use is illustrated in 
   the examples directory. Instrumentation involves identifying functions and 
   associating each function with one or more TAU profile groups, which makes 
   it possible to profile selected groups of functions. By default, all 
   instrumented functions that are invoked are profiled.
   
   % cd examples/instrument
   % ./simple
   % pprof
   % racy
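
   For the Java case mentioned above, LD_LIBRARY_PATH can be extended as in the
   following sketch for Bourne-style shells. The library location is a
   placeholder for your own <prefix>/<arch>/lib directory, and TAU_LIB is a
   name chosen for this example only.

```sh
# Hypothetical install location; substitute your own <prefix>/<arch>/lib.
TAU_LIB=/usr/local/packages/tau/sgi64/lib

# Prepend the TAU library directory. The ${VAR:+...} form adds the ':'
# separator only when LD_LIBRARY_PATH already has a value, which avoids
# leaving an empty entry in the search path.
LD_LIBRARY_PATH="$TAU_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# csh/tcsh users would use setenv LD_LIBRARY_PATH instead.
```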

   To use tau_instrumentor, the C++ source code instrumentor: 
   a. Install pdtoolkit. [ Ref: http://www.acl.lanl.gov/pdtoolkit ] 
      % ./configure -arch=IRIX64 -KCC

   b. Install TAU using the -pdt configuration option.
      % ./configure -pdt=/usr/local/packages/pdtoolkit-1.0 -c++=KCC -arch=sgi64 

   c. Modify the makefile to invoke cxxparse from PDT which generates a 
      program database file (.pdb) that contains program  entities (such as 
      routine locations) and tau_instrumentor that uses the .pdb file and the 
      C++ source code to generate an instrumented version of the source code.  
      See examples/autoinstrument/Makefile. 
      
      % cd examples/autoinstrument; make
      % klargest 
      % pprof

   To illustrate the use of TAU Fortran 90 instrumentation API, we have 
   included the NAS Parallel Benchmarks 2.3 LU and SP suites in the 
   examples/NPB2.3 directory [Ref http://www.nas.nasa.gov/NAS/NPB/ ].
   See the config/make.def makefile that shows how TAU can be used with 
   MPI  (with the TAU MPI Wrapper library) and Fortran 90. To use this, TAU
   must be configured using the -mpiinc=<dir>  and -mpilib=<dir> options. The
   default Fortran 90 compiler used is f90. This may be changed by the user in
   the makefile. LU is completely instrumented and uses the instrumented MPI
   library whereas SP has minimal instrumentation in the top level routine
   and relies on the instrumented MPI wrapper library. 
 
4. Racy.

   Racy is the GUI for TAU performance analysis. It brings up a project
   management window. Type in any filename with a .pmf extension (e.g.,
   matrix.pmf) and it will bring up the racy main window.

5. TAU System Requirements :
   -------------------------
I) The Profiling Library needs a recent C++ compiler. Our recommended list:
	a) Kuck and Associates' (http://www.kai.com) KCC compiler
	b) KAI's KAP/Pro (http://www.kai.com) OpenMP guidec++ compiler
	c) SGI (http://www.sgi.com) MipsPro 7.2+ CC compiler 
	d) PGI (http://www.pgroup.com) 3.0 pgCC compiler for Linux
	e) GNU (http://www.gnu.org) gcc-2.95 g++ compiler
	f) IBM (http://www.ibm.com) xlC C++ compiler for IBM SP
        g) SUN (http://www.sun.com) Sun CC 5.0+ compiler
        h) Compaq (http://www.compaq.com) cxx 6.x compiler  
 
II) Platforms :
   TAU has been tested on 
	a) SGI IRIX 6.5 systems (Origin 2000) with KCC, CC, g++, guidec++.
	b) LINUX x86 PC clusters with 
		i) 	KAI KCC compiler, 
		ii) 	GNU g++/egcs compiler,
		iii)	PGI pgCC, pgcc, pgf90 compiler suite,
	        iv) 	Fujitsu C++/f90 compiler suite,
		v)      KAI KAP/Pro compiler suite.
	c) Sun Solaris2 with g++, KCC. 
	d) HP PA-RISC systems running HP-UX with g++. 
	e) Cray T3E with Cray C++ compiler, and KAI KCC.
	f) Compaq Tru64 Alpha with g++, cxx.
        g) Compaq Alpha Linux clusters with g++.
	h) Microsoft Windows. Tested with MS Visual C++ v5.1.  
	i) IBM SP AIX (RS6000) systems with KCC, and xlC compilers.
	j) PowerPC Linux with g++
	k) IA-64 Linux with g++ and SGI Pro64 compilers.
	   

   TAU may work with minor modifications on other platforms.
	
III) Software Requirements :
   a) Tcl/Tk
   TAU's GUI racy needs Tcl 7.4/Tk 4.0 or better. The default is 8.0. 
   Tcl/Tk can be downloaded from http://www.scriptics.com 
   
   b) xauth
   The display needs to be secure. xhost+ should not be used. Xauth style
   security is required. See TAU FAQ on how to use this. Contact your 
   system administrator if your X-server is not configured for Xauth 
   cookies. 

   c) xrdb
   The configure script ensures that the display is ok using xrdb.

    
6. Modifying user's Makefile for Tracing/Profiling.

   TAU provides a makefile stub file which is placed in the installation
   directory <prefix>/<arch>/lib/Makefile.tau[-optionlist]. Users need to 
   include this makefile and use the make variables TAU_INCLUDE, TAU_LIBS
   and TAU_DEFS appropriately in their makefiles. See the Makefile in the
   examples/instrument directory.
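
   As a rough illustration, the shell commands below write out a small makefile
   that includes the stub and uses the three variables named above. This is a
   sketch under stated assumptions, not the makefile shipped in the examples
   directory: the install path, the compiler name, the program name simple and
   the output file name Makefile.example are all placeholders, and the stub
   file name may carry an option-list suffix on your system.

```sh
# Write an example makefile; every path and name inside it is a placeholder.
# The quoted 'EOF' keeps the shell from expanding $(...) in the text.
cat > Makefile.example <<'EOF'
# Stub makefile installed by TAU: <prefix>/<arch>/lib/Makefile.tau[-optionlist]
include /usr/local/packages/tau/sgi64/lib/Makefile.tau

# Assumed compiler; use the one TAU was configured with.
CXX      = KCC
CXXFLAGS = $(TAU_INCLUDE) $(TAU_DEFS)
LIBS     = $(TAU_LIBS)

# Recipe lines below must begin with a tab character.
simple: simple.o
	$(CXX) simple.o -o simple $(LIBS)

simple.o: simple.cpp
	$(CXX) $(CXXFLAGS) -c simple.cpp
EOF
```

   TAU_INCLUDE and TAU_DEFS are passed at compile time and TAU_LIBS at link
   time.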

7. Examples of configuration and usage on the IBM SP
        
     % cd tau-2.x
     Example I:
     Profiling a Multithreaded C++ program (compiled with xlC)
     
     % configure -pthread
     % make clean; make install
     % set path=($path <TAU DIRECTORY>/rs6000/bin)
     % cd examples/threads
     % make; 
     % hello
     
       It has two threads: the profiling data should show functions executing on
       each thread
     % pprof
       This is the text based profile browser.
     % racy  
       This is the gui - type hello.pmf and click on "Create" in the project 
       management window.
     
     Example II:
     Profiling an MPI program using the TAU MPI wrapper library.
     
     % configure -mpiinc=/usr/lpp/ppe.poe/include -mpilib=/usr/lpp/ppe.poe/lib
     % make clean; make install
     % cd examples/pi
     % make CXX=mpCC
       It is very important to compile the application with the mp version of the
       compiler for MPI jobs. e.g., mpKCC, mpCC, etc.
     % poe cpi -procs 4 -rmpool 2
     % pprof or racy
       Note: Using the MPI Profiling Interface TAU can generate profile data for 
       all MPI routines as well.
     
     Example III:
     Profiling an application written in C++ (compiled with KCC) using automatic 
     source code instrumentation and using CPU time instead of (the default) 
     wallclock time.
     [ For KCC you'll need % module load KCC]
     Download PDT (Program Database Toolkit) from http://www.acl.lanl.gov/pdtoolkit
     
     % cd pdtoolkit-1.x
     % configure 
     % make ; make install
       This takes a while...
     
     Next configure TAU to use PDT for automatic source code instrumentation.
     % cd tau-2.x
     % configure -c++=KCC -cc=cc -pdt=<pdtoolkit-1.x root directory> -CPUTIME
     		e.g.,   ... -pdt=/u1/sameer/pdtoolkit-1.3 ...
     % make clean; make install
     % cd examples/autoinstrument
     % make 
       This takes klargest.cpp, an uninstrumented file, parses it (PDT), and 
       invokes tau_instrumentor, which takes the PDT output and generates an 
       instrumented C++ file, which when linked with the TAU library, generates
       performance data when executed.
     % klargest
     % pprof
     % racy
     
     Example IV:
     Tracing an MPI program (compiled with KCC) and displaying the traces in 
     Vampir.
     
     % configure -c++=KCC -cc=cc -mpiinc=/usr/lpp/ppe.poe/include 
       	  -mpilib=/usr/lpp/ppe.poe/lib -TRACE
     % make clean; make install
     % cd examples/pi
     % make CXX=mpKCC
     % poe cpi -procs 4 -rmpool 2 2000
       Calculate the value of pi using 2000 iterations. 
     
     % tau_merge tautrace.*.trc cpi.trc
     % tau_convert -vampir cpi.trc tau.edf cpi.pv
     
     % vampir cpi.pv 
     
     In the Menu, choose Preferences -> Color Styles -> Activities and choose a 
     distinct color for each activity. 
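
     The merge and convert steps above can be wrapped in a small shell function
     for repeated use. The following is a sketch, not a TAU utility:
     merge_and_convert and the RUN variable are hypothetical names, while the
     tau_merge and tau_convert invocations are copied from the commands shown
     above. Setting RUN=echo prints the commands instead of executing them.

```sh
# merge_and_convert <basename>
# Hypothetical wrapper: merges tautrace.*.trc in the current directory
# into <basename>.trc and converts it to <basename>.pv for Vampir.
merge_and_convert() {
  base=$1
  # Expand the trace file pattern into the positional parameters.
  set -- tautrace.*.trc
  # If nothing matched, $1 is still the literal pattern and does not exist.
  if [ ! -e "$1" ]; then
    echo "no tautrace.*.trc files in $(pwd)" >&2
    return 1
  fi
  ${RUN:-} tau_merge "$@" "$base.trc" &&
  ${RUN:-} tau_convert -vampir "$base.trc" tau.edf "$base.pv"
}

# Example (dry run):  RUN=echo; merge_and_convert cpi
```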
     
     Example V:
     Profiling an OpenMPI (OpenMP + MPI) C program using xlC.
     % configure -openmp -mpiinc=/usr/lpp/ppe.poe/include 
         -mpilib=/usr/lpp/ppe.poe/lib  
     % cd examples/openmpi
     % make CXX=mpCC_r CC=mpcc_r
     % setenv OMP_NUM_THREADS 2
     % poe stommel -procs 2 -rmpool 2 
     % pprof
   
8. Using TAU with POOMA
   Set the environment variable TAUDIR to point to the directory where TAU is
   installed, then follow the procedure below.

    FOR POOMA/SMARTS Users:
    -----------------------
    1. Configure PDT
    ****************
    % cd /usr/local/packages/pdtoolkit-1.0
    [FOR SGI]
    % configure -KCC -arch=IRIX64
    [FOR LINUX PCs]
    % configure -KCC
    
    2. Configure TAU
    ****************
    % cd /usr/local/packages/tau-2.7
    [FOR SGI]
    % ./configure -arch=sgi64 -c++=KCC -tulipthread=/usr/local/packages/smarts-1.0 -smarts -SGITIMERS -pdt=/usr/local/packages/pdtoolkit-1.0
    [FOR LINUX PCs]
    % ./configure -arch=linux -c++=KCC -tulipthread=/usr/local/packages/smarts-1.0 -smarts -pdt=/usr/local/packages/pdtoolkit-1.0
    % make install
    
    3. Configure SMARTS
    *******************
    % cd /usr/local/packages/smarts-1.0
    [FOR SGI]
    % configure --with-arch=iris4d --prefix /usr/local/packages/smarts-1.0 --with-taudir=/usr/local/packages/tau-2.7 --enable-64bit --enable-profile
    [FOR LINUX PCs]
    % configure --with-arch=i386-linux --prefix /usr/local/packages/smarts-1.0 --with-taudir=/usr/local/packages/tau-2.7 --enable-profile
    % make
    % make install
    
    4. Configure Pooma II 
    *********************
    % setenv TAUDIR     /usr/local/packages/tau-2.7
    % setenv PDTDIR	    /usr/local/packages/pdtoolkit-1.0
    % setenv SMARTSDIR  /usr/local/packages/smarts-1.0
    [FOR SGI]
    % ./configure --arch SGI64KCC --suite PP --parallel --profile --opt --ex
    [FOR LINUX PCs]
    % ./configure --arch LINUXKCC --suite PP --parallel --profile --opt --ex
    % setenv POOMASUITE PP
    % make
    % cd examples/Solvers/SimpleJacobi
    % make
    % cd $POOMASUITE
    % SimpleJacobi --pooma-threads <n>
    % pprof
    % racy 
       
    
    
    
