*****************************************************************************
**                      TAU Portable Profiling Package                     **
**                      http://tau.uoregon.edu                             **
*****************************************************************************
**    Copyright 1997-2011                                                  **
**    Department of Computer and Information Science, University of Oregon **
**    Research Centre Juelich, Germany                                     **
**    Advanced Computing Laboratory, Los Alamos National Laboratory        **
*****************************************************************************

Change log:
------------
Version 2.21 changes (from 2.20):
1. Added support for rewriting binaries with Maqao (tau_rewrite), using PDT 3.17+.
2. Added support for event based sampling based on unwinding (TAU_EBS_UNWIND=1).
3. Added support for OpenSHMEM.
4. Added support for context events and atomic events in Score-P.
5. Added support for H2 database in PerfDMF/PerfExplorer. 
6. Added support for Eclipse remote component. 
7. Added support for UPC instrumentation in TAU. 
8. Added support for CUDA and CUPTI v4.1 for NVIDIA. 
9. Added a new view for visualizing atomic and context events in 3D topology display in ParaProf.
10. Added Opari2 for OpenMP instrumentation (with support for OpenMP 3.0 constructs). 
11. Added support for CUBE4 reader in ParaProf. 
12. Added debugging support with TAU_TRACK_SIGNALS=1 to capture callstack as metadata at point of failure.
13. Added TAU_SUMMARY=1 and pprof -f profile.Min/Max with paraprof -dumpsummary to generate just min/max/stddev/mean statistics instead of per-node data.
14. Added support for systemwide tracking using tau.conf. 
15. Added support for MINGW compiler to create binaries for Windows.
16. Added support for Cray XK6 CUDA with Cray CCE compilers v8.0.x.
17. Added support for UPC compiler scripts (tau_upc.sh, tauupc).
18. Updated Opari2. 
19. Added EBS based on callstack unwinding.
20. Added Cray DMAPP library wrapper (configure -dmapp, -optTrackDMAPP TAU_OPTIONS).
21. Added -useropt=-DTAU_SYSTEMWIDE_TRACK_MSG_SIZE_AS_CTX_EVENT for tracking message size with context events.
22. Added -optLinkOnly for compiling .o file without instrumentation, then linking in TAU libraries.
23. Added support for IBM BlueGene/Q (-arch=bgq in configure).
24. Added support for callsite profiling (TAU_CALLSITE=1 at runtime). 
25. Added an updated Opari2. 
26. Added support for OpenACC profiling using the updated PGI 12.3 runtime library instrumentation API.
27. CUPTI and OpenCL updates (tracking time spent queued on the GPU).
28. Updates to ParaProf for IBM BG/Q topology displays.
29. Updates to tau_rewrite for Maqao instrumentation.
30. Added tau_macro.sh tool used for pre-processing (-optPreProcess) for Opari2.
31. Added support for -arch=arm_linux.
32. Added jogl.jar and libjogl.* changes for IBM BG/Q and ARM Linux.
33. Enhancements for TAU sampling and TAU_TRACK_SIGNALS for IBM BG/Q.
34. Enhancements to TAU to support parameter and phase based profiling in Score-P.
35. Added support for Cray and Berkeley UPC runtime library wrapper using -optTrackUPCR TAU_OPTIONS.
36. Reduced MPI overhead using a hash table, fixed BGP & BGQTIMERS.
37. Added support for ARM (arm_linux) and Intel MIC.
38. Added support for OTF2 (tau2otf2 -> tau2otf).
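
Several of the 2.21 items above are switched on through runtime environment variables. The following is a minimal sketch, assuming a POSIX shell and a hypothetical TAU-instrumented binary ./a.out (neither is part of this change log). Only the variable names come from the entries above; whether all of them can usefully be combined in a single run is an assumption.

```shell
#!/bin/sh
# Runtime switches named in the 2.21 entries above.
export TAU_EBS_UNWIND=1      # event based sampling based on unwinding
export TAU_TRACK_SIGNALS=1   # capture callstack as metadata at point of failure
export TAU_SUMMARY=1         # summary statistics instead of per-node data
export TAU_CALLSITE=1        # callsite profiling at runtime

# Show the TAU settings a launched application would inherit.
env | grep '^TAU_' | sort

# Hypothetical launch; uncomment once an instrumented binary exists.
# ./a.out
```

Keeping the exports in a small script next to the job script makes it easy to see which measurement options were active for a given run.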



Version 2.20 changes (from 2.19):
1. Added support for GPGPUs using tau_exec -cuda and tau_exec -opencl. Configure with -cuda=<dir>.
2. Added support for 3D topology displays in paraprof. 
3. Added support for tracking per-communicator performance data (-PROFILECOMMUNICATORS).
4. Added support for binary rewriting of static executables using DyninstAPI 7.0. 
5. Added support for derived metrics, cut, copy and paste of metrics in paraprof, perfexplorer.
6. Added support for sampling based measurements in tau_exec -ebs* (README.sampling).
7. Added support for ARMCI profiling (GA v5.0) in tau_exec -armci. Configure with -armci=<dir>.
8. Added support for loop level instrumentation in the binary rewriter (tau_run a.out -o a.i -f select.tau).
9. Added preliminary support for Eclipse PTP remote components. 
10. Integrated tau_exec and tau_wrap (tau_exec -loadlib=<dir>/libwrapped.so) for instrumenting an external library.
11. Added support for pre-computing mean and std. dev. at the end of execution with TAU_PROFILE_FORMAT=merged.
12. Added support for compiler-based instrumentation for Cray CCE compilers. 
13. TAU ported to Cray XE6 with updated default directories (gemini).  
14. Improved support for Cray Shmem. 
15. Improved support for thread based communication in tau2otf and ParaVer support in tau_convert -paraver.
16. Throttled functions are now marked explicitly in their names [THROTTLED]. 
17. Before reverting to compiler based instrumentation the user is prompted unless -optRevert is specified. 
18. Updates to memory and I/O tracking support for Mac OS X (in tau_exec).
19. Added support for the Yorick programming language. 
20. Added Java profiling support using JVMTI.
21. Added support for Score-P (www.score-p.org) measurement system. Configure with -scorep=<dir>.
22. Added support for importing Google perftools performance data (paraprof -f google).
23. Added support for profiling CUDA kernels on a separate thread.
24. Bug fixes (papi+tracing, profile merged with tau_exec -io, outer-loop level instrumentation with DyninstAPI).
25. Added support for topology display with user specified topologies loadable from a text file in paraprof. 
26. Added support for PGI 11.x compilers with support for accelerator primitives. 
27. Added support for a scrollbar in the paraprof 3D window.
28. CUDA 4.0 support added and tracking of gpgpu thread execution is now supported. 
29. Bug fixes for PGI -optCompInst for C++, tau_exec -memory for Intel -optCompInst. 
30. PostgreSQL jar file support updated in PerfDMF. 
31. Added support for Score-P LD_PRELOAD'ing using tau_exec and tau_run.
32. Added support for SOL2CC compiler for Solaris CC compiler. 
33. Bug fixes for tau_instrumentor (single line Fortran DO loop instrumentation).
34. Added support for Cray CCE -optCompInst for OpenMP.
35. Bug fix for PAPI and TAU_TRACE=1 for initialization of papi. 
36. Added support for demangling kernel names for CUDA executions. 
37. Added a new tool tau_gen_wrapper <header> <lib> that generates wrappers. 
38. Fixed papi initialization bug with C++ static ctors on Cray XE6. 
39. Scrollbars in paraprof 3D window. 
40. Support for NAG Fortran.
41. Update for tau_run rewriter for getting MPI rank from executable. 
42. Sped up paraprof/perfexplorer's loading of trials and computing derived metrics. 
43. Perfexplorer bugs fixed with custom charts. New charting option in custom charts for a single event in multiple experiments. 
44. Collating performance data bug fixed. 
45. Cray CCE -optCompInst bug fixed.
46. Added support for SHMEM communication tracking (profiling + tracing).
47. Added API in TAU to track one sided communication from a remote node.
48. Enhancements and bug fixes in tau_wrap.
49. EBS: Added profiling support for event based sampling.
50. Added support for system wide TAU configuration based on <taudir>/tau_system_defaults/tau.conf file. Support job id tracking in file name (PROFILEDIR). 
51. Cleaned up GPGPU support with CUPTI.
52. Updated Paraprof 3D topology display with interval event and atomic event selection. 


-------------------------------
Version 2.19 changes (from 2.18):
1. Added support for Chapel.
2. Added support for UPC (Berkeley upcc compiler, GASP, supports -optCompInst).
3. Added support for automatically generating LD_PRELOADable .so files from a .h file using tau_wrap (-r libname.so).
4. Derived metrics window in ParaProf and PerfExplorer.
5. -bfd=download now downloads binutils-2.20 instead of 2.18.
6. Added an aligned stacked bar chart in PerfExplorer that is similar to unchecking "Stack Bars Together" box in paraprof's options window.
7. Paraprof now automatically adjusts the memory used (tau_javamax.sh)
8. Added support for Intel compilers on Cray XT systems.
9. tau_validate now uses TAU_VALIDATE_PARALLEL and TAU_VALIDATE_SERIAL env vars to run the tests (see --help). 
10. Added support for external tool configuration in PerfExplorer.
11. Updated PerfExplorer code to Weka 3.6.1.
12. Added support for DBSCAN clustering.
13. Updated Jython support to 2.5.1 that supports Python v2.5. 
14. Created a utility to reconstruct a Paraver trace from TAU EBS samples.
15. Paraprof 3D communication matrix display has cross hairs and value boxes.
16. Enabled tree selection model for multi-selection.
17. New expression parsing window in Paraprof.
18. Paraprof 3D windows now work on IBM BG/P using ppc64 JOGL.
19. Group changer window in paraprof. 
20. Added support for outer loop level instrumentation in tau_instrumentor's spec file mode. 
21. When PDT based source instrumentation fails, compiler-based instrumentation is used as a fallback. Disable with -optNoCompInst in TAU_OPTIONS env. var.
22. Added support for PAPI-C v4.0 in TAU. Retains backward compatibility with earlier PAPI versions. 
23. Added support for Cray CCE compilers on XT systems (module PrgEnv-cray).
24. Added support for tracking pthread barrier wait times. 
25. Added support for TAU over MRNet.
26. Added a new tool (tau_exec) to evaluate I/O, memory and communication
27. PerfExplorer has a new derived metric pane and updates to configuration
28. tau_exec can also load wrapper libraries created by tau_wrap using -loadlib
29. Added support for sampling based profiling (README.sampling)
30. Added support for SHMEM wrappers for Cray XT
31. Added support for Cray XMT (-arch=crayxmt)
32. Added support for Score-P (aka SILC).
33. ParaProf 2D and 3D communication matrices show nodes and not threads. 
34. Totals for context events and atomic events are now accessible in paraprof.
35. Refined the support for event based sampling (EBS). 




Version 2.18 changes (from 2.17):
1. Added support for PIN based runtime instrumentation for Windows.
2. Added support for compiler based instrumentation for PGI and IBM compilers.
3. Added support for thread-safe throttling of events.
4. Added support for reading annotated snapshot perfsuite profiles generated by TAU.
5. Enhanced support for GNU based compiler instrumentation to support instrumented shared objects. 
6. Added support for parsing C99 code using PDT. 
7. Python API enhancements include support for setting node, data purging and exit.
8. tau_ompcheck now supports more OpenMP directives. 
9. PerfExplorer now includes a CQoS classifier and a gap computing module.
10. Compiler based instrumentation supports selective instrumentation for the file level. 
11. Added a -bfd=download configure option which will download and build binutils with -fPIC for compiler based instrumentation. 
12. Added support for Intel v11 compilers. 
13. Snapshot enhancements (read snapshot.*.*.* files, paraprof -f snapshot)
14. Better configuration support for SiCortex, IBM BG/P and Cray XT5 (PGI, Pathscale, GNU), Intel compilers for Apple.
15. Added support for preloading pthread calls; enable with -useropt=-DTAU_PTHREAD_PRELOAD.
16. Added support for -DISABLESHARED configure option that does not build libTAU.so.
17. Callpath profiling is now a runtime option enabled by setting the TAU_CALLPATH env var.
18. PGI Accelerator API support, PGI 7.x STL string fix for OpenMP, PGI 8.0 Fortran compiler based instrumentation fix.
19. Python C-API offers lower overhead than legacy ltau.py API. Now it is the default. 
20. TAU_PROFILE_FORMAT "merged" generates merged profiles. 
21. perfdmf_configure --create-default creates a Derby DB without any questions.
22. Eclipse plugin uses a new XML workflow for uploading performance data on disk to db. 
23. 3D Communication Matrix in ParaProf.
24. TAU_CALLPATH_DEPTH of 0 and 1 for context events.
25. TAU_TRACK_HEAP and TAU_TRACK_HEADROOM environment variables - samples at function entry and exit, context events.
26. PerfExplorer has a working Derby database that can be used to load ppk files. 
27. Default mode: TAU_TRACE=1 disables TAU_PROFILE. Set TAU_PROFILE=1 as well to get both.
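
The runtime switches introduced in 2.18 can be collected in a job script. This is a sketch under assumptions: a POSIX shell, a hypothetical instrumented binary ./a.out, a value of 1 taken to mean "enabled", and a callpath depth chosen arbitrarily for illustration. The variable names themselves are taken from the entries above.

```shell
#!/bin/sh
export TAU_CALLPATH=1         # callpath profiling as a runtime option
export TAU_CALLPATH_DEPTH=2   # illustrative depth (assumption)
export TAU_TRACK_HEAP=1       # heap samples at function entry and exit
export TAU_TRACK_HEADROOM=1   # headroom samples at function entry and exit
export TAU_TRACE=1            # tracing; by default this disables profiling
export TAU_PROFILE=1          # set explicitly to get profiles and traces

# Show the TAU settings a launched application would inherit.
env | grep '^TAU_' | sort

# Hypothetical launch; uncomment once an instrumented binary exists.
# ./a.out
```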



Version 2.17 changes (from 2.16):
1. Added support for IBM BG/P (-arch=bgp).
2. Added a new tool for generating wrapper libraries, tau_wrap.
3. Improvements in Eclipse plugin for external tool support.
4. Improvements in paraprof and perfexplorer.
5. Improvements for SiCortex support and tauex. 
6. Added support for atomic events in TAU library layered over VampirTrace.
7. Added a Posix I/O wrapper (-iowrapper) for tracking volume and bandwidth of I/O.
8. Added an MPI wrapper library for Windows Cluster 2003. 
9. Added support for Scalasca 1.0. Works with both Kojak and Scalasca. 
10. Added Opari in TAU.
11. Added IBM BG/P metadata for torus node information in profiles.
12. PerfExplorer adds support for user-defined events and improvements in custom charts.
13. Posix I/O tracking implemented without need for enabling profiling (for tracing).
14. Improved tau_inc.pl for generating include lists for Scalasca/Kojak based on callpath profiling. 
15. Eclipse TAU plugin has support for two stage communication analysis.
16. Added -BGPTIMERS for IBM BG/P. Compatible with -BGLTIMERS.
17. Env vars TAU_VERBOSE, TAU_SYNCHRONIZE_CLOCKS, TAU_PROFILE_FORMAT (snapshot)
18. GCC 4.3.0 compatibility
19. Added bandwidth and bytes written info for MPI I/O write routines.
20. Added support for GNU, PathScale and PGI compilers on Cray XT systems [ORNL].
21. ParaProf can now generate selective instrumentation files.
22. TAU_THROTTLE=0 disables throttling of events. Use TAU_VERBOSE=1 to see it.
23. perfdmf_configure now stores weka.jar files in ~/.ParaProf directory.
24. Added support for DyninstAPI 5.2.
25. Bug fixes for tau_instrumentor, context events, and tau2slog2.
26. Added support for pointer based profiling API (examples/profilercreate/README) [LSU].
27. -spec option for tau_instrumentor allows generic timer instrumentation support [FZJ].
28. Paraprof allows new windows for multiple metrics to compare data [SiCortex].
29. Posix I/O tracking now uses context events instead of user-defined events.
30. Added support for compiler based instrumentation for Intel 9.1, 10.x, GNU, and PathScale compilers. 
31. Added extensions in PerfExplorer to support CQoS analysis, drawing charts from script [CCA]. 
32. Bug fixes in paraprof for selective instrumentation, printer support [NASA].
33. taucxx, taucc, tauf90 now use -optCompInst by default, tau_[cxx,cc,f90].sh use -optPDTInst by default.
34. Added -opari support in installtau. 
35. Fixes for IBM BGL/BGP configuration.
36. Added support for tracking memory utilization and headroom in Python [ALCF].



Version 2.16 changes (from 2.15):
1. Added a new tool for correcting network time drifts in traces (tau_timecorrect).
2. Added support for an Eclipse analysis wizard and a graphical instrumentor.
3. Added support for 3D stereo visualizations in ParaProf.
4. Added support for a source browser in ParaProf.
5. Added support for generating source code information in tau_instrumentor.
6. Added support for Perflib based instrumentation and perf2tau [Jeff Brown, LANL]. 
7. Updated KTAU support in TAU for registering fork for kernel profiling [ANL]. 
8. Added tau_validate tool for checking if the TAU library is built correctly [UTK].
9. Added support for loading multiple ppk files in paraprof on the commandline [ORNL]. 
10. Enhancements in ParaProf and PerfExplorer for using metadata. [PERI]
11. Added support for capturing date and other cpu information in profiles. [LLNL]
12. Added support for Vampirtrace [LLNL]. 
13. Added support for Scalasca 0.5, and KOJAK 2.2 [FZJ, UTK]. 
14. Enhancements to Eclipse PTP plugin to support PAPI counter selection. [UTK]
15. Supports PDT v3.10 with EDG v3.8 C++/C parsers [LLNL].
16. Added support for application signatures [RENCI].
17. Added support for SiCortex Linux platform [SiCortex].
18. Added support for tracking leaks and dynamic memory allocation/deallocations in Fortran.
19. Improved tau memory tracking module to handle multi-line statements in Fortran.
20. Python profiler overhead is greatly reduced.
21. tauex script added for switching between libraries.
22. -optShared option added in tau_compiler.sh for linking in TAU's shared objs.
23. Easy to use TAU API (TAU_START("string"), TAU_STOP("string")) introduced.
24. Paraprof enhancements include support for Cube 3.
25. PAPI threads (configure -papithreads) and PAPI Domains added for x86 linux.
26. Clock synchronization in traces.
27. Metadata fields in ppk files.
28. Custom charts with XML metadata in PerfExplorer. 
29. TAU portal scripts to upload data to perfdmf database.
30. Added support for persistent communication events in traces. 
31. Added support for KTAU OS level shared counter coupling.
32. Eclipse/PTP updates for accessing TAU options and build configurations.
33. Added support for tracking Fortran I/O. 
34. Added support for accessing multiple databases, configuration of databases,
    and context event displays in paraprof. 
35. Added support for -arch=mips32 for SiCortex 32 bit compilation.
36. Updates for Epilog on Cray XT3 support using TAU.
37. Updates for Lahey 64 bit Fortran under Linux. 
38. Added support for generating and viewing profile snapshots.
39. Added support for specifying phases and timers (static/dynamic) in the 
    instrumentation specification file (see examples/timerphase).
40. Updates for Eclipse/PTP plugin for supporting external tools such as 
    VampirTrace, Kojak and Perfsuite using TAU's tool plugin. 
41. Added Support for Cray Compute Node Kernel for XT4 (-arch=craycnl).
42. Updates for tauex to include tau_load.sh functionality for generating
    MPI performance data for shared library MPI. 
43. Added signal handlers (SIGUSR1 and SIGUSR2) to dump performance data and
    toggle instrumentation (enable/disable instrumentation) respectively.
44. Full compiler names and the -show option are available for compiler scripts.
45. Added support for reading in OMPP profiles in paraprof. 
46. Added support for Intel 10.x compilers, NAGWare Fortran, and g95 compilers.


Version 2.15 changes (from 2.14):
1. Added support for phase and comparative displays in ParaProf [UO]
2. Updated PerfExplorer [UO]
3. Added support for Eclipse CDT, FDT [LANL]
4. Added support for OTF (tau2otf) [LLNL]
5. Added support for runtime throttling of events (TAU_THROTTLE) [UCAR]
6. Added support for ORC Open64 compiler [U. Houston/NCSA]
7. Added support for Solaris on x86_64 (Opterons) [SUN]
8. Added support for nested OpenMP calls [SUN, Aachen]
9. Added support for Cray XT3 and SHMEM wrapper [PSC] 
10. Added support for multi-platform traces and a trace writer library [UFL, ORNL]
11. Added support for top level timer in OpenMP [UCAR]
12. Added support for PAPI on BGL and XT3 [ANL, PSC]
13. Added support for converting TAU traces to profiles. 
14. Added support for converting TAU callpath profiles to phase profiles [LLNL].
15. Enhancements to Paraprof. 
16. Better support for Intel compilers for linking C and Fortran codes to TAU [NOAA]. 
17. Added support for FreeBSD [ARL]. 
18. Added support for Eclipse PTP [LANL]. 
19. Added support for scripting in Paraprof using Jython to create custom views [LLNL].
20. Added support for Python 2.4 with instrumentation for C calls [LLNL]. 
21. Added support for loop level instrumentation for C and C++ [UTK, LANL]. 
22. Added support for parameter based profiling (-PROFILEPARAM) [UTK].
23. Added support for tau_load.sh for runtime MPI library instrumentation [UTK]. 
24. Added support for outer-loop level instrumentation for Fortran [UTK, LLNL].
25. Added a new tool: tau_ompcheck that completes OpenMP Fortran directives [NCAR]. 
26. Added support for preprocessing Fortran sources in tau_compiler.sh (-optPreProcess) [GSFC].
27. Added support for invoking tau_ompcheck in tau_compiler.sh [NCAR]. 
28. Added support for DB2 and Derby in PerfDMF [UTK, LLNL]. 
29. Added support for Infiniband MPICH on Opterons [NERSC]. 
30. Added support for Cray XT3 Memory headroom information and Cray Timers [PSC]. 
31. Added support for GNU Gfortran parser in PDT for tau_compiler.sh [LANL]. 
32. Added support for parameter based profiling (-PROFILEPARAM) for workload characterization [UTK].
33. Added support for upgrading from one version of TAU to another (upgradetau) [NERSC]. 
34. Added support for automatic instrumentation of pthread programs using PDT [Walt Disney]. 
35. Added Java TAU trace writer library [U. Reading]. 
36. Added support for gotos in outer-loop level instrumentation [UTK].
37. Added support for automatic MPI library level instrumentation using tau_poe [UTK]. 
38. Updated tau_ompcheck [NCAR]. 
39. Better support for instrumentation and parsing of Fortran programs [Goddard].
40. PerfExplorer enhancements (normal probability plots, event data, distribution info of events)
41. Automatic memory leak detection (-optDetectMemoryLeaks) for C/C++ malloc/free [UTK]. 
42. TAU Portal (tau.nic.uoregon.edu) to access database. 


Version 2.14 changes (from 2.13):
1. MPI-2 support and Fortran wrappers added. 
2. Support for Oracle database in PerfDMF. 
3. VTF support for multiple PAPI counters in Vampir/VTF format trace files. 
4. Improvements in Paraprof displays and database connectivity. 
5. Improvements in tau_compiler.sh to automatically instrument applications.
6. Added support for phase based profiling and dynamic timers. 
7. Introduced vtf2profile tool to get profiles from VTF3 traces.
8. Added histograms, full callgraph, not-normalized displays to paraprof. 
9. Added support for PathScale compilers and -exec-prefix option. 
10. Improved support for locking of performance data in multi-threaded apps. 
11. Added 3D displays in Paraprof.
12. Added support for SLOG2 traces (to use TAU with Jumpshot) [ANL]. 
13. Added better support for configuring for BG/L (-arch=bgl) [ANL].
14. Added support for depth limit profiling and tracing (-DEPTHLIMIT) [ORNL].
15. Changes to the MPI wrapper library (for S3D) [ORNL]. 
16. TAU_MPI_MESSAGE_SIZE now reports sizes for MPI_Send, Recv, Allreduce, etc. [ORNL].
17. Added support for Charm thread library [UIUC, LLNL].
18. Added support for gfortran compiler (-fortran=gfortran). 
19. Added support for reverse callpaths in paraprof [LLNL].
20. Added support for storing trials in paraprof [UTK].
21. Added support for user defined context events (callpaths) [ANL]. 
22. Added support for measuring memory headroom available (-PROFILEHEADROOM, examples/headroom) [ANL].
23. Added tau2elg trace conversion tool to convert to Epilog trace format [UTK].
24. Added search options to paraprof windows [LLNL].
25. Added support for -MPITRACE option for Kojak [UTK].
26. Paraprof now has a text table window for callpath profiles [LLNL].
27. Changes to TAU_COMPILER to support Opari in Kojak [UTK].
28. Fixed bugs in tau2elg to support Kojak v 2.1 and 2.1.1 [FZJ].
29. Fixed a bug in TAU_COMPILER (when opari is not used) [UTK].
30. Added support for cube (importer) in paraprof [UTK].
31. Added support for PGI v6.0 compilers.
32. Added Jumpshot/Slog2 package to TAU [ANL].
33. Added support for trace files > 2GB in TAU and VTF3 [TACC].
34. TAU no longer needs merged pdb files from PDT's F95 parser [UTK].
35. Enhancements in Paraprof to choose metrics for summary table, std. dev [LLNL].
36. TAU_COMPILER does not need -optReset for IBM xlf90 to eliminate -D* flags.
37. TAU scripts (tau_[cxx,cc,f90].sh) for use on commandline [UFL].
38. TAU Java Eclipse plugin [LANL].
39. Updated documentation.
40. Added PerfExplorer performance data mining and knowledge discovery framework [LLNL].
41. Enhancements in MPI libraries for scalability [LLNL].
42. Phase based profiling allows you to identify phases in paraprof.
43. Added tau_setup GUI for TAU installations [LANL].


Version 2.13 changes (from 2.12):
1. Paraprof enhancements.
2. TAU MPI wrapper library layer enhancements [CCA].
3. Better support for autoinstrumentation of F95 source code using PDT [LANL].
4. Support for autoinstrumentation of Java using JDK 1.3 and 1.4.x JVMPI.
5. Introduced the TAU Trace Input Library (TIL) [VNG, TUDresden].
6. Added support for detecting papi wallclock timer overflow [LLNL]. 
7. Added support for Power4 Linux 64 bit compilation (-arch=ibm64linux) [LLNL].
8. Paraprof enhancements for groups and multiple counters with multithreaded loading [LANL].
9. Added TAU Instrumentation Language for enhancing tau_reduce. 
10. Added support for RTTI with g++ [ITT].
11. Added support for PAPI 3 so that TAU works with both PAPI 2 & 3 [UTK].
12. Paraprof enhancements for callpath profiling [LLNL].
13. Timer overhead measurements for callpath profiling. 
14. Compensation of timing overhead introduced. 
15. Malloc/free wrappers pinpoint memory allocation bugs (examples/malloc) [LLNL].
16. Added memory utilization tracking (examples/memory) [LLNL].
17. Added muse user defined events with TAU interrupt handlers [LANL].
18. Paraprof improvements (clickable callpaths, image, XML support) [LLNL]. 
19. Fuzzy matching of file names in tau_instrumentor (/home/foo.cpp ./foo.cpp) [TACC].
20. Added support for TAU_TRACK_MEMORY_HERE() [LLNL].
21. Improvements in PerfDMF and ParaProf's ability to connect to database [LLNL].
22. Added support for native PAPI events (setenv COUNTER1 PAPI_NATIVE_<nm>) [LLNL].
23. Added support for DyninstAPI v4.1 [UMD]. 
24. Added support for VTF3 binary trace generation library for Vampir.
25. Added hardware performance counters and other user defined events to trace.
26. Introduced hierarchical trace merging using tau_merge (both offline/online).
27. Added -PROFILEMEMORY option that tracks memory at each routine entry [LLNL].
28. Improved support for MySQL and PostgreSQL databases in PerfDMF. 
29. Added automated trace merge/convert with tau2vtf using TAU_TRACEFILE env. 
30. Added $(TAU_COMPILER) shell script/makefile variable for automatic instr.

  
 

Version 2.12 changes (from 2.11):
1. Enhancements in jracy for supporting multiple counter data [LLNL].
2. Improved memory handling and drawing speeds in jracy [LLNL].
3. Configuration changes for LAM MPI, PAPI, Tru64 [Utah, NCSA, LANL].
4. Added support for Python bindings [CACR, LLNL]. 
5. Added MPI shared library examples [CACR]. 
6. Added support for building multiple configurations (installtau) [LANL, LLNL].
7. Added support for Python under AIX and OSX [LLNL].
8. Bug fixes for IA-64 and Intel 7.1 compiler [NCSA].
9. Added TAU_CALLPATH_DEPTH env. variable specification for callpath profiling [LLNL].
10. Added support for -arch=ibm64. It supports PAPI 64 bit/Power4. [UTK]
11. Bug fixes for shared libraries with MPI, g++/KCC under AIX 5.1. [LLNL]
12. Introduced paraprof profile browser (jracy symlinks to paraprof). [ASCI]
13. Added support for dumping profiles in python using a prefix. [LLNL]
14. Added support for DyninstAPI 4.0 including binary rewriting. [U. Maryland]
15. Added support for KOJAK's implementation of Opari and EPILOG. [FZJ]
16. Added support for file level selective instrumentation (PDT, Dyninst). [Utah] 
17. Fixed Apple's OS X sscanf bug for reading long doubles in pprof. 
18. Added support for DyninstAPI under AIX. [NERSC]
19. Added support for Cray X1 and AMD Opteron (ASCI Red Storm). [Cray]
20. Added support for MAGNET/MUSE. [LANL]
21. Added support for Performance Database [ASCI]. 
22. Added support for Multiplecounters with CRAY_TIMERS, MUSE and message size [CCA]. 


Version 2.11 changes (from 2.10):
1. Added -i header option for tau_instrumentor [CASC]. 
2. Added -LINUXTIMERS option for low overhead Linux wallclock time [CACR].
3. Added -c|-c++|-fortran options to tau_instrumentor [CACR].
4. Lowered the overhead of timers of disabled profile groups [CACR].
5. Added support for PAPI v2.1 [CACR]. 
6. Updated PCL bindings. 
7. Added support for selective instrumentation [CACR]. 
8. Added support for multiple counters [CACR]. 
9. Added support for Paraver trace visualizer (CEPBA) in tau_convert. 
10. Opari and PDT related changes (examples/opari/pdt_f90) [FZJ].
11. Added support for online access to performance data [CACR]. 
12. Added support for LINUXTIMERS for PGI and other Linux compilers [FZJ].
13. Changes to online access API [CACR].
14. Improved jracy GUI [ALPS]. 
15. Added support for EPILOG tracing package [FZJ]. 
16. Added support for Hitachi SR8000 [FZJ]. 
17. Added support for browsing by profile groups in jracy [ALPS, LLNL]. 
18. Made some modifications to Paraver trace format conversion [CEPBA]. 
19. Added support for NEC SX-5 [HLRS]. 
20. Added support for -mpilibrary option [LLNL].
21. Added support for g++ 2.96/3.x for tau_merge/tau_convert [ST].
22. Fixed a problem with the MPI wrapper library for Intel IA-64 compilers [NCSA].
23. Added support for tracking message sizes using user defined events [Rutgers].
24. Added support for low overhead, high resolution timers under IA-64 Linux [NCSA].
25. Added support for alternative returns in PDT based C instrumentation [PETSc, ANL].
26. Added a new tool - tau_reduce for reducing instrumentation overhead.  
27. Added support for callpath profiling.
28. Fixed pprof to support exclusive percentage in callpath profiling.
29. Changes for CCA, jracy & DyninstAPI on IRIX, Sun. 


Version 2.10 changes (from 2.9):
1. Better support for C instrumentation [HDF5].
2. Fixes for IBM.
3. Added support for multiple instrumentation requests per line [CACR F90/C++].
4. Added support for detecting threaded versions of MPI at configuration.
5. Made some modifications for PDT v2.1 [CACR C++/C].
6. Added jracy, TAU's new Java based profile browser to replace racy.
7. Added support for specifying a fortran compiler during configuration.
8. Added support for auto-detection of mpi libs and include dirs (-mpi).
9. Added IBM specific libs for MPI so we don't have to use mpCC, mpKCC [CACR].
10. Added TAU_LDFLAGS to MPI Makefiles [CACR].
11. Added support for enabling/disabling profile groups at runtime [PDT, CACR]. 

Version 2.9 changes (from 2.8):
1. Better support for mixed model programming
2. Changes for KCC and KAP/Pro.
3. Added support for MPI with DyninstAPI.
4. Added support for selective profiling in Java (-XrunTAU:exclude=java,sun)
5. Java RMI support changes.
6. Introduced TAU Java source instrumentation API.
7. Added support for enabling and disabling group level instrumentation.
8. Added support for PCL 2.0.
9. Fixed tau_instrumentor for PDT 1.3 using SGI CC and examples.
10. Fixed F90 bug on string concatenation.
11. Changed TauGroup_t to 64 bits (unsigned long) [Mapping addresses].
12. Added TAU_SHLIBS so DSOs are created every time.
13. Support for incremental profile dumps.
14. PAPI on Solaris and other platforms requires linking with a static library.
15. Added support for Compaq Alpha (cxx, cc, f90).
16. Fix for MPT 1.4 under IRIX 6.5.
17. Changes in tau_merge to support Uintah.
18. Added support for SGI sproc threads.
19. Added support for dumping profile data in a consistent state (profile snapshot).
20. Added support for Opari OpenMP directive rewriting tool [EWOMP'01].
21. Improved MPI wrapper library support [Uintah].
22. Added support for gcc-3.0 (pprof).
23. Added a bug fix for Vampir (tau_convert -pv -longsymbolbugfix) [SAMRAI].
24. Changed Opari options (omperf to pomp name change).
25. Added support for dynamically assigning group names [SAMRAI].
26. Added support for evaluating perturbation of TAU_DB_DUMP() [Uintah].
27. Added support for C in tau_instrumentor.
28. Fixed RtsLayer bug for PDT based instrumentation of multi-threaded C++ 
    applications.
29. Added -noinline flag to tau_instrumentor to suppress instrumentation of 
    inlined functions [POOMA].
30. Added support for F90 in tau_instrumentor. 
31. Added support for abnormal exit in C [UPS].
32. Added support for Opari-1.1 [flush_enter/exit calls].
33. Added MPI wrapper layer for SGI Fortran [SAGE]. 
34. Made changes to SGI Fortran MPI layer [MPI_Init].
35. Added IA-64 support (threads, PDT, MPI ...) using RH 7.1 gcc 2.96.

Version 2.8 changes (from 2.7):
1. Added support for PAPI (Perf. API for accessing HW Perf. Counters).
2. Added better support for Dyninst.
3. Added support for CPUTIME (pthread/Linux). 
4. Added support for multi-language programming for Java + C (JNI). 
5. Added support for mpiJava. 
6. Added support for tracing all MPI interprocess communication (incl. async.)
7. Added support for PAPIWALLCLOCK (with -papi=<...>) for low overhead timers.
8. Added support for PAPIVIRTUAL (with -papi=<...>) for user time using PAPI.
9. Added support for OpenMP and OpenMPI (PGI, KAP, IBM, SGI)
10. More compilers: IBM xlC, xlc, xlf90 on SP (See INSTALL file)

Version 2.7 changes (from 2.6):
1. Added support for Java (JDK 1.2+).
2. Added support for DYNINST Dynamic Instrumentation Package from U. Maryland.
3. Added support for SUN 5.0 CC, F90 compilers
4. Added support for Microsoft Windows. 
 
Version 2.6 changes (from 2.5):
1. TAU Mapping API introduced.
2. More platforms: Cray T3E with F90, Alpha/Linux, Intel/Linux 
   with PGI and Fujitsu compilers (C++/C/F90)
3. Added support for threadsafety in Fortran/C. 
4. Added support for Program Database Toolkit for instrumenting C++ 
   sources using tau_instrumentor 
5. Added support for Performance Counter Library for accessing Hardware
   Performance Counters on Cray, Intel, Alpha, UltraSparcs, MIPS, and 
   IBM Power platforms
6. TAU MPI wrapper library introduced for profiling MPI routines. 
7. Added NAS Parallel Benchmark 2.3 LU & SP suites as Fortran90/MPI examples.

Version 2.5 changes (from 2.4):
1. Automatic instrumentation support using DUCTAPE.
2. Changes in directory structure and configuration.
3. Integrated with POOMA and SMARTS.

Version 2.4 changes (from 2.3):
1. Added support for SMARTS and Tulip user level threads.
2. Added support for Fortran and F90 API.
3. Added threadsafe user defined events.
4. Added threadsafe trace library.

Version 2.3 changes (from 2.2):
1. Added pthread support.
2. Added C-API support with the same lib/API.
3. Introduced User Events

Version 2.2 changes (from 2.1):
1. Added callstack profile viewing tool
2. Blitz++ compatibility changes.

Version 2.1 changes (from 2.0):
1. Better colors in racy
2. Support for T3E.
3. Support for Tcl/Tk 8.0 as the default.
4. Introduced Callstack profiling.
5. Blitz specific changes. 

Version 2.0 changes (from 1.0):
1. Introduced Tracing.
