TAU User Guide

Updated November 14, 2024, for use with version 2.34 or greater.

Copyright © 1997-2012 Department of Computer and Information Science, University of Oregon; Advanced Computing Laboratory, LANL, NM; Research Centre Juelich, ZAM, Germany

Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of University of Oregon (UO) Research Centre Juelich, (ZAM) and Los Alamos National Laboratory (LANL) not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. The University of Oregon, ZAM and LANL make no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.

UO, ZAM AND LANL DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE UNIVERSITY OF OREGON, ZAM OR LANL BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.


Table of Contents

TAU preface
I. TAU User Guide
1. TAU Instrumentation
1.1. Types of Instrumentation
1.2. Dynamic instrumentation through library pre-loading
1.3. TAU scripted compilation
1.3.1. Instrumentation
1.3.2. Compiler Based Instrumentation
1.3.3. Source Based Instrumentation
1.3.4. Options to TAU compiler scripts
1.4. Selectively Profiling an Application
1.4.1. Custom Profiling
2. Profiling
2.1. Running the Application
2.2. Reducing Performance Overhead with TAU_THROTTLE
2.3. Profiling each event callpath
2.4. Using Hardware Counters for Measurement
3. Tracing
3.1. Generating Event Traces
4. Analyzing Parallel Applications
4.1. Text summary
4.2. ParaProf
4.3. Jumpshot
5. Quick Reference
6. Some Common Application Scenarios
6.1. Q. What routines account for the most time? How much?
6.2. Q. What loops account for the most time? How much?
6.3. Q. What MFlops am I getting in all loops?
6.4. Q. Who calls MPI_Barrier()? Where?
6.5. Q. How do I instrument Python Code?
6.6. Q. What happens in my code at a given time?
6.7. Q. How does my application scale?
II. ParaProf - User's Manual
7. Introduction
7.1. Using ParaProf from the command line
7.2. Supported Formats
7.3. Command line options
8. Views and Sub-Views
8.1. To Create (Sub-)Views
9. Profile Data Management
9.1. ParaProf Manager Window
9.2. Loading Profiles
9.3. Database Interaction
9.4. Creating Derived Metrics
9.5. Main Data Window
10. 3-D Visualization
10.1. Triangle Mesh Plot
10.2. 3-D Bar Plot
10.3. 3-D Scatter Plot
10.4. 3-D Topology Plot
10.5. 3-D Communication Matrix
11. Thread Based Displays
11.1. Thread Bar Graph
11.2. Thread Statistics Text Window
11.3. Thread Statistics Table
11.4. Call Graph Window
11.5. Thread Call Path Relations Window
11.6. User Event Statistics Window
11.7. User Event Thread Bar Chart
12. Function Based Displays
12.1. Function Bar Graph
12.2. Function Histogram
13. Phase Based Displays
13.1. Using Phase Based Displays
14. Comparative Analysis
14.1. Using Comparative Analysis
15. Miscellaneous Displays
15.1. User Event Bar Graph
15.2. Ledgers
15.2.1. Function Ledger
15.2.2. Group Ledger
15.2.3. User Event Ledger
15.3. Selective Instrumentation File Generator
16. Preferences
16.1. Preferences Window
16.2. Default Colors
16.3. Color Map
III. PerfExplorer - User's Manual
17. Introduction
18. Installation and Configuration
19. Running PerfExplorer
20. Cluster Analysis
20.1. Dimension Reduction
20.2. Max Number of Clusters
20.3. Performing Cluster Analysis
21. Correlation Analysis
21.1. Dimension Reduction
21.2. Performing Correlation Analysis
22. Charts
22.1. Setting Parameters
22.1.1. Group of Interest
22.1.2. Metric of Interest
22.1.3. Event of Interest
22.1.4. Total Number of Timesteps
22.2. Standard Chart Types
22.2.1. Timesteps Per Second
22.2.2. Relative Efficiency
22.2.3. Relative Efficiency by Event
22.2.4. Relative Efficiency for One Event
22.2.5. Relative Speedup
22.2.6. Relative Speedup by Event
22.2.7. Relative Speedup for One Event
22.2.8. Group % of Total Runtime
22.2.9. Runtime Breakdown
22.3. Phase Chart Types
22.3.1. Relative Efficiency per Phase
22.3.2. Relative Speedup per Phase
22.3.3. Phase Fraction of Total Runtime
23. Custom Charts
24. Visualization
24.1. 3D Visualization
24.2. Data Summary
24.3. Creating a Boxchart
24.4. Creating a Histogram
24.5. Creating a Normal Probability Chart
25. Views
25.1. Creating Views
25.2. Creating Subviews
26. Running PerfExplorer Scripts
26.1. Analysis Components
26.2. Scripting Interface
26.3. Example Script
27. Derived Metrics
27.1. Creating Expressions
27.2. Selecting Expressions
27.3. Expression Files
IV. TAUdb
28. Introduction
28.1. Prerequisites
28.2. Installation
29. Using TAUdb
29.1. perfdmf_createapp (deprecated - only supported for older PerfDMF databases)
29.2. perfdmf_createexp (deprecated - only supported for older PerfDMF databases)
29.3. taudb_loadtrial
29.4. TAUdb Views
30. Database Schema
30.1. SQL for TAUdb
31. TAUdb C API
31.1. TAUdb C API Overview
31.2. TAUdb C Structures
31.3. TAUdb C API
31.4. TAUdb C API Examples
31.4.1. Creating a trial and inserting into the database
31.4.2. Querying a trial from the database

List of Figures

4.1. Main Data Window
4.2. Main Data Window
6.1. Flat Profile
6.2. Flat Profile with Loops
6.3. MFlops per loop
6.4. Callpath Profile
6.5. Tracing with Vampir
6.6. Scalability chart
8.1. Add View
8.2. View Creator Window
9.1. ParaProf Manager Window
9.2. Loading Profile Data
9.3. Creating Derived Metrics
9.4. Main Data Window
9.5. Unstacked Bars
10.1. Triangle Mesh Plot
10.2. 3-D Mesh Plot
10.3. 3-D Scatter Plot
10.4. 3-D Topology Plot
10.5. 3-D Communication Matrix
11.1. Thread Bar Graph
11.2. Thread Statistics Text Window
11.3. Thread Statistics Table, inclusive and exclusive
11.4. Thread Statistics Table
11.5. Thread Statistics Table
11.6. Call Graph Window
11.7. Thread Call Path Relations Window
11.8. User Event Statistics Window
11.9. User Event Thread Bar Chart Window
12.1. Function Bar Graph
12.2. Function Histogram
13.1. Initial Phase Display
13.2. Phase Ledger
13.3. Function Data over Phases
14.1. Comparison Window (initial)
14.2. Comparison Window (2 trials)
14.3. Comparison Window (3 threads)
15.1. User Event Bar Graph
15.2. Function Ledger
15.3. Group Ledger
15.4. User Event Ledger
15.5. Selective Instrumentation Dialog
16.1. ParaProf Preferences Window
16.2. Edit Default Colors
16.3. Color Map
20.1. Selecting a dimension reduction method
20.2. Entering a minimum threshold for exclusive percentage
20.3. Entering a maximum number of clusters
20.4. Selecting a Metric to Cluster
20.5. Confirm Clustering Options
20.6. Cluster Results
20.7. Cluster Membership Histogram
20.8. Cluster Membership Scatterplot
20.9. Cluster Virtual Topology
20.10. Cluster Average Behavior
21.1. Selecting a dimension reduction method
21.2. Entering a minimum threshold for exclusive percentage
21.3. Selecting a Metric to Cluster
21.4. Correlation Results
21.5. Correlation Example
22.1. Setting Group of Interest
22.2. Setting Metric of Interest
22.3. Setting Event of Interest
22.4. Setting Timesteps
22.5. Timesteps per Second
22.6. Relative Efficiency
22.7. Relative Efficiency by Event
22.8. Relative Efficiency one Event
22.9. Relative Speedup
22.10. Relative Speedup by Event
22.11. Relative Speedup one Event
22.12. Group % of Total Runtime
22.13. Runtime Breakdown
22.14. Relative Efficiency per Phase
22.15. Relative Speedup per Phase
22.16. Phase Fraction of Total Runtime
23.1. The Custom Charts Interface
24.1. 3D Visualization of multivariate data
24.2. Data Summary Window
24.3. Boxchart
24.4. Histogram
24.5. Normal Probability
25.1. Potential scalability data organized as a parametric study
25.2. Selecting a table
25.3. Selecting a column
25.4. Selecting an operator
25.5. Selecting a value
25.6. Entering a name for the view
25.7. The completed view
25.8. Selecting the base view
25.9. Completed sub-views