Name
tau_exec — TAU execution wrapping script
Synopsis
tau_exec [options] [--] {exe} [exe options]
Description
Use this script to perform runtime performance tracking on either an instrumented or an uninstrumented executable. Options include memory and I/O tracking, event-based sampling, hardware accelerator tracking, and data collection from library-provided instrumentation APIs such as MPI communication events and the RAJA and Kokkos instrumentation hooks.
Options
- -v
-
verbose mode
- -s
-
show the command generated by tau_exec without running it
- -qsub
-
BG/P qsub mode
- -io
-
track io
- -memory
-
track memory
- -memory_debug
-
enable memory debugger
- -cuda
-
track GPU events via CUDA (Must be configured with -cuda=<dir>, Preferred for CUDA 4.0 or earlier)
- -cupti
-
track GPU events via Nvidia's CUPTI interface (Must be configured with -cupti=<dir>, Preferred for CUDA 4.1 or later).
- -um
-
used in conjunction with -cupti, adds support for Unified Memory on GPUs. Requires CUDA 6.5 or later.
- -opencl
-
track GPU events via OpenCL
- -openacc
-
track OpenACC events. Supports TAU configurations with -arch=craycnl or PGI compilers on x86_64 Linux
- -ompt
-
track OpenMP events via OMPT interface
- -power
-
track power events via PAPI's perf RAPL interface
- -numa
-
track DRAM events by activating hardware counters that measure remote DRAM accesses and total node accesses. Requires PAPI with recent perf support for x86_64; these counters must be available from PAPI in the selected TAU configuration.
- -armci
-
track ARMCI events via PARMCI (Must be configured with -armci=<dir>)
- -shmem
-
track SHMEM events
- -ts-sample-flags=<flags>
-
flags to pass to the PT TS sample_ts command. Overrides the TAU_TS_SAMPLE_FLAGS environment variable.
- -ts-report-flags=<flags>
-
flags to pass to the PT TS report_ts command. Overrides the TAU_TS_REPORT_FLAGS environment variable.
- -ebs
-
enable Event-based sampling to capture runtime event profiles without instrumentation. See README.sampling for more information
- -ebs_period=<count>
-
sampling period (default 1000)
- -ebs_source=<counter>
-
sets sampling metric (default "itimer")
- -ebs_resolution=<file|function|line>
-
sets sampling granularity (default "function")
- -syscall
-
track system calls
- -ptts
-
Launch ThreadSpotter. It must be available in the system path.
- -sass=<level>
-
tracks GPU events via CUDA with source code locator activity
- -csv
-
output sass profile in CSV format
- -T<option>
-
specify TAU option
- -loadlib=<file.so>
-
specify additional load library
- -XrunTAU-<options>
-
specify TAU library directly
- -gdb
-
run program in gdb debugger
- -rocm
-
capture events and metadata from the ROCm performance API
- -tau_ebs_resolution=<file|function|line>
-
process sampled events at the file, function, or line level, depending on the given argument. The default is line. The environment variable TAU_EBS_RESOLUTION can be set to one of these values to achieve the same effect.
- -monitoring
-
monitors hardware counters and other commands by polling periodically as specified in a tau_monitoring.json file included in the run directory. Example:
{
  "periodic": true,
  "periodicity seconds": 1.0,
  "/proc/stat": {
    "comment": "This will exclude all core-specific readings.",
    "exclude": ["^cpu[0-9]+.*"]
  },
  "/proc/meminfo": {
    "comment": "This will include three readings.",
    "include": [".*MemAvailable.*", ".*MemFree.*", ".*MemTotal.*"]
  },
  "/proc/net/dev": {
    "disable": true,
    "comment": "This will include only the first ethernet device.",
    "include": [".*eno1.*"]
  },
  "lmsensors": {
    "disable": true,
    "comment": "This will include all power readings.",
    "include": [".*power.*"]
  },
  "net": {
    "disable": true,
    "comment": "This will include only the first ethernet device.",
    "include": [".*eno1.*"]
  },
  "nvml": {
    "disable": false,
    "comment": "This will include only the utilization metrics.",
    "include": [".*utilization.*"]
  }
}
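The -monitoring workflow can be sketched as a short shell script: write a tau_monitoring.json file into the run directory, then launch the application under tau_exec. This is only an illustration under stated assumptions. The JSON keys are copied from the example above and reduced to a single source, the executable name ./ring is borrowed from the Examples section, and the launch is skipped when tau_exec or the executable is not present.

```shell
#!/bin/sh
# Write a reduced monitoring configuration into the current (run) directory.
# The keys mirror the documented example; only /proc/meminfo is polled here.
cat > tau_monitoring.json <<'EOF'
{
  "periodic": true,
  "periodicity seconds": 1.0,
  "/proc/meminfo": {
    "comment": "This will include three readings.",
    "include": [".*MemAvailable.*", ".*MemFree.*", ".*MemTotal.*"]
  }
}
EOF

# Launch only when tau_exec and the assumed executable ./ring are available.
if command -v tau_exec >/dev/null 2>&1 && [ -x ./ring ]; then
    tau_exec -monitoring ./ring
else
    echo "tau_exec or ./ring not found; wrote tau_monitoring.json only"
fi
```

Keeping the configuration next to the job script makes it easier to see which readings were polled for a given run.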
Notes
Defaults if unspecified: -T MPI. MPI is assumed unless SERIAL is specified
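Because MPI is assumed by default, a non-MPI executable is normally launched with -T serial, as the examples at the end of this page do. The following is a minimal sketch of a helper that prints the appropriate command line for each case. The function name tau_launch_cmd and its rank-count argument are hypothetical and not part of TAU.

```shell
#!/bin/sh
# Hypothetical helper: print a launch command, choosing between the default
# MPI configuration and an explicit serial one.
# $1 = number of MPI ranks (0 means a serial run)
# remaining arguments = tau_exec options, the executable, and its options
tau_launch_cmd() {
    np=$1
    shift
    if [ "$np" -gt 0 ]; then
        # MPI run: rely on the default -T MPI selection.
        echo "mpirun -np $np tau_exec $*"
    else
        # Serial run: request a serial TAU configuration explicitly.
        echo "tau_exec -T serial $*"
    fi
}

tau_launch_cmd 2 ./ring
tau_launch_cmd 0 ./matmult
```

The helper only prints the command, so it can be inspected before it is pasted into a job script.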
CUDA kernel tracking is included if a CUDA synchronization call is made after each kernel launch and cudaThreadExit() is called before the exit of each thread that uses CUDA.
OpenCL kernel tracking is included if an OpenCL synchronization call is made after each kernel launch and clReleaseContext() is called before the exit of each thread that uses OpenCL.
tau_python is similar to tau_exec and can replace the 'python' command when launching a Python application. The -tau_python_interpreter=<interpreter> argument specifies a Python interpreter other than the one used to configure TAU.
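A rough sketch of how tau_python might stand in for the python command follows. The script name ./app.py and the interpreter value python3 are assumptions chosen for illustration, since the exact form accepted by -tau_python_interpreter is not specified here. The command is printed first and executed only if tau_python and the script exist.

```shell
#!/bin/sh
# Hypothetical helper: build the tau_python command line without running it.
# $1 = interpreter to pass to -tau_python_interpreter (assumed value form)
# $2 = Python script to launch
tau_python_cmd() {
    printf 'tau_python -tau_python_interpreter=%s %s\n' "$1" "$2"
}

# Show the command that replaces "python3 ./app.py".
tau_python_cmd python3 ./app.py

# Execute it only when tau_python and the assumed script are present.
if command -v tau_python >/dev/null 2>&1 && [ -f ./app.py ]; then
    tau_python -tau_python_interpreter=python3 ./app.py
fi
```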
Examples
mpirun -np 2 tau_exec -io ./ring
mpirun -np 8 tau_exec -ebs -ebs_period=1000000 -ebs_source=PAPI_FP_INS ./ring
tau_exec -T serial,cupti -cupti ./matmult (Preferred for CUDA 4.1 or later)
tau_exec -T serial -cuda ./matmult (Preferred for CUDA 4.0 or earlier)
tau_exec -T serial -opencl (OPENCL)
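Before submitting a job, the -s option can be used to preview the command tau_exec would generate without running the application. The sketch below wraps that idea in a small function. The name preview is hypothetical, and the fallback branch merely echoes the command line so the script still runs on a machine where TAU is not installed.

```shell
#!/bin/sh
# Hypothetical helper: preview a tau_exec invocation with -s, which shows the
# generated command without running the executable.
preview() {
    if command -v tau_exec >/dev/null 2>&1; then
        tau_exec -s "$@"
    else
        # Fallback for systems without TAU: echo the intended command line.
        echo "tau_exec -s $*"
    fi
}

# Option sets taken from the examples above.
preview -io ./ring
preview -ebs -ebs_period=1000000 -ebs_source=PAPI_FP_INS ./ring
```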