tau_exec — TAU execution wrapping script


tau_exec [ options ] [--] { exe } [ exe options ]


Use this script to perform memory or IO tracking on either an instrumented or uninstrumented executable.



verbose mode


show the command generated by tau_exec without running it


BG/P qsub mode


track io


track memory


enable memory debugger


track GPU events via CUDA (Must be configured with -cuda=<dir>, Preferred for CUDA 4.0 or earlier)


track GPU events via Nvidia's CUPTI interface (Must be configured with -cupti=<dir>, Preferred for CUDA 4.1 or later).


used in conjunction with -cupti, adds support for Unified Memory on GPUs. Requires CUDA 6.5 or later.


track GPU events via OpenCL


track OpenACC events. Supports TAU configurations with -arch=craycnl or PGI compilers on x86_64 Linux


track OpenMP events via OMPT interface


track power events via PAPI's perf RAPL interface


track DRAM events. Requires PAPI with recent perf support for x86_64


track ARMCI events via PARMCI (Must be configured with -armci=<dir>)


track SHMEM events


Activates hardware counters to measure remote DRAM accesses and total node accesses. These counters must be available from PAPI in the selected TAU configuration.


flags to pass to PT TS sample_ts command. Overrides TAU_TS_SAMPLE_FLAGS env. var.


flags to pass to PT TS report_ts command. Overrides TAU_TS_REPORT_FLAGS env. var.


enable event-based sampling to capture runtime event profiles without instrumentation. See README.sampling for more information.

-ebs_period=<count>

sampling period (default 1000)


sets sampling metric (default "itimer")


sets sampling granularity (default "function")


Launch ThreadSpotter. It must be available in the system path.


enable Unified Memory events via CUPTI


tracks GPU events via CUDA with source code locator activity


output sass profile in CSV format


specify TAU option

-loadlib=< >

specify additional load library


specify TAU library directly


run program in gdb debugger


capture events and metadata from the ROCm performance API


process sampled events at the file, function, or line level, depending on the given argument. The default is line. Setting the environment variable TAU_EBS_RESOLUTION to one of these values has the same effect.


monitors hardware counters and other commands by polling periodically, as specified in a tau_monitoring.json file placed in the run directory. Example:

{
  "periodic": true,
  "periodicity seconds": 1.0,
  "/proc/stat": {
    "comment": "This will exclude all core-specific readings.",
    "exclude": ["^cpu[0-9]+.*"]
  },
  "/proc/meminfo": {
    "comment": "This will include three readings.",
    "include": [".*MemAvailable.*", ".*MemFree.*", ".*MemTotal.*"]
  },
  "/proc/net/dev": {
    "disable": true,
    "comment": "This will include only the first ethernet device.",
    "include": [".*eno1.*"]
  },
  "lmsensors": {
    "disable": true,
    "comment": "This will include all power readings.",
    "include": [".*power.*"]
  },
  "net": {
    "disable": true,
    "comment": "This will include only the first ethernet device.",
    "include": [".*eno1.*"]
  },
  "nvml": {
    "disable": false,
    "comment": "This will include only the utilization metrics.",
    "include": [".*utilization.*"]
  }
}

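The monitoring file can also be generated from a script, so that it is always valid JSON. The following is a minimal sketch, not part of TAU: the helper name write_monitoring_config is hypothetical, and the keys and patterns are copied from the example above rather than from a documented schema.

```python
import json
from pathlib import Path


def write_monitoring_config(run_dir="."):
    """Write tau_monitoring.json into run_dir and return the parsed result.

    Hypothetical helper: the keys mirror the example in this page; whether
    TAU accepts other keys is not confirmed here.
    """
    config = {
        "periodic": True,
        "periodicity seconds": 1.0,
        "/proc/stat": {
            "comment": "This will exclude all core-specific readings.",
            "exclude": ["^cpu[0-9]+.*"],
        },
        "/proc/meminfo": {
            "comment": "This will include three readings.",
            "include": [".*MemAvailable.*", ".*MemFree.*", ".*MemTotal.*"],
        },
        "nvml": {
            "disable": False,
            "comment": "This will include only the utilization metrics.",
            "include": [".*utilization.*"],
        },
    }
    path = Path(run_dir) / "tau_monitoring.json"
    path.write_text(json.dumps(config, indent=2) + "\n")
    # Reading the file back confirms that it parses as JSON.
    return json.loads(path.read_text())
```

Calling write_monitoring_config() with no argument writes the file into the current directory, which matches the requirement that the file be present in the run directory.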

Defaults if unspecified: -T MPI. MPI is assumed unless SERIAL is specified

CUDA kernel tracking is included if a CUDA SYNC call is made after each kernel launch and cudaThreadExit() is called before the exit of each thread that uses CUDA.

OpenCL kernel tracking is included if an OpenCL SYNC call is made after each kernel launch and clReleaseContext() is called before the exit of each thread that uses OpenCL.

tau_python is similar to tau_exec and can replace the 'python' command when launching a Python application. The -tau_python_interpreter=<interpreter> argument specifies a Python interpreter other than the one used to configure TAU.


mpirun -np 2 tau_exec -io ./ring

mpirun -np 8 tau_exec -ebs -ebs_period=1000000 -ebs_source=PAPI_FP_INS ./ring

tau_exec -T serial,cupti -cupti ./matmult (Preferred for CUDA 4.1 or later)

tau_exec -T serial -cuda ./matmult (Preferred for CUDA 4.0 or earlier)

tau_exec -T serial -opencl (OPENCL)
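
When tau_exec is launched from a driver script, the command line can be assembled in the order given by the synopsis: launcher, tau_exec, options, "--", executable, executable options. The sketch below only builds and prints the command; it does not run it. The function name build_tau_exec_command is hypothetical, and the flags are the ones used in the examples above.

```python
import shlex


def build_tau_exec_command(exe, tau_options=(), exe_args=(), launcher=()):
    """Return an argv list following the synopsis:
    [launcher] tau_exec [options] -- exe [exe options].

    Hypothetical helper for illustration; it performs no validation of
    the options against the installed TAU configuration.
    """
    return [*launcher, "tau_exec", *tau_options, "--", exe, *exe_args]


# Event-based sampling example from this page, launched under mpirun.
cmd = build_tau_exec_command(
    "./ring",
    tau_options=["-ebs", "-ebs_period=1000000", "-ebs_source=PAPI_FP_INS"],
    launcher=["mpirun", "-np", "8"],
)
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems, and the list can be passed to subprocess.run unchanged once the printed line looks right.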