tau_exec — TAU execution wrapping script


tau_exec [ options ] [--] { exe } [ exe options ]


Use this script to perform memory or IO tracking on either an instrumented or uninstrumented executable.



verbose mode


show the command generated by tau_exec without running it


BG/P qsub mode


track io


track memory


enable memory debugger


track GPU events via CUDA (Must be configured with -cuda=<dir>, Preferred for CUDA 4.0 or earlier).


track GPU events via Nvidia's CUPTI interface (Must be configured with -cupti=<dir>, Preferred for CUDA 4.1 or later).


used in conjunction with -cupti, adds support for Unified Memory on GPUs. Requires CUDA 6.5 or later.


track GPU events via OpenCL


track OpenACC events. Supports TAU configurations with -arch=craycnl or PGI compilers on x86_64 Linux.


track OpenMP events via OMPT interface


track power events via PAPI's perf RAPL interface


track DRAM events. Requires PAPI with recent perf support for x86_64


track ARMCI events via PARMCI (Must be configured with -armci=<dir>)


track SHMEM events


Activates hardware counters to measure remote DRAM accesses and total node accesses. These counters must be available from PAPI in the selected TAU configuration.


flags to pass to PT TS sample_ts command. Overrides TAU_TS_SAMPLE_FLAGS env. var.


flags to pass to PT TS report_ts command. Overrides TAU_TS_REPORT_FLAGS env. var.


enable event-based sampling. See README.sampling for more information.

-ebs_period=<count>

sampling period (default 1000)


sets sampling metric (default "itimer")


Launch ThreadSpotter. It must be available in the system path.


enable Unified Memory events via CUPTI


track GPU events via CUDA with source code locator activity


output SASS profile in CSV format


specify TAU option

-loadlib=< >

specify additional load library


specify TAU library directly


run program in gdb debugger


Defaults if unspecified: -T MPI. MPI is assumed unless SERIAL is specified.
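The effect of this default on the command line can be sketched with a small shell helper. This is an illustrative sketch only: tau_cmd is a hypothetical function, not part of TAU, and it prints the command instead of running it. It assumes that -T serial can be combined with -io in the same way it is combined with -cuda in the usage examples on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of TAU): print, but do not run,
# a tau_exec command line for a serial or an MPI launch.
# Usage: tau_cmd <serial|mpi> <nprocs> <tau_exec arguments...>
tau_cmd() {
    mode=$1
    nprocs=$2
    shift 2
    case "$mode" in
        # A serial run has to request -T serial explicitly.
        serial) echo "tau_exec -T serial $*" ;;
        # An MPI run relies on the default (-T MPI) and goes through mpirun.
        mpi)    echo "mpirun -np $nprocs tau_exec $*" ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

tau_cmd mpi 2 -io ./ring
tau_cmd serial 1 -io ./ring
```

Printing the command first makes it easy to check which TAU configuration will be selected before launching a long run.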

CUDA kernel tracking is included if a CUDA sync call is made after each kernel launch and cudaThreadExit() is called before the exit of each thread that uses CUDA.

OpenCL kernel tracking is included if an OpenCL sync call is made after each kernel launch and clReleaseContext() is called before the exit of each thread that uses OpenCL.


mpirun -np 2 tau_exec -io ./ring

mpirun -np 8 tau_exec -ebs -ebs_period=1000000 -ebs_source=PAPI_FP_INS ./ring

tau_exec -T serial,cupti -cupti ./matmult (Preferred for CUDA 4.1 or later)

tau_exec -T serial -cuda ./matmult (Preferred for CUDA 4.0 or earlier)

tau_exec -T serial -opencl (OPENCL)
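The event-based sampling example above overrides both sampling defaults. A minimal sketch of how those flags might be assembled in a wrapper script follows. ebs_flags is a hypothetical helper, not part of TAU; it only builds the flag string, falling back to the documented defaults (period 1000, metric "itimer") when no arguments are given. The flag names -ebs, -ebs_period and -ebs_source are taken from the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of TAU): build the event-based
# sampling flags, using the documented defaults when an argument
# is omitted.
# Usage: ebs_flags [period] [metric]
ebs_flags() {
    period=${1:-1000}      # documented default sampling period
    metric=${2:-itimer}    # documented default sampling metric
    echo "-ebs -ebs_period=$period -ebs_source=$metric"
}

# Defaults written out explicitly.
ebs_flags
# Settings matching the PAPI_FP_INS example above.
ebs_flags 1000000 PAPI_FP_INS
```

The resulting string could then be spliced into a launch line, for example mpirun -np 8 tau_exec $(ebs_flags 1000000 PAPI_FP_INS) ./ring.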