Name

tau_exec — TAU execution wrapping script

Synopsis

tau_exec [ options ] [--] { exe } [ exe options ]

Description

Use this script to perform runtime performance tracking on either an instrumented or an uninstrumented executable. Options include memory and I/O tracking, event-based sampling, hardware accelerator tracking, and data collection from library-provided instrumentation APIs such as MPI communication events and the RAJA and Kokkos instrumentation hooks.

Options

-v

verbose mode

-s

show the command generated by tau_exec without running it

-qsub

BG/P qsub mode

-io

track I/O

-memory

track memory

-memory_debug

enable memory debugger

-cuda

track GPU events via CUDA (Must be configured with -cuda=<dir>. Preferred for CUDA 4.0 or earlier)

-cupti

track GPU events via Nvidia's CUPTI interface (Must be configured with -cupti=<dir>, Preferred for CUDA 4.1 or later).

-um

in conjunction with -cupti, adds support for Unified Memory events on GPUs. Requires CUDA 6.5 or later.

-opencl

track GPU events via OpenCL

-openacc

track OpenACC events. Supported in TAU configurations built with -arch=craycnl or with PGI compilers on x86_64 Linux.

-ompt

track OpenMP events via OMPT interface

-power

track power events via PAPI's perf RAPL interface

-numa

track DRAM events by activating hardware counters that measure remote DRAM accesses and total node accesses. Requires PAPI with recent perf support on x86_64, and these counters must be available from PAPI in the selected TAU configuration.

-armci

track ARMCI events via PARMCI (Must be configured with -armci=<dir>)

-shmem

track SHMEM events

-ts-sample-flags=<flags>

flags to pass to the PT TS sample_ts command. Overrides the TAU_TS_SAMPLE_FLAGS environment variable.

-ts-report-flags=<flags>

flags to pass to the PT TS report_ts command. Overrides the TAU_TS_REPORT_FLAGS environment variable.

-ebs

enable event-based sampling (EBS) to capture runtime event profiles without instrumentation. See README.sampling for more information.

-ebs_period=<count>

sampling period (default 1000)

-ebs_source=<counter>

sets sampling metric (default "itimer")

-ebs_resolution=<file|function|line>

sets sampling granularity (default "function")

-syscall

track system calls

-ptts

Launch ThreadSpotter. It must be available in the system path.

-sass=<level>

track GPU events via CUDA, including source code locator activity

-csv

output the SASS profile in CSV format

-T<option>

specify TAU configuration options, comma-separated (e.g. -T serial,cupti)

-loadlib=<file.so>

specify an additional library to load

-XrunTAU-<options>

specify the TAU library directly

-gdb

run program in gdb debugger

-rocm

capture events and metadata from the ROCm performance API

-tau_ebs_resolution=<file|function|line>

process sampled events at the file, function, or line level, depending on the given argument. The default is line. Setting the environment variable TAU_EBS_RESOLUTION to one of these values has the same effect.

-monitoring

periodically polls hardware counters and other system data sources (such as /proc files, lm-sensors, network devices, and NVML), as specified in a tau_monitoring.json file placed in the run directory. Example:

{
  "periodic": true,
  "periodicity seconds": 1.0,
  "/proc/stat": {
    "comment": "This will exclude all core-specific readings.",
    "exclude": ["^cpu[0-9]+.*"]
  },
  "/proc/meminfo": {
    "comment": "This will include three readings.",
    "include": [".*MemAvailable.*", ".*MemFree.*", ".*MemTotal.*"]
  },
  "/proc/net/dev": {
    "disable": true,
    "comment": "This will include only the first ethernet device.",
    "include": [".*eno1.*"]
  },
  "lmsensors": {
    "disable": true,
    "comment": "This will include all power readings.",
    "include": [".*power.*"]
  },
  "net": {
    "disable": true,
    "comment": "This will include only the first ethernet device.",
    "include": [".*eno1.*"]
  },
  "nvml": {
    "disable": false,
    "comment": "This will include only the utilization metrics.",
    "include": [".*utilization.*"]
  }
}
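
Because tau_exec reads tau_monitoring.json from the run directory, a quick syntax check before launching can catch mistakes such as an unbalanced brace. The sketch below writes a trimmed-down variant of the example above (only /proc/meminfo is configured) and validates it with python3 -m json.tool. It assumes python3 is installed; the executable name ./app is hypothetical.

```shell
#!/bin/sh
# Sketch: write a minimal tau_monitoring.json and check that it is valid JSON.
# Assumption: python3 is available; ./app below is a hypothetical executable.
set -e

cat > tau_monitoring.json <<'EOF'
{
  "periodic": true,
  "periodicity seconds": 1.0,
  "/proc/meminfo": {
    "include": [".*MemAvailable.*", ".*MemFree.*", ".*MemTotal.*"]
  }
}
EOF

# json.tool exits with a non-zero status if the file is malformed,
# which stops the script because of "set -e".
python3 -m json.tool tau_monitoring.json > /dev/null
echo "tau_monitoring.json is valid JSON"

# Then launch as usual from the same directory, for example:
#   tau_exec -T serial -monitoring ./app
```

The same json.tool check can be run on the full example configuration after copying it into the run directory.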

Notes

Default if unspecified: -T MPI. MPI is assumed unless SERIAL is specified.

CUDA kernel tracking is included if a CUDA synchronization call is made after each kernel launch and cudaThreadExit() is called before each thread that uses CUDA exits.

OpenCL kernel tracking is included if an OpenCL synchronization call is made after each kernel launch and clReleaseContext() is called before each thread that uses OpenCL exits.

tau_python is similar to tau_exec and can replace the 'python' command when launching a Python application. The -tau_python_interpreter=<interpreter> argument allows specifying a Python interpreter other than the one used to configure TAU.

Examples

mpirun -np 2 tau_exec -io ./ring

mpirun -np 8 tau_exec -ebs -ebs_period=1000000 -ebs_source=PAPI_FP_INS ./ring

tau_exec -T serial,cupti -cupti ./matmult (Preferred for CUDA 4.1 or later)

tau_exec -T serial -cuda ./matmult (Preferred for CUDA 4.0 or earlier)

tau_exec -T serial -opencl { exe } (OpenCL)
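
The sampling options can also be combined with the -s dry run, which prints the command tau_exec would generate without running the program. The sketch below requests line-level attribution of samples through the TAU_EBS_RESOLUTION environment variable instead of a command-line flag. It assumes tau_exec is on the PATH and that TAU was built with a serial configuration; ./app is a hypothetical executable. Drop -s to actually run the program.

```shell
#!/bin/sh
# Sketch: line-level event-based sampling requested via the environment,
# shown as a dry run with -s (prints the generated command, does not run it).
# Assumptions: tau_exec is on PATH, a serial TAU configuration exists,
# and ./app is a hypothetical executable.
export TAU_EBS_RESOLUTION=line

if command -v tau_exec >/dev/null 2>&1; then
  tau_exec -s -T serial -ebs -ebs_period=1000 ./app
else
  echo "tau_exec not found in PATH; skipping dry run" >&2
fi
```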