Name

tau_spark-submit — Launches PySpark applications with TAU instrumentation

Notes

TAU can profile PySpark applications using Spark 2.2 or later and Python 2.7 or later, with the numpy package installed. TAU must be configured with the -pythoninc and -pythonlib options pointing to an appropriate Python installation.
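As a rough sketch, a Python-enabled TAU configuration might look like the following; the include and library paths are only illustrative and depend on where your Python headers and libraries are installed.

    ./configure -pythoninc=/usr/include/python3.8 \
                -pythonlib=/usr/lib/python3.8/config-3.8-x86_64-linux-gnu
    make install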

The SPARK_HOME environment variable must be set to the location of your Spark installation. To instrument an application, replace spark-submit with tau_spark-submit in your usual Spark invocation. Options for tau_spark-submit are passed through the TAU_SPARK_PYTHON_ARGS environment variable.
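For example, assuming Spark is installed under /opt/spark and the application script is ./my_app.py (both names are illustrative), an existing spark-submit run becomes:

    export SPARK_HOME=/opt/spark
    tau_spark-submit ./my_app.py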

A PySpark application profiled using tau_spark-submit will generate one profile file per task executed.
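The resulting profiles can be examined with TAU's standard analysis tools, for example:

    pprof        # text-mode profile summary
    paraprof     # graphical profile browser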

Example

    export TAU_SPARK_PYTHON_ARGS="-T serial,python"
    tau_spark-submit --master local[4] ./als.py
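In this example, -T serial,python selects a serial, Python-enabled TAU configuration, --master local[4] asks Spark to run locally with four worker threads, and ./als.py is the PySpark application being profiled.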

Documentation

Additional documentation and examples can be found in the pyspark subdirectory of the examples directory in your TAU installation.