
Cluster analysis is a valuable tool for reducing large
parallel profiles down to representative groups for investigation.
PerfExplorer currently implements two types of cluster analysis:
*hierarchical* and *k-means*. Both are used to group parallel
profiles into common clusters, and the clusters are then summarized.
Initially, we used similarity measures computed on a single parallel
profile as input to the clustering algorithms, although other forms
of input are possible. Here, the performance data is organized into
multi-dimensional vectors for analysis. Each vector represents one
parallel thread (or process) of execution in the profile. Each
dimension in the vector represents an event that was profiled in the
application. Events can be any sub-region of code, including
libraries, functions, loops, basic blocks or even individual lines
of code. In simple clustering examples, each vector represents only
one metric of measurement. A dissimilarity value, such as
*Euclidean* or *Manhattan* distance, is then computed between
pairs of vectors. As discussed later, we have tested hierarchical and
*k-means* cluster analysis in PerfExplorer on profiles with over
32K threads of execution with few difficulties.
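
The vector layout and the two distance measures mentioned above can be illustrated with a minimal Python sketch. This is not PerfExplorer code: the event names, the timing values, and the helper names `euclidean` and `manhattan` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical profile data (not from PerfExplorer): one vector per thread,
# one dimension per profiled event, values for a single metric
# (here, exclusive time in seconds).
events = ["main", "MPI_Send", "compute"]
profile = {
    0: [1.0, 4.0, 10.0],
    1: [1.0, 1.0, 14.0],
    2: [1.0, 4.0, 10.5],
}


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Pairwise dissimilarity between every pair of threads.
thread_ids = sorted(profile)
for i, t1 in enumerate(thread_ids):
    for t2 in thread_ids[i + 1:]:
        print(t1, t2,
              euclidean(profile[t1], profile[t2]),
              manhattan(profile[t1], profile[t2]))
```

In this made-up data, threads 0 and 2 differ only slightly in one event, so both measures place them much closer to each other than to thread 1. A clustering algorithm would use that kind of signal to group them together.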

Often, many hundreds of events are instrumented when profile data is collected. Clustering works best with fewer than 10 dimensions, so dimension reduction is often necessary to get meaningful results. PerfExplorer currently offers only one type of dimension reduction: the user specifies a minimum exclusive percentage that an event must reach to be considered "significant".

To reduce dimensions, choose "Select Dimension Reduction" from the "Analysis" menu in the main menu bar. The following dialog will appear:

Select "Over X Percent". The following dialog will appear:

Enter a value, for example "1".
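
The effect of the "Over X Percent" threshold can be sketched in Python as follows. This is an illustration under stated assumptions, not PerfExplorer's implementation: it assumes an event's exclusive percentage is its share of the total exclusive time summed over all threads, and that "over" means strictly greater than the threshold. The function names `significant_events` and `reduce_dimensions` and the sample data are hypothetical.

```python
# Hypothetical profile data: one vector of exclusive times per thread,
# one dimension per event.
events = ["main", "MPI_Send", "compute", "log_write"]
profile = {
    0: [2.0, 4.0, 93.5, 0.5],
    1: [2.0, 9.0, 88.5, 0.5],
}


def significant_events(profile, events, min_percent):
    """Return the events whose exclusive percentage is over min_percent.

    Assumption: the percentage is the event's share of the total
    exclusive time, summed across all threads.
    """
    totals = [sum(vec[i] for vec in profile.values())
              for i in range(len(events))]
    grand_total = sum(totals)
    if grand_total == 0:
        return []
    return [name for name, total in zip(events, totals)
            if 100.0 * total / grand_total > min_percent]


def reduce_dimensions(profile, events, keep):
    """Drop every dimension whose event is not listed in keep."""
    indices = [events.index(name) for name in keep]
    return {tid: [vec[i] for i in indices] for tid, vec in profile.items()}


keep = significant_events(profile, events, min_percent=1)
reduced = reduce_dimensions(profile, events, keep)
print(keep)
print(reduced)
```

With a threshold of 1, the hypothetical `log_write` event accounts for only 0.5 percent of the total and is dropped, leaving three dimensions per thread for clustering.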