Next: The object-independent Trace Up: Layer 2: Event Previous: Layer 2: Event

Possibilities of Standardization

Dozens of monitoring and analysis tools for parallel or distributed computer systems have been implemented, but they are incompatible with each other. Standardization would allow object-independent tools to be developed, and it would also ease the sharing and exchange of traces and of the tools themselves. There have been two main attempts to solve this problem: a working group on Standards in Performance Instrumentation and Visualization at the 1989 LANL workshop on performance monitoring tools [14], and a BOF session on Standardizing Trace Formats at the Supercomputing 1990 conference [25]. So far, neither appears to have produced a final solution. In the following we discuss five possibilities for standardizing access to event traces.

The first two variants (1 and 2) rely on the definition of a fixed standard record format for event traces. Both have the same major disadvantage: because existing parallel and distributed computing systems, operating systems and applications vary so widely, it will be difficult to define a fixed trace format that is general enough to meet all requirements.

The simplest possibility (1) would be to agree on a fixed standard trace format and to implement or change all monitoring systems so that they produce this format. Where that is not possible, as in the case of an existing hardware monitor, a program can be used to convert the traces into the standard trace format. This approach can only succeed if it is restricted to one application area, so that the fixed trace format can be defined in terms of features common to that area. It is therefore only a partial solution. One example is the PICL format [5], which is very popular but can only be used for message-passing systems. Another example is the Simple Trace Interchange Format STIF [9].

The second approach (2), which is really an extension of approach (1), provides a standard software monitor that generates the standard trace format. The monitor functions can be implemented for different operating systems and programming languages, which makes instrumented code portable. Nevertheless, the main problem of finding a general trace format remains. An example of this variant is the proposal of the working group at the LANL workshop mentioned above [14].

A fixed trace format is not flexible enough to meet all requirements that arise in the analysis of arbitrary parallel and distributed systems. Another, very promising approach is to standardize the interface functions of the analysis system. This approach also comes in several variants:

In approach (3), each vendor or monitor system developer implements functions that read their particular trace format but present the data according to a standard interface specification. These functions are linked with standard analysis tools. This is similar to the approach used for graphics tools, where different device drivers serve different output devices. The main disadvantages of this variant are that access functions are needed for each trace format one wants to use (and there can be many), and that they have to be linked with each analysis tool.
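The device-driver analogy of approach (3) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Event` record, the `TraceReader` interface, and the two text formats are invented here and do not correspond to any existing monitor or standard. The point is only that the analysis function sees the interface, never the format.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Event:
    """Hypothetical standard event record handed to analysis tools."""
    timestamp: float
    process: int
    name: str


class TraceReader(ABC):
    """Hypothetical standard access interface (plays the 'device driver' role)."""

    @abstractmethod
    def events(self) -> Iterator[Event]:
        ...


class CommaReader(TraceReader):
    """Reader for an invented format: 'time,process,name' per line."""

    def __init__(self, lines: List[str]):
        self.lines = lines

    def events(self) -> Iterator[Event]:
        for line in self.lines:
            t, p, name = line.strip().split(",")
            yield Event(float(t), int(p), name)


class KeyValueReader(TraceReader):
    """Reader for a second invented format: 'name@process t=<time>' per line."""

    def __init__(self, lines: List[str]):
        self.lines = lines

    def events(self) -> Iterator[Event]:
        for line in self.lines:
            head, t = line.strip().split(" t=")
            name, p = head.split("@")
            yield Event(float(t), int(p), name)


def count_events_per_process(reader: TraceReader) -> Dict[int, int]:
    """Format-independent analysis tool: it uses only the interface."""
    counts: Dict[int, int] = {}
    for ev in reader.events():
        counts[ev.process] = counts.get(ev.process, 0) + 1
    return counts
```

The sketch also shows the cost named above: every additional trace format needs its own reader class, and each reader has to be made available to every analysis tool.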

Approach (4) combines a standard trace format with standard access functions; the trace format, however, is not fixed. It is self-describing, i.e. the trace contains information about the structure and representation of its data. There are two possibilities: the format description for the whole trace is located in a (standardized) trace header, or each value is prefixed with a so-called tag. Examples of a self-describing trace format stored in a header are the Traceview tool [15] and the Pablo environment [26]. Tags are used for encoding protocol data units with ASN.1 [11][10].
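The two self-description variants can be contrasted in a small sketch. The layouts below are invented for illustration (a JSON header with `struct` type codes, and one-byte tags in front of 8-byte values); they are not the Traceview, Pablo, or ASN.1 encodings.

```python
import json
import struct

# --- Variant A: one format description in a header, then raw records ---

def write_header_trace(fields, records):
    """fields: list of (name, struct code); records: list of value tuples."""
    header = json.dumps(fields).encode("utf-8")
    fmt = "<" + "".join(code for _, code in fields)
    body = b"".join(struct.pack(fmt, *rec) for rec in records)
    # A 4-byte length prefix tells the reader where the header ends.
    return struct.pack("<I", len(header)) + header + body


def read_header_trace(data):
    """Decode records using only the description stored in the trace."""
    (hlen,) = struct.unpack_from("<I", data, 0)
    fields = json.loads(data[4:4 + hlen].decode("utf-8"))
    fmt = "<" + "".join(code for _, code in fields)
    names = [name for name, _ in fields]
    size = struct.calcsize(fmt)
    return [dict(zip(names, struct.unpack_from(fmt, data, off)))
            for off in range(4 + hlen, len(data), size)]


# --- Variant B: every value carries its own tag ---

TAG_INT, TAG_FLOAT = 1, 2  # hypothetical tag numbers


def write_tagged(values):
    out = b""
    for v in values:
        if isinstance(v, int):
            out += struct.pack("<Bq", TAG_INT, v)    # tag + 64-bit integer
        else:
            out += struct.pack("<Bd", TAG_FLOAT, v)  # tag + 64-bit float
    return out


def read_tagged(data):
    out, off = [], 0
    while off < len(data):
        code = "<q" if data[off] == TAG_INT else "<d"
        (v,) = struct.unpack_from(code, data, off + 1)
        out.append(v)
        off += 1 + 8  # one tag byte plus an 8-byte value
    return out
```

In this sketch the header variant pays the description cost once per trace, while the tagged variant pays one extra byte per value but allows every record to have a different shape.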

Because it uses self-description, this approach is very general and should be able to handle even future trace formats. A small disadvantage arises with existing monitor tools that cannot produce this trace format; filter tools can be used, however, to convert their traces into the self-describing format.

This last disadvantage is avoided by approach (5). The description of the trace format is not stored in a trace header but in a separate file, which is generated by the monitor tool. If necessary, a user can also write this description manually and in this way analyze arbitrary event traces. Adapting to a new trace format then amounts to generating the corresponding trace description file.
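A minimal sketch of approach (5) follows. The description syntax used here (one `name type` pair per line, with `#` comments) and the type names are invented for illustration and are not the syntax of any existing description language; the sketch further assumes fixed-size, little-endian binary records.

```python
import struct

# Hypothetical type names mapped to struct codes (an assumption of this sketch).
TYPE_CODES = {"u16": "H", "u32": "I", "i32": "i", "f64": "d"}


def parse_description(text):
    """Turn a trace description (kept separate from the trace) into fields."""
    fields = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, type_name = line.split()
        fields.append((name, TYPE_CODES[type_name]))
    return fields


def read_trace(raw, fields):
    """Decode an unmodified binary trace according to the external description."""
    fmt = "<" + "".join(code for _, code in fields)
    names = [name for name, _ in fields]
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, raw)]
```

Here the trace bytes are never touched or converted: supporting another record layout means writing another description text, for example `"timestamp f64\nprocess u32\nevent u16\n"`, and passing the parsed result to `read_trace`.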

Because we consider this approach the best solution, we use it in our event trace analysis environment SIMPLE. To describe the record format we developed the event Trace Description Language TDL. We also developed the Problem-Oriented Event Trace interface function library POET, which serves as a standard access interface for arbitrary event traces. Both tools are described in the next subsection.





mohr@cs.uoregon.edu
Fri Feb 25 11:04:10 PST 1994