
Introduction

Over the last decade, the development of high-performance parallel and distributed computer systems has progressed at an explosive rate. Their computation speed can exceed that of state-of-the-art serial supercomputers, and they are far less expensive. However, the software for driving these parallel machines is still in its infancy. Parallel programming can be painful and frustrating. In addition, debugging a parallel program and searching for performance bottlenecks are difficult and time-consuming tasks.

Many projects at universities and research institutions have developed and implemented tools and environments to ease the use and programming of parallel systems. Dozens (hundreds?) of parallel programming tools are being developed, and some of them are becoming commercially available (a survey on parallel debugging tools [16], which is already three years old, lists 28 important tools). The majority of the analysis tools are event-based and use event traces to represent the dynamic behavior of the system under investigation, the object system. In the following, we will call such a tool (environment) an event trace monitoring and analysis system. Each system has its own design goals and philosophy for solving a particular class of problems on a particular class of parallel machines. Because of the diversity of tools and the complexity of parallel computer platforms, using these tools often results in confusion and frustration. Additionally, the user has to learn and use different tools when working with more than one object system.

The restriction to particular problem classes and machines is hard to justify, because all tools share the same basic functionality. Therefore, in this article we discuss approaches to implementing object-independent event trace monitoring and analysis systems. Object-independent means that the system can be used to analyze arbitrary (non-sequential) computer systems with arbitrary operating systems and programming languages, running arbitrary applications. In other words, there is no need to change the program code of the analysis system and recompile it when a different measurement has to be analyzed.
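The idea of analyzing a different measurement without changing or recompiling the analysis code can be illustrated with a minimal sketch. It assumes that the layout of an event record is supplied as data rather than hard-coded. The description format, the field names, and the functions `read_events` and `count_events` are hypothetical and chosen only for illustration; they do not show the syntax or interface of any existing tool.

```python
import struct

# Hypothetical trace descriptions: one fixed-size binary record per event.
# The dictionary layout and field names are assumptions made for this sketch.
DESCRIPTION_A = {
    "byte_order": "<",  # little-endian, no padding
    "fields": [("timestamp", "Q"), ("node", "H"), ("event_id", "H")],
}
DESCRIPTION_B = {
    "byte_order": ">",  # big-endian, no padding
    "fields": [("event_id", "B"), ("timestamp", "I")],
}


def read_events(raw, description):
    """Decode raw trace bytes into dicts, using only the given description."""
    names = [name for name, _ in description["fields"]]
    codes = "".join(code for _, code in description["fields"])
    layout = struct.Struct(description["byte_order"] + codes)
    for values in layout.iter_unpack(raw):
        yield dict(zip(names, values))


def count_events(raw, description):
    """Example analysis: count how often each event_id occurs in the trace."""
    counts = {}
    for event in read_events(raw, description):
        counts[event["event_id"]] = counts.get(event["event_id"], 0) + 1
    return counts
```

In this sketch the analysis function `count_events` is identical for both record layouts; only the description passed to it differs. A trace from a different object system would then require a new description, not a change to the analysis code.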

To allow a systematic and structured discussion, a hierarchical layered model for event trace monitoring and analysis systems is introduced first. This model shows that three main components of such a system are affected by the problem of object-independence. They are discussed in the following three sections. Section 3 deals with some aspects of object-independent monitoring. In section 4 we discuss different approaches to standardizing the access to event traces, as standardization would allow object-independent tools to be developed and would also ease the sharing and exchange of traces and of the tools themselves. We then introduce our own proposal: the object-independent TDL/POET event trace access interface. In section 5 we present our approach to an application-independent but problem-oriented implementation of analysis tools. The distributed hardware monitor system ZM4 and the SIMPLE event trace analysis environment were implemented with these considerations in mind and have been used in many 'real-world' applications over the last three years. An overview of the projects in which the ZM4/SIMPLE tools were used is given in the last section.


mohr@cs.uoregon.edu
Fri Feb 25 11:04:10 PST 1994