The distributed parallel performance analysis architecture described in this paper was recently designed and prototyped at Dresden University of Technology in Dresden, Germany. Building on the experience gained from developing the performance analysis tool Vampir, the new architecture follows a distributed approach. It consists of a parallel analysis server, intended to run on a partition of a parallel cluster, and a visualization client running on a remote graphics workstation. The two components interact over a socket-based network connection. In the discussion that follows, the parallel analysis server together with the visualization client will be referred to as VNG. The major goals of the distributed parallel approach can be formulated as follows:
VNG consists of two major components: the analysis server (vngd) and the visualization client (vng). Each can run on a different kind of platform. Figure 1 depicts an overview of the envisioned overall architecture. Boxes represent modules of the components, and arrows indicate the interfaces between the modules. The thickness of an arrow gives a rough measure of the data volume transferred over that interface, whereas its length represents the expected latency of that particular link.
On the left-hand side of Figure 1 is the analysis server, which runs on a dedicated segment of a parallel machine with access to the trace data generated by the traced application. The server is a heterogeneous parallel program (MPI combined with pthreads) consisting of worker and boss processes. The workers are responsible for trace data storage and analysis; each of them holds a part of the overall trace data to be analyzed. The bosses are responsible for communication with the remote clients. They decide how to distribute analysis requests among the workers and, once the workers have completed their parts, merge the results into a single response that is sent to the client.
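The scatter/merge pattern between bosses and workers can be illustrated with a minimal sketch. It assumes that the trace is partitioned by process, that each worker answers a request for a time interval with a per-function time profile of its own partition, and that the boss merges the partial profiles by summation. The `Event` record, `worker_profile` and `boss_handle_request` are hypothetical names, and Python threads stand in for the MPI worker processes of the real server.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical trace record: `function` runs on `process` during [start, end)."""
    process: int
    function: str
    start: float
    end: float


def worker_profile(events, t0, t1):
    """Worker side: time spent per function inside [t0, t1), using only this worker's slice of the trace."""
    profile = Counter()
    for ev in events:
        overlap = min(ev.end, t1) - max(ev.start, t0)
        if overlap > 0:
            profile[ev.function] += overlap
    return profile


def boss_handle_request(partitions, t0, t1):
    """Boss side: send the same request to every worker, then merge the partial results into one response."""
    merged = Counter()
    # Threads stand in for MPI worker processes in this sketch.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        for partial in pool.map(lambda part: worker_profile(part, t0, t1), partitions):
            merged.update(partial)  # Counter.update adds values for keys seen by several workers
    return dict(merged)
```

For example, with two workers each holding the events of one process, a request for the interval [1.0, 5.0) yields a single merged profile in which the `compute` time contributed by both workers is summed. Only this merged, compact result would travel back to the client.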
The right-hand side of Figure 1 depicts the visualization client(s) running on a local desktop graphics workstation. Because the client is freed from time-consuming calculations, it can have a straightforward sequential GUI implementation. Its look and feel is very similar to performance analysis tools such as Vampir, Jumpshot and others. To obtain the data it displays, the client sends requests to the analysis server in response to user input. Multiple clients can connect to the analysis server at the same time.
As mentioned above, the shape of the arrows indicates the quality of the communication links with respect to throughput and latency. From this we can see that the client-server communication was designed not to require high bandwidth, and that it tolerates moderate latency in both directions. Two types of data are transmitted: control information and condensed analysis results. This design makes it possible to combine parallel analysis on the server with remote visualization on the client. The thick arrows connecting the program traces with the worker processes indicate that high bandwidth is essential on that side, so that the user gets fast access to whatever segment of the trace data is of interest. This bandwidth is achieved by having the worker processes read the trace data in parallel.
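One way to see why condensed results keep the client link cheap is a sketch that reduces an arbitrary number of activity intervals to a fixed number of bins, for instance one per horizontal pixel of a timeline display. The binning scheme, the choice of "busy time per bin" as the condensed quantity, and the name `condense_timeline` are assumptions made for illustration; the paper does not specify the condensation method.

```python
def condense_timeline(intervals, t0, t1, bins):
    """Reduce any number of (start, end) intervals to `bins` values.

    Each value is the total time covered by the intervals inside that bin
    of [t0, t1). The result size depends only on `bins`, not on the number
    of intervals (i.e. not on the size of the trace).
    """
    width = (t1 - t0) / bins
    busy = [0.0] * bins
    for start, end in intervals:
        lo, hi = max(start, t0), min(end, t1)
        if hi <= lo:
            continue  # interval lies outside the requested window
        first = min(int((lo - t0) // width), bins - 1)
        last = min(int((hi - t0) // width), bins - 1)
        for b in range(first, last + 1):
            b_start = t0 + b * width
            overlap = min(hi, b_start + width) - max(lo, b_start)
            if overlap > 0:
                busy[b] += overlap
    return busy
```

Whether a worker holds a hundred intervals or a hundred thousand, the response for an 800-pixel-wide view would be 800 numbers. Under this assumption the data volume on the client link is bounded by the display size rather than by the trace size.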
To support multiple client sessions, the server uses multithreading in both the boss and the worker processes. The next section describes the analysis server architecture in detail.
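A minimal sketch of how a boss could serve several client sessions concurrently is to dedicate one thread to each connection, with all threads sharing a common analysis entry point and any shared state protected by a lock. The `SessionServer` class, the newline-delimited request format, and the use of `socket.socketpair` in place of real network connections are hypothetical choices for illustration; the actual server is implemented with pthreads inside MPI processes.

```python
import socket
import threading


class SessionServer:
    """Hypothetical boss-side front end that runs one thread per client session."""

    def __init__(self, handle_request):
        self._handle_request = handle_request  # shared entry point, e.g. scatter to workers
        self._lock = threading.Lock()          # guards state shared by all session threads
        self._threads = []
        self.requests_served = 0

    def attach(self, conn):
        """Start a dedicated thread for a newly connected client socket."""
        thread = threading.Thread(target=self._serve_session, args=(conn,), daemon=True)
        self._threads.append(thread)
        thread.start()

    def _serve_session(self, conn):
        # Separate buffered reader and writer, as the standard socketserver module does.
        with conn, conn.makefile("rb") as rfile, conn.makefile("wb") as wfile:
            for line in rfile:  # one request per line until the client disconnects
                reply = self._handle_request(line.decode("utf-8").strip())
                with self._lock:
                    self.requests_served += 1
                wfile.write((reply + "\n").encode("utf-8"))
                wfile.flush()

    def join(self):
        """Wait for all session threads to finish (i.e. all clients disconnected)."""
        for thread in self._threads:
            thread.join()
```

With a thread per session, a slow request from one client does not block the others; the cost is that anything shared between sessions, here only the request counter, must be synchronized.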