Pacemaker Project

University of Oregon

 

GenSet: Some Details

GXL: the Graph Exchange Language

GenSet consumes and produces directed, attributed graphs encoded in XML. The particular XML encoding we have chosen is GXL, an emerging standard interchange format in the reverse engineering and graph transformation research communities. [More on GXL]

Alternatives

GenSet can also consume the simpler "Rigi standard format" (RSF) triples form. RSF is the simplest format to produce with "one off" scripts written in Perl, Awk, Sed, etc. Also, many existing reverse engineering tools already produce RSF.

GenSet can also produce "dot" files compatible with the AT&T open source GraphViz project. Dot files can be read by a number of graph visualization tools, including dotty, grappa, and OmniGraffle. We currently use dotty to view GenSet output on Unix/X platforms and OmniGraffle to view GenSet output on Mac OSX.

GXL and Xarch

GXL is generic XML representation for directed, attributed graphs. Xarch, by contrast, is an XML representation of directed attributed graphs that represent software architectures; i.e., it is a more specialized form of graph. GenSet is based on GXL because it will consume a wide variety of information structures that were not originally designed for compatibility with architectural analysis tools; it is the job of GenSet to transform them into structures that can sensibly be attached to architectural representations like Xarch.

For interoperability with tools based on the Xarch representation, two kinds of filters will be provided. A filter from Xarch to GXL will simply encode the architecture-specific Xarch structure into the generic GXL structure. One or more filters from GXL to Xarch will perform the opposite translation, but they will not be applicable to arbitrary GXL files: They will translate GXL files that have already been transformed to represent the semantic structures that "fit" in the Xarch representation.

Obtaining GXL graphs

Many research groups are currently at work on tools that will produce and/or consume GXL graphs, including reverse engineering tools that extract GXL representations from legacy source code. Some GenSet inputs will be obtained from these tools; we will produce others by writing simple text filters, and using filters that others write.

Example

A common task in reverse engineering is to "lift" a detailed implementation relation (e.g., calls) to a higher (coarser) design level. For example, if we have a call graph relating program entities (C functions, Java classes, etc.), and we can relate those program entities to source files, and we have a directory structure, and we have grouped parts of the directory tree into a tree of subsystems, then we might want to "lift" the call relation to the subsystems. That is, we would infer a dependence relation between subsystem A and subsystem B if some function in some file in some directory in subsystem A called some function in some file in some directory in subsystem B. Holt's research group at Waterloo has provided a "guinea pig" example of this and other transformations using relations extracted from the Linux kernel. We have reproduced that transformation using their raw data, and adding as a twist the ability to "focus" on a few subsystems of interest (collapsing the other "uninteresting" subsystems into their nearest interesting ancestor). The input is a set of approximately 7000 relevant edges (and an additional 3000 irrelevant edges); the "lifted" call graph is produced in a few seconds.

Lifted call-graph of Linux kernel

Personnel

The GenSet project is directed by Michal Young. Contributors to the development of GenSet include

Craig Kaes created the initial prototype system
Sachiko Honda created the initial graphical interface
Lam Tan Nguyen complete redesign and reimplementation of internal data structures; revisions of other parts of GenSet
John Fiskio-Lasseter complete redesign and reimplementation of interpreter; revisions of other parts of GenSet
Joannie Humphreys performance engineering (leading to greater than 10x speedup)
Yan Weng GXL 1.0 input; GUI re-work
Daniel Ellsworth Input filters (PDB and Xarch to GXL)

 

Up to [Project] [Department] Contact: Michal@cs.uoregon.edu Updated 5-sep-02
Effort sponsored by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-00-2-0620. [more]