Perpetual Testing Logo

Exchanging and Manipulating Attributed Graphs

Many analysis tools manipulate attributed graphs. These graphs may represent architectures, design models, or structures extracted from programs. A standard for exchanging this graph-structured data would make it easier to combine existing tools for ad hoc explorations as well as prototype tool construction.

Yet another "standard" format?
Pipefitting
- How tools support a standard
GenSet: an "awk" for attributed graphs

Yet another "standard" format?

There are too many "standard" formats for exchanging structured data already. Rather than introduce yet another, we selected an existing format to standardize on. The format we chose is the graph description language processed by the "dot" tool from the AT&T graph visualization toolset. To the best of our knowledge, this language does not have a name of its own, so we call it "dot."

Why the "dot" format?

We chose the "dot" format for several reasons, among them:

Dot is a simple textual format that can be manipulated with tools like awk and perl, and even edited in a text editor. This is important for rapid construction of experimental tools from combinations of off-the-shelf and easily constructed parts, and is important also for rapid testing and debugging.
The dot format is easily extendible with new attributes on nodes and edges; many other graph formats can easily be translated to and from dot.
The graph visualization tools distributed free of charge by AT&T are valuable; choosing the dot format saved us the effort of providing this functionality in some other way.
The dot format is already widely used by others building experimental tools for software engineering.

What about ACME?

The ACME format for exchanging architectural descriptions is similar in some ways than dot, and it is an important standardization effort in the EDCS research community. Like dot, ACME consists of a simple "core" graph which can be decorated with an arbitrary set of annotations. However, ACME is specialized to the task of describing software architectures; for example, it mades a distinction between components and connectors. ACME is the right choice for tools that can be used for more than one architectural description formalism, but which would not make sense for other attributed graphs. Dot is a better choice for tools that manipulate attributed graphs without regard for whether they describe architectural structures, program control flow, or something else entirely. When one wishes to perform a generic graph manipulation on a graph that happens to represent a software architecture, the dot format can be used as a textual encoding of an ACME graph.

What about Corba? XML? ...

There are many other very generic interchange standards that we might have used. Some of these, like CORBA, are interface standards through which one can provide interfaces to any kind of object, not just attributed graphs. The power of such an interface comes at a price: Combining tools using such a standard requires more effort than using a standard interchange format, and is therefore less suitable for rapid construction of research prototypes and ad hoc tools.

XML is an intriguing possibility, because like dot it is a simple textual interface that can be easily manipulated with simple tools. However, XML as a standard for graph interchange is no standard at all, because graphs could be represented in XML in an enormous variety of mutually incompatible ways. What might make sense in the future, though, is to encode dot structures in XML in much the same way as we now encode ACME structures in dot, providing yet another level at which certain classes of generic tools can be developed and shared.

Pipefitting

Choosing a standard representation is only a start. We also need simple "pipefitting" components.

How tools support an interchange standard

A standard format is a start, but in general it is still difficult to compose off-the-shelf tools. Simple "pipefitting" components can help by performing simple manipulations on objects in the interchange format, for example adjusting the output of one tool to be more suitable as input for another. Think for example of the important role that simple tools like sed, awk, and grep play in fitting together Unix tools that manipulate streams of text. Our goal is to provide a similar level of pipefitting support for a attributed graph objects.

GenSet: An "awk" for attributed graphs

GenSet is a programmable tool for simple graph manipulations. It is intended to fill a niche similar to the role that Awk plays in the world of Unix text streams: It can perform a class of graph manipulations which are too complex for simpler, less flexible tools, but with orders of magnitude less programming effort than would be required to write a special-purpose tool in C or Java.

One of the manipulations that GenSet can carry out is to translate node attributes into labeled edges and vice versa. After all, edges and attributes are both just ways of representing the same underlying mathematical concept of a relation. Second, GenSet can compute new node attributes by finding a fixed-point solution to equations relating attributes at each node to attributes at its neighboring nodes. (GenSet uses standard flow analysis algorithms for this.) Taken together, attribute computations and conversion between attributes and edges provides a very compact and simple way to specify many useful graph transformations.

[Simple Example]

Status and availability

The first working prototype of GenSet was completed in June 1998 by Craig Kaes. It is implemented entirely in Java (JDK 1.1.6), with no known platform machine or operating system dependencies, although it has been tested only on Sparc Solaris and Solaris x86. It is currently an "alpha" level release, with many shortcomings and at least a few (known) bugs. Most conspicuously absent at this time is user documentation. We are happy to share it with brave souls who care to experiment, but at this time we do not have the resources to offer more than informal support on an as-time-allows basis. [Contact Michal Young to obtain the most recent version.]

Plans

GenSet will continue to evolve rapidly over the next several months. At some point --- perhaps summer 99 --- we will take stock of lessons learned so far and embark on a complete redesign. Until then, we will be making incremental improvements, including:

User documentation; this is a high priority for fall 98
Additional operations. In particular, GenSet needs operations corresponding to the "project" and "select" operations on relational databases.
Efficiency. Some key data structures must be revised so that GenSet can be applied to graphs with many thousands of nodes with acceptable efficiency.
Auxillary tools: While GenSet fills a uniquely important niche among pipefitting components for exchanging attributed graphs, a variety of other pipefitting components are needed. Of particular interest are conversions between the dot format and other important formats. We already have a few of these, e.g., conversions between dot and fc2 (used by the Meije verification tools), and others are under construction (e.g., we have a somewhat limited capability for converting between ACME and dot, which we use in our Chera tool).

Contact: Michal Young