Java Implementation of the TINY Compiler
Last updated 2008/04/04 20:28:42
Following are the source files for a Java implementation of the TINY
compiler example given in the text book Compiler Construction - Principles
and Practice by Kenneth Louden. This code is adapted from the C
implementation presented in the book. The purpose of this rewrite of Louden's
example is to explore an object oriented approach to compiler implementation
in Java.
I believe the architecture of a compiler lends itself to a natural object
oriented design. Traditionally,
compiler implementations have been organized very algorithmically, and
tend to have large switch statements or other forms of multiple branch
logic. The use of switch statements suggests the possibility of using
a class hierarchy and dynamic method binding instead. In this design, a class
hierarchy is used for the grammatical structure of the TINY language. The
goal is to capitalize on as much commonality as possible as well as to
use the strict type checking of Java to make the compiler implementation
as error free as possible.
Like Louden's original implemenation, this implementation is organized
according to the phases of development - scanner, parser, code analysis,
and code generation. The scanner is generated by JFLEX and the parser is
generated by CUP. This
implementation should generate the same code (for the TM machine
tm.c given
in Louden's book).
High level description of the implementation
A Makefile provides the rules for compiling
this implementation using the Java compiler.
tinyc.java is the driver for running the compiler.
The basic unit of scanning is the Token class which is defined
in Token.java. A Token encapsulates a textual scanned
lexical unit along with its line number and file name and a type
characterizing the token.
The core of the compiler design is in the classes in the file
ASTnode.java. These classes capture the grammar of the
TINY language. There is a base class ASTnode that derives from Token
and represents the common aspects of all nodes in the abstract syntax tree.
As such, it has abstract methods for printing and code generation.
Some of the main classes deriving from ASTnode are Expression for expressions
Statement for statements, and Program for the entire TINY
program.
Lexer.java defines
and implement a Lexer class that enapsulates the lexical analysis
actions. tiny.jflex defines a JFLEX implementation
of the scanner core, resulting in the class TinyLexer which extends
Lexer.
TinyLexer-hand.java contains a hand coded
implementation of the scanner for Tiny that does not use JFLEX.
tiny.cup defines a CUP implementation of the
parser core that results in a TinyParser class.
This class encapsulates the syntax analysis actions.
Symtab.java defines
and implements a very simple symbol table used by the compiler.
Finally, code generation is handled by a Code class
(Code.java)
as well as implementations of the codegen method of ASTnode.
tiny-compiler.tar is an archive of
all of the source files
You may also want to look at a
C++ version of the Tiny compiler.
Some Notes on the Coding Style
As noted above, the base classes in ASTnode.java define some abstract methods.
For example, a method print is used to pretty print the syntax tree.
The implementations of this method for each of the concrete classes in
ASTnode.java can be found throughout the classes there. The C++ language
offers an advantage over Java source code organization: we could collect
all of the method implementations into a single source file consisting
only of the method implementations. In the Java implementation, we must
have the method implementation within the class definitions. Of course,
the classes give us object and type orientation for the grammatical elements.
In C++, we can have this organization but still arrange source on a
functional basis, which I think would be an advantage.