CIS (4|5)61

WF 8:30-9:50am

348 McKenzie


Welcome to CIS 461/CIS 561, Introduction to Compiler Construction. This is a course in construction of programming language translators (compilers and interpreters). Why in the world could you possibly care? Actually, there are a few reasons.

Why study compiler construction?

First, even if you never build a full compiler for a general purpose programming language, there are many practical uses for the techniques of compiler construction, including construction of simple transducers using lexical analysis and parsing tools to program analysis using type-checking and data flow analysis techniques to various kinds of application "engines" using standard approaches to run-time organization.

Second, you will gain a lot of insight into tools you use every day, why they work the way they do and have the limitations they have, and what is happening when they don't work.

Third, you'll see one of the classic success stories of theory and systems research in computer science. It is a wonderful illustration of synergistic interaction of theory and systems, in which a theoretical research agenda is driven by practical problems and succeeds in utterly transforming practice. Plus it's just plain cool.

Learning Objectives

Upon successful complection of this course, you should be able to:

  • Distinguish lexical, syntactic, static semantic, and dynamic semantic rules of a programming language, and explain why they belong to each category.
  • Produce a clean, readable lexical analysis specification from which a tool like lex can generate an efficient lexical analyzer.
  • Code a top-down parser, recursive descent parser for a simple context free grammar.
  • Produce a grammar specification suitable for generating a bottom-up parser for a more complex context free grammar.
  • Distinguish concrete syntax from abstract syntax, and produce an abstract syntax structure while parsing the concrete syntax of a programming language.
  • Describe and implement the dynamic dispatch mechanism for single inheritance object-oriented languages that use virtual method tables.
  • Distinguish flow-sensitive from flow-insensitive static semantic rules, and implement both.
  • Generate code for the classic stack approach to allocating activation records for methods, functions, and procedures, and recognize language features which could force a language implementation to use other approaches.

Lecture

Wednesday, Friday from 8:30am to 9:50, in McKenzie 348.

Staff

Instructor
Michal Young, 243 Deschutes Hall

Text

I do not require a particular text, but I do recommend finding a useful text for reference. I suggest selecting a used book, since none of these are perfect and new copies can be quite expensive. I will make a few copies of classic compiler textbooks available in the shelf in Deschutes Room 100. That may suffice for this term. In the longer term, if you work on compiler construction you will likely want a good reference on your personal bookshelf.

Textbooks to consider

The “Dragon Book” is a pretty good one to keep on your bookshelf:
Compilers: Principles, Techniques, & Tools by Alfred A. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman
This book is considered a classic and is particularly strong on parsing theory, but the parts on parsing theory can be hard to absorb on a first reading. I think it's not as strong as some other books on type checking and code generation.

Crafting a Compiler by Charles N. Fischer, Ron K. Cytron, and Richard J. LeBlanc.
This book has clearer explanations than the dragon book, but might be a little less useful as a reference. It has gone through three editions that I know of. My favorite is the earliest, which presented program examples in an Ada-like pseudocode. I found that more reasonable than the second version (Crafting a Compiler in C), which used C source code for examples, because details of the code got in the way of clarity. The newest edition uses pseudocode again, but goes into some depth (not always correctly, I think) on using visitors and other object-oriented patterns. Its coverage of the underlying theory of lexing and parsing is not as thorough as the dragon book, which could be an issue.

Modern Compiler Implementation in Java by Andrew W. Appel, Jens Palsberg.
I have not read the edition co-authored by Palsberg, but I have heard good things about it and have every reason to believe that it will be very clear and thorough, and (because Jens is a leading researcher in types and program analysis) it should be particularly good in type checking. There is an earlier edition using ML that would be a very good choice if you are an ML programmer (but CIS 461/561 is not the right time to build your first ML program). The earlier edition in Java should have been titled Modern Compiler Implementation in Java Syntax but ML Style, but nonetheless was among the clearer explanations of why the project compiler was structured as it was. The main reason I don't just make this the class text is that it is tied to a specific project, which differs from the project we will undertake.

There are other possibilities too. Don't get a Compiler Construction for Dummies or Learn Compilers in 48 Hours style quick how-to book, but pretty much any good academic textbook on compiler construction will probably be good enough for our purposes.

Pre-requisites

CS 425, CS 624, or a similar course in programming languages is a pre-requisite to CIS 461/561.

Project

We will produce a full running compiler for a small language called Quack. Quack is an object-oriented language with single inheritance and type inference (roughly like auto in C++). It does not have arrays or generics (templates), and it does not have primitive (unboxed) types. We will compile Quack programs to byte code for a simple virtual machine. I will do what I can to limit the effort required, including allowing teams of two to work together. The satisfaction of completing a working compiler, and the skills and knowledge gained in completing the project, will be commensurate with the effort it demands.

Approach

Project Plan

We will follow a pretty conventional outline from lexical analysis through parsing, static semantic analysis, and run-time structures and code generation.

Theory and Engineering

Modern compiler construction techniques and tools rest on a rich foundation of theory. While we won't have time for a thorough investigation of that theory, we will spend some time on a few important parts, including:

  • Regular languages, regular expressions, and the automata that recognize regular languages. This is the theory that gives us modern tools for lexing, the first stage in the front end of a compiler.
  • Context-free grammars, top-down parsing, and bottom-up parsing. We will spend more than a week mastering the automatic construction of bottom-up, LALR(1) parsers from suitable grammars. This is the construction used by parser generators like Yacc, Bison, and Lark.
  • Flow analysis. We will study and implement basic flow-sensitive and flow-insensitive, intra-procedural data flow analysis for type inference and static detection of use-before-initialization errors.
  • Classes and types. Like many object-oriented languages, Quack associates types with classes, and the subtype relation with the subclass relation. It has the usual rules for covariance and contravarience in method calls. I think this will be review for students who studied covariance and contravariance in CIS 425 or a similar course, but using these rules to implement type inference will be a substantial challenge.

In addition I hope to spend some time looking at the basic engineering of several contemporary language systems, including CPython, the Java JVM, and conventional "ahead-of-time" compiled languages like C and C++. We won't be able to take a deep dive into any of them, but our project will give us a good vantage point for considering the design choices they faced and why they are constructed as they are.

This will be hard

Compiler construction has a reputation as a very tough course. Both graduate and undergraduate students have told me it was the hardest course they ever took in college ... and also that they were glad they took it. It is just very hard to fit this topic into a semester, let alone a 10-week academic term. I try to keep the work load reasonable by, among other things, encouraging you to do parts of the project in small teams.

There will also be assignments that are not part of the project, and two exams.

Schedule (tentative)

Here is a rough schedule. Expect a midterm in week 5 or 6.

Week   Topic Working on Due
1 The structure of a compiler
2 Run time organization Tiny calculator
3 Lexical Analysis Begin lexer Thursday  
4 Grammars and top-down (LL) parsing   Lexer due Thursday
5 Bottom-up (LR) parsing Begin parser
6 Bottom-up parsing Parsing Parser due Thursday
7 Static semantic analysis Parsing & abstract syntax tree (AST) building Midterm Thursday
8 Static semantics analysis Type checking AST due Tuesday
9 Analysis, Optimization Code generator
10 Wrap up and prospectus   Full compiler due Friday