The program consists of more than thirty, 80 minute lectures presented by internationally recognized leaders in programming languages and security research.

Typing Ad Hoc Data Kathleen Fisher

Traditionally, types describe the internal data manipulated by programs. To accommodate the variety of desired data structures, language designers and type theorists have developed a wide variety of types and type constructors. But not all useful data is in programs! In fact, enormous amounts of it sit on disks or stream by on wires in a dizzying array of encodings and formats. It turns out that many of the types developed for internal data can be used to describe external data: tuples, records, unions, options, and lists come to mind as obvious examples. Perhaps more surprisingly, recursive types, singletons, functions, parametric polymorphism, and dependent types are relevant as well. Using types to describe external data leads naturally to the insight that we can reuse the same type to define an internal data structure and to generate parsing and printing functions to map between the two representations. The PADS project has exploited this idea, building data description languages based on the type structure of C (PADS/C) and on ML (PADS/ML) and exploring the theoretical basis for such languages with the Data Description Calculus (DDC). Continuing the analogy, it turns out that other concepts from the types world are also relevant to ad hoc data processing, including generic programming, type inference, type isomorphisms, and subtyping. In this series of talks, I will describe the domain of ad hoc data processing, the PADS data description languages, the supporting tool infrastructure, and the semantic basis for these languages. Time permitting, I will explore the question of type inference, describing quantitative techniques we are currently developing to construct a description of ad hoc data given example instances.

More information about the PADS project is available from www.padsproj.org, including a list of the many people who have contributed to the project.

Understanding Multilingual Software Jeff Foster

Many software systems are written in multiple languages, and yet there has been little research to date in reasoning about such programs. Most multi-lingual programs are built using foreign function interfaces (FFIs), which define what must be done to translate between native and foreign data representations and how the foreign code should safely interact with the native code. These requirements are easy to get wrong, and mistakes often lead to subtle, hard-to-find bugs.

This series of lectures will discuss an approach to addressing this problem using static analysis. We will describe Saffire, a pair of tools for checking type safety across the OCaml FFI and the Java Native Interface (JNI). We will begin with the basics of type inference, and then show how to express polymorphic inference using semiunification constraints. We will next discuss the challenges of static analysis for FFI code, and elaborate type inference to handle the issues in analyzing JNI glue code. Then we will present a dataflow-style type inference algorithm for reasoning about OCaml FFI glue code, and we will end by discussing current results and the challenges of making these systems work on existing code bases.

Types for Safe C-Level Programming Dan Grossman

C remains the de facto standard for systems programming even though its lack of memory safety is a notorious source of software defects and security vulnerabilities. In addition to legacy reasons, C is desirable in some domains precisely because it provides a lower level view of data and more manual resource management than modern, safe languages. Conversely, it provides some higher-level features (scope and abstract types) than assembly languages.

This series of lectures will address how we can provide a memory-safe dialect of C that retains the C view of data and resource management. The relevant foundational theory is type systems, particularly universal and existential types. By applying the theory, we can design a C-like language without unsafe type casts, dangling pointers, and data races. We will also discuss more recent work on applying language theory to define what it means for C code to be portable and to use these ideas to build helpful tools for programmers.

Web and Database Application Security Zhendong Su

Web and database applications contain many critical faults and are susceptible to serious failures and security threats. A recent study shows that the top three types of exploited vulnerabilities in 2006 were for web applications, for the first time surpassing the number of exploited buffer overflow vulnerabilities. These lectures will cover recent language-based approaches for improving the reliability and security of web and database applications:

  • static analyses for enforcing correctness of dynamically generated database queries;
  • runtime checking mechanisms for detecting SQL injection attacks;
  • static analyses for detecting SQL injection and cross-site scripting vulnerabilities.

Statistical Debugging Ben Liblit

A large user community is the most powerful debugging tool any developer can have, if you can figure out how to tap into it. This series of lectures explores what it takes to turn a thousand end users into your testing team. We begin with an examination of lightweight instrumentation and feedback techniques that offer a window into the dynamic (mis)behavior of software in the field. We then consider machine learning techniques to create statistical models of program failure that help programmers identify and repair buggy code. Lastly, we roll up our sleeves and get into the nitty- gritty technical and social details of building and deploying a complete, robust Cooperative Bug Isolation (CBI) system.

XML and Web Application Programming Anders Møller

Programming with XML and WWW leads to many interesting opportunities and challenges to design of programming languages and static analyses. These lectures will present an overview of the basic XML technologies, focusing on formal models of schema languages. We then study the state of the art of programming languages designed for processing XML data, such as, XDuce, XACT, and XQuery, and discuss the challenges for further research. Finally, we survey modern frameworks for Web application programming.

Programming Models for Distributed Computing Yannis Smaragdakis

These talks first examine distributed computing from a languages/software engineering standpoint, and subsequently use the distributed programming domain to motivate program generation: a general and powerful approach to automating programming tasks.

We begin with a simple-to-state but important problem: how to define middleware that allows the programmer to think of a distributed application largely as a centralized one, while maintaining efficiency and expressiveness. Our NRMI middleware facility is the first to support call-by-copy-restore semantics for arbitrary pointer-based data, letting the programmer think of a remote procedure call much like a local one in many useful cases. We show that NRMI is significantly easier to use than traditional RPC middleware, yet maintains efficiency and full programmer control.

If we take the task of simplifying distributed programming to the extreme, we encounter systems that allow unsuspecting programs to execute in a distributed environment. Typically such systems use program generation extensively. We briefly present our J-Orchestra system for automatically enabling Java programs to execute in a distributed setting. We then discuss the impact that the J-Orchestra program transformation techniques have had on a large open-source project (JBoss).

Staged Programming Walid Taha

Multi-stage programming is a paradigm for writing generic programs that do not pay a runtime overhead. The key underlying technology is program generation. In addition, languages designed to support this paradigm (such as MetaOCaml) help the programmer avoid many of the difficulties that are traditionally encountered in developing program generators. These lectures will introduce you to the basics of this paradigm as well as of programming in MetaOCaml. Numerous examples will be used to illustrate the practice of multi-stage programming.

Fault Tolerant Computing David August

As chip densities and clock rates increase, processors are becoming more susceptible to intermittent faults caused by external events such as background radiation. These faults, called transient faults, do not cause permanent damage but may alter computation or state, resulting in incorrect program execution. Computer scientists must now learn to build reliable systems from unreliable components. In this track, we will explore the nature of transient faults and of mechanisms designed to protect against them. We will examine the role languages and compilers may play in creating radiation hardened programs. We will also consider new opportunities made possible by languages which embrace potentially incorrect behavior. We will evaluate these ideas both empirically, by running experiments, and theoretically, by using formal methods to prove properties of fault-tolerant software.

Scalable Defect Detection Manuvir Das, Daniel Wang, Zhe Yang

Software in the real world has two characteristics that make automated program analysis difficult to apply: First, programs are large enough that scalability is of critical importance (anything super-linear in practice is out of the question). Second, programmers work against tight deadlines and are loathe to invest time examining defect messages that are not derived from actual executions. Therefore, accuracy is equally important.

We in the Program Analysis Group at Microsoft have spent the last few years overcoming these challenges, using two techniques: The first technique is to perform a combination of weak but fast global analysis everywhere and precise but slow analysis in a few, automatically determined, regions of the code. The second technique is to require programmer added interface annotations of function pre-conditions and post-conditions, followed by strong modular analysis to validate the code. The technique is strengthened through the use of weak global analysis to automatically infer many of the interface annotations.

In this series of lectures, we will cover the analysis principles and techniques behind our solutions, practical choices made along the way, and empirical results. We have used our tools to find tens of thousands of defects that have been fixed by programmers during development of currently available products.