ICSE 2009: 31st International Conference on Software Engineering, Vancouver, Canada, May 16-24, 2009

Accepted Research Papers

The research paper track received 405 submissions. Each submission was rigorously reviewed by at least two members of the program committee. Based on these reviews, the most promising submissions were discussed at length at a two-day face-to-face meeting of all PC members. In the end, the program committee selected 50 papers for inclusion in the program. The titles and abstracts of the accepted papers are listed below.

This year's Most Influential Paper (MIP) is "N Degrees of Separation: Multi-Dimensional Separation of Concerns" by Peri Tarr, Harold Ossher, William Harrison, and Stanley M. Sutton Jr.

Papers

Papers fall into the following categories:
Code Generation and Transformation
Collaborative Development
Components
Concurrency
Debugging
Development Paradigms and Software Process
Development Tools
Dynamic Adaptation
Maintenance
Model Synthesis
Modeling
Program Analysis
Program Comprehension
Software Quality and Metrics
Testing
Web Applications


A Case-study on Using an Automated In-Process Software Engineering Measurement and Analysis System in an Industrial Environment (Irina Diana Coman, Alberto Sillitti, Giancarlo Succi)

Authors: Irina Diana Coman, Alberto Sillitti, Giancarlo Succi
Presenting at: TBD

Abstract: Automated systems for measurement and analysis are not adopted on a large scale in companies, despite the opportunities they offer. Fear of “Big Brother” and the lack of reports giving insight into the real adoption process and concrete usage in industry are barriers to adoption. We performed a case study of the adoption and long-term use (two years) of such a system in a company, focusing on the adoption process and the challenges we encountered.


Accurate Interprocedural Null-Dereference Analysis for Java (Mangala Gowri Nanda, Saurabh Sinha)

Link to poster and additional post-conference resources!

Authors: Mangala Gowri Nanda, Saurabh Sinha
Presenting at: TBD

Abstract: Null dereference is a commonly occurring defect in Java programs, and many static-analysis tools identify such defects. However, most of the existing tools perform a limited interprocedural analysis. In this paper, we present an interprocedural path-sensitive and context-sensitive analysis for identifying null dereferences. Starting at a dereference statement, our approach performs a backward demand-driven, path-sensitive analysis to identify true and false null-propagation paths precisely. The backward demand-driven analysis avoids exhaustive program exploration, which permits the analysis to scale to large programs. We present the results of empirical studies conducted using large open-source and commercial products. Our results show that: (1) our approach detects fewer false positives, and significantly more interprocedural true positives, than other commonly used tools; (2) the analysis scales to large subjects; and (3) the identified defects are often fixed in subsequent releases, which indicates that the reported defects are important.
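
To make the defect class concrete, here is a minimal, hypothetical Java example (ours, not the paper's) of an interprocedural null-propagation path of the kind such an analysis must reason about: the dereference is unsafe only on the path where the callee returns null.

    // Hypothetical example: null flows from lookup() into useLength()
    // only when found == false, so a path- and context-sensitive
    // analysis is needed to report the defect without a false positive
    // on the safe path.
    public class NullFlow {
        static String lookup(boolean found) {
            return found ? "value" : null;      // null escapes on the false path
        }
        static int useLength(boolean found) {
            String s = lookup(found);
            return s.length();                  // NPE when found == false
        }
        public static void main(String[] args) {
            System.out.println(useLength(true));   // safe path
            System.out.println(useLength(false));  // throws NullPointerException
        }
    }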


Analyzing Critical Process Models through Behavior Model Synthesis (Christophe Damas, Bernard Lambeau, Francois Roucoux, Axel van Lamsweerde)

Authors: Christophe Damas, Bernard Lambeau, Francois Roucoux, Axel van Lamsweerde
Presenting at: TBD

Abstract: Process models capture tasks performed by agents together with their control flow. Building and analyzing such models is important but difficult in certain areas such as safety-critical healthcare processes. Tool-supported techniques are needed to find and correct flaws in such processes. On the other hand, event-based formalisms such as Labeled Transition Systems (LTS) prove effective for analyzing agent behaviors. The paper describes a blend of state-based and event-based techniques for analyzing task models involving decisions. The input models are specified as guarded high-level message sequence charts, a language allowing us to integrate material provided by stakeholders such as multi-agent scenarios, decision trees, and flowchart fragments. The input models are compiled into guarded LTS, where transition guards on fluents support the integration of state-based and event-based analysis. The techniques supported by our tool include model checking against process-specific properties, invariant generation, and the detection of incompleteness, unreachability, and undesirable non-determinism in process decisions. They are based on a trace semantics of process models, defined in terms of guarded LTS, which are in turn defined in terms of pure LTS. The techniques complement our previous palette for synthesizing behavior models from scenarios and goals. The paper also describes our preliminary experience in analyzing cancer treatment processes using these techniques.


Automatic Creation of SQL Injection and Cross-Site Scripting Attacks (Adam Kiezun, Philip J Guo, Karthick Jayaraman, Michael D Ernst)

Authors: Adam Kiezun, Philip J Guo, Karthick Jayaraman, Michael D Ernst
Presenting at: TBD

Abstract: We address the problem of finding security vulnerabilities in Web applications. SQL Injection (SQLI) and cross-site scripting (XSS) attacks are widespread forms of attack in which the attacker crafts the input to the application to execute malicious code and access or modify user data. In the most serious attacks (second-order XSS), an attacker can corrupt a database and cause subsequent users to execute malicious code. We present an automatic technique for creating inputs that expose SQLI and XSS vulnerabilities. The technique builds on systematic testing. Our technique uses a novel approach to tracking the flow of symbolic values through the database, thus precisely targeting second-order XSS. Our technique creates real attack vectors, has few false positives, incurs no runtime overhead for the deployed application, works without modification of application code, and handles dynamic language constructs. We implemented the technique for PHP, and found 68 previously unknown vulnerabilities in five applications.
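
The paper's implementation targets PHP; purely as an illustration of the vulnerability class such generated inputs exploit, here is a hedged Java sketch (class, table, and method names are invented) showing an injection point and the parameterized-query fix.

    import java.sql.*;

    public class LoginDao {
        // Vulnerable: the user-controlled string becomes part of the SQL text.
        // A generated attack input such as  ' OR '1'='1  turns the WHERE
        // clause into a tautology.
        static ResultSet findVulnerable(Connection c, String name) throws SQLException {
            String q = "SELECT * FROM users WHERE name = '" + name + "'";
            return c.createStatement().executeQuery(q);
        }

        // Fixed: a PreparedStatement keeps the input as data, not SQL code.
        static ResultSet findSafe(Connection c, String name) throws SQLException {
            PreparedStatement ps =
                c.prepareStatement("SELECT * FROM users WHERE name = ?");
            ps.setString(1, name);
            return ps.executeQuery();
        }
    }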


Automatic Dimension Inference and Checking for Object-Oriented Programs (Sudheendra Hangal, Monica Lam)

Authors: Sudheendra Hangal, Monica Lam
Presenting at: TBD

Abstract: This paper introduces UniFi, a tool that attempts to automatically detect dimensionality errors in Java programs. UniFi performs interprocedural, context-sensitive analysis to infer dimensional relationships across primitive type and string variables in Java code. It then monitors these dimensional relationships as the program evolves, flagging inconsistencies that may be errors in the program. UniFi exploits features of object-oriented languages, but can be used for other languages as well. Prior work in this area requires programmers to explicitly annotate programs with dimensions and units, and deals mostly with physical dimensions like Mass, Length and Time. In contrast, UniFi requires no programmer annotations and supports program-specific dimensions, thus providing fine-grained dimensional consistency checking. We have run UniFi on real-life Java code and found that it is useful in exposing dimensionality errors.
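
As a hypothetical illustration (ours, not from the paper) of the error class targeted: both quantities below have type long, but they carry different program-specific dimensions (milliseconds versus seconds), so the comparison is wrong even though it type-checks.

    // Invented example of a dimensional inconsistency that ordinary
    // type checking cannot catch.
    public class TimeoutCheck {
        static final long TIMEOUT = 30;            // intended unit: seconds
        static boolean expired(long startMillis) {
            long elapsed = System.currentTimeMillis() - startMillis;  // milliseconds
            return elapsed > TIMEOUT;              // bug: compares ms against s
        }
    }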


Automatically Capturing Source Code Context of NL-Queries for Software Maintenance and Reuse (Emily Hill, Lori Pollock, K. Vijay-Shanker)

Authors: Emily Hill, Lori Pollock, K. Vijay-Shanker
Presenting at: TBD

Abstract: As software systems continue to grow and evolve, locating code for maintenance and reuse tasks becomes increasingly difficult. Existing static code search techniques using natural language queries provide little support to help developers determine whether search results are relevant, and few recommend alternative words to help developers reformulate poor queries. In this paper, we present a novel approach that automatically extracts natural language phrases from source code identifiers and categorizes the phrases and search results in a hierarchy. Our contextual search approach allows developers to explore the word usage in a piece of software, helping them to quickly identify relevant program elements for investigation or to quickly recognize alternative words for query reformulation. An empirical evaluation with 22 developers reveals that our contextual search approach significantly outperforms the most closely related technique in terms of effort and effectiveness.


Automatically Finding Patches Using Genetic Programming (Westley Weimer, ThanhVu Nguyen, Claire Le Goues, Stephanie Forrest) Winner of IFIP TC2 Manfred Paul Award and ACM SIGSOFT Distinguished Papers Award

Link to poster and additional post-conference resources!

Authors: Westley Weimer, ThanhVu Nguyen, Claire Le Goues, Stephanie Forrest
Presenting at: TBD

Abstract: Automatic repair of programs has been a longstanding goal in software engineering, yet debugging remains a largely manual process. We introduce a fully automated method for locating and repairing bugs in production software. The approach works on off-the-shelf legacy applications and does not require formal specifications, program annotations or special coding practices. Once a program fault is discovered, an extended form of genetic programming evolves program variants until one is found that both retains required functionality and also avoids the defect in question. Standard test cases are used to exercise the fault and to encode program requirements. After a successful repair has been discovered, it is minimized using structural differencing algorithms and delta debugging. We describe the proposed method and report results from an initial set of experiments demonstrating that it can successfully repair ten different C programs totaling 63,000 lines in under 200 seconds each, on average.
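
The following toy Java sketch conveys only the generate-and-validate structure (candidate edits scored by how many tests they pass); the actual system evolves C abstract syntax trees with genetic operators guided by fault localization, which this sketch deliberately omits.

    import java.util.List;
    import java.util.function.IntBinaryOperator;

    public class RepairLoopSketch {
        // Buggy "program": intended to return the maximum, returns the minimum.
        static final IntBinaryOperator ORIGINAL = (a, b) -> a < b ? a : b;

        // Candidate variants (in the real system, produced by mutation and
        // crossover over program ASTs rather than listed by hand).
        static final List<IntBinaryOperator> VARIANTS = List.of(
            ORIGINAL,
            (a, b) -> a + b,
            (a, b) -> a < b ? b : a);          // the repair

        // Fitness: number of passing tests; tests encode required behavior.
        static int fitness(IntBinaryOperator p) {
            int pass = 0;
            if (p.applyAsInt(1, 2) == 2) pass++;
            if (p.applyAsInt(5, 3) == 5) pass++;
            if (p.applyAsInt(-1, -2) == -1) pass++;
            return pass;
        }

        public static void main(String[] args) {
            for (IntBinaryOperator v : VARIANTS)
                if (fitness(v) == 3) { System.out.println("repair validated"); return; }
            System.out.println("no repair in this generation");
        }
    }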


Complete and Accurate Clone Detection in Graph-based Models (Nam H. Pham, Hoan Anh Nguyen, Jafar M. Al-Kofahi, Tung Thanh Nguyen, Tien N. Nguyen)

Authors: Nam H. Pham, Hoan Anh Nguyen, Jafar M. Al-Kofahi, Tung Thanh Nguyen, Tien N. Nguyen
Presenting at: TBD

Abstract: Model-Driven Engineering (MDE) has become an important development framework for many large-scale control software systems. Previous research has reported that, as in traditional code-based development, cloning also occurs in MDE. However, there has been little work on clone detection in models, and existing approaches are limited in detection precision and completeness. This paper presents ModelCD, a novel clone detection tool for Matlab/Simulink models that is able to efficiently and accurately detect both exactly matched and approximate model clones. The core of ModelCD is two novel graph-based clone detection algorithms that are able to systematically and incrementally discover clones with a high degree of completeness, accuracy, and scalability. We have conducted an empirical evaluation with various experimental studies on many real-world systems to demonstrate the usefulness of our approach and to compare the performance of ModelCD with existing tools.


Discovering and Representing Systematic Code Changes (Miryung Kim, David Notkin)

Authors: Miryung Kim, David Notkin
Presenting at: TBD

Abstract: Software engineers often inspect program differences when reviewing others' code changes, when writing check-in comments, or when determining why a program behaves differently from expected behavior. Program differencing tools that support these tasks are limited in their ability to group related code changes or to detect potential inconsistency in program changes. To overcome these limitations and to complement existing approaches, we built Logical Structural Diff (LSDiff) that infers systematic structural differences as logic rules, noting anomalies from systematic changes as exceptions to the logic rules. We conducted a focus group study with professional software engineers in a large E-commerce company and also compared LSDiff's results with plain structural differences without rules and textual differences. Our evaluation suggests that LSDiff complements existing differencing tools by grouping code changes that form systematic change patterns regardless of their distribution throughout the code and that its ability to discover anomalies shows promise in detecting inconsistent changes.


Do Code Clones Matter? (Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner)

Link to poster and additional post-conference resources!

Authors: Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner
Presenting at: TBD

Abstract: Code cloning is not only assumed to inflate maintenance costs but also considered defect-prone as inconsistent changes to code duplicates can lead to unexpected behavior. Consequently, the identification of duplicated code, clone detection, has been a very active area of research in recent years. Up to now, however, no substantial investigation of the consequences of code cloning on program correctness has been carried out. To remedy this shortcoming, this paper presents the results of a large-scale case study that was undertaken to find out if inconsistent changes to cloned code can represent faults. For the analyzed commercial and open source systems we not only found that inconsistent changes to clones are very frequent but also identified a significant number of faults induced by such changes. The clone detection tool used in the case study implements a novel algorithm for the detection of inconsistent clones, available as open source.


Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista (Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald Gall, Brendan Murphy) Winner of ACM SIGSOFT Distinguished Papers Award

Authors: Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald Gall, Brendan Murphy
Presenting at: TBD

Abstract: It is believed that distributed software development is more challenging than collocated development.  Literature on distributed development in software engineering discusses various challenges, including cultural barriers, expertise transfer difficulties, and communication and coordination overhead.  We evaluate this belief by examining the development of Windows Vista and comparing the failures of components that were developed in a distributed fashion with those developed by collocated teams.  We found a negligible difference in failures.  This difference becomes even less significant when controlling for the number of developers working on a binary.  We also examine component characteristics such as code churn, complexity, dependency information, and test coverage to investigate if less complex components are distributed and find little difference between distributed and collocated components.  Further, we examine the software process used during the Vista development cycle and present ways in which the development process utilized may be insensitive to geography by mitigating issues.


Effective Static Deadlock Detection (Mayur Naik, Chang-Seo Park, Koushik Sen, David Gay) Winner of ACM SIGSOFT Distinguished Papers Award

Link to poster and additional post-conference resources!

Authors: Mayur Naik, Chang-Seo Park, Koushik Sen, David Gay
Presenting at: TBD

Abstract: We present an effective static deadlock detection algorithm for Java.  Our algorithm uses a novel combination of static analyses each of which approximates a different necessary condition for a deadlock.  We have implemented the algorithm and report upon our experience applying it to a suite of multi-threaded Java programs.  While neither sound nor complete, our approach is effective in practice, finding all known deadlocks as well as discovering previously unknown ones in our benchmarks with few false alarms.
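
The canonical defect such an analysis reports is a lock-order inversion. This small self-contained example (ours, not the paper's) exhibits the necessary conditions the analyses approximate: two threads acquiring two locks in opposite orders.

    public class LockOrderInversion {
        static final Object A = new Object(), B = new Object();

        public static void main(String[] args) {
            new Thread(() -> { synchronized (A) { pause(); synchronized (B) { } } }).start();
            new Thread(() -> { synchronized (B) { pause(); synchronized (A) { } } }).start();
            // With the pause, each thread holds one lock and waits for the
            // other: a deadlock that a cycle check on the static lock-order
            // graph can report without ever running the program.
        }

        static void pause() {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
        }
    }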


Equality and Hashing for (almost) Free: Generating Implementations from Abstraction Functions (Derek Rayside, Zev Benjamin, Rishabh Singh, Joseph P. Near, Aleksandar Milicevic, Daniel Jackson)

Authors: Derek Rayside, Zev Benjamin, Rishabh Singh, Joseph P. Near, Aleksandar Milicevic, Daniel Jackson
Presenting at: TBD

Abstract: In an object-oriented language such as Java, every class requires implementations of two special methods, one for determining equality and one for computing hash codes. Although the basic specification of these methods is usually straightforward, they can be hard to code (due to subclassing, wrapping, delegation and other factors) and often harbor subtle bugs. A technique is presented that simplifies this task. Instead of writing code for the methods, the programmer gives, as a brief annotation, an abstraction function that defines an abstract view of an object's representation, and sometimes an additional observer in the form of an iterator method. Equality and hash codes are then computed in library code that uses reflection to read the annotations. Experiments on a variety of programs suggest that, in comparison to writing the methods by hand, this approach requires less text from the programmer and results in methods that are more often correct.
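
A hedged sketch of the idea, with invented names (@AbstractValue, AbstractEquality) and annotated fields standing in for the paper's richer abstraction-function annotations: library code reads the annotations reflectively, so the programmer never hand-writes the two methods.

    import java.lang.annotation.*;
    import java.lang.reflect.Field;
    import java.util.Objects;

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface AbstractValue { }                 // marks fields in the abstract view

    final class AbstractEquality {
        static boolean reflectiveEquals(Object a, Object b) {
            if (a == b) return true;
            if (b == null || a.getClass() != b.getClass()) return false;
            try {
                for (Field f : a.getClass().getDeclaredFields())
                    if (f.isAnnotationPresent(AbstractValue.class)) {
                        f.setAccessible(true);
                        if (!Objects.equals(f.get(a), f.get(b))) return false;
                    }
                return true;
            } catch (IllegalAccessException e) { throw new AssertionError(e); }
        }
        static int reflectiveHash(Object a) {
            try {
                int h = 1;
                for (Field f : a.getClass().getDeclaredFields())
                    if (f.isAnnotationPresent(AbstractValue.class)) {
                        f.setAccessible(true);
                        h = 31 * h + Objects.hashCode(f.get(a));
                    }
                return h;
            } catch (IllegalAccessException e) { throw new AssertionError(e); }
        }
    }

    class Money {
        @AbstractValue long cents;               // part of the abstract value
        long lastAccessed;                       // representation detail: ignored
        Money(long cents) { this.cents = cents; }
        @Override public boolean equals(Object o) { return AbstractEquality.reflectiveEquals(this, o); }
        @Override public int hashCode() { return AbstractEquality.reflectiveHash(this); }
    }

    class Demo {
        public static void main(String[] args) {
            System.out.println(new Money(100).equals(new Money(100)));  // true
        }
    }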


FeatureHouse: Language-Independent, Automated Software Composition (Sven Apel, Christian Kaestner, Christian Lengauer)

Link to poster and additional post-conference resources!

Authors: Sven Apel, Christian Kaestner, Christian Lengauer
Presenting at: TBD

Abstract: Superimposition is a composition technique that has been applied successfully in many areas of software development. Although superimposition is a general approach, it has been (re)invented and implemented individually for various kinds of software artifacts. We unify languages and tools that rely on superimposition by using the language-independent model of feature structure trees (FSTs). On the basis of the FST model, we propose a general approach to the composition of software artifacts written in different languages. Furthermore, we offer a supporting framework and tool chain, called FeatureHouse. We use attribute grammars to automate the integration of additional languages; in particular, we have integrated Java, C#, C, Haskell, JavaCC, and XML. Several case studies demonstrate the practicality and scalability of our approach and reveal insights into the properties a language must have in order to be ready for superimposition.


FlexSync: An aspect-oriented approach to Java synchronization (Charles Zhang)

Authors: Charles Zhang
Presenting at: TBD

Abstract: Designers of concurrent programs are faced with many choices of synchronization mechanisms, among which clear functional trade-offs exist. Making synchronization customizable is highly desirable, as different deployment scenarios of the same program often prioritize synchronization choices differently. Unfortunately, such customization cannot be accomplished in the conventional, non-modular implementation of synchronization. To enable customizability, we present FlexSync, an aspect-oriented synchronization library that enables modular reasoning about synchronization and resolves the coupling between synchronization intentions and mechanisms in Java systems. With FlexSync, programming synchronization is largely declarative. Complex Java systems can simultaneously work with multiple synchronization mechanisms without any code changes. The FlexSync load-time weaver performs deployment-time optimizations and ensures that these synchronization mechanisms interact with each other and with the core system consistently. We evaluated FlexSync on commercially used complex Java systems and observed significant speedups as a result of deployment-specific customization.


HOLMES: Effective Statistical Debugging via Efficient Path Profiling (Trishul Chilimbi, Ben Liblit, Krishna Mehra, Aditya Nori, Kapil Vaswani)

Authors: Trishul Chilimbi, Ben Liblit, Krishna Mehra, Aditya Nori, Kapil Vaswani
Presenting at: TBD

Abstract: Statistical debugging aims to automate the process of isolating bugs by profiling several runs of the program and using statistical analysis to pinpoint the likely cause of failure. In this paper, we investigate the impact of using richer program profiles such as path profiles on the effectiveness of bug isolation. We describe a statistical debugging tool called Holmes that isolates bugs by finding paths that correlate with failure. We also present an adaptive version of Holmes that uses iterative, bug-directed profiling to lower execution time and space overheads. We evaluate Holmes using programs from the SIR benchmark suite and some large real world applications. Our results indicate that path profiles can help isolate bugs more precisely by providing more information about the context in which bugs occur. Moreover, bug-directed profiling can efficiently isolate bugs with low overheads, providing a scalable and accurate alternative to sparse random sampling.


How Tagging helps bridge the Gap between Social and Technical Aspects in Software Development (Christoph Treude, Margaret-Anne Storey)

Authors: Christoph Treude, Margaret-Anne Storey
Presenting at: TBD

Abstract: Empirical research on collaborative software development practices indicates that technical and social aspects of software development are often intertwined. The processes followed are tacit and constantly evolving, thus not all of them are amenable to formal tool support. In this paper, we explore how "tagging", a lightweight social computing mechanism, is used to bridge the gap between technical and social aspects of managing work items. We present the results from an empirical study on how tagging has been adopted and adapted over the past two years of a large project with 175 developers. Our research shows that the tagging mechanism was eagerly adopted by the team, and that it has become a significant part of many informal processes. Our findings indicate that lightweight informal tool support, prevalent in the social computing domain, may play an important role in improving team-based software development practices.


How We Refactor, and How We Know It (Emerson Murphy-Hill, Chris Parnin, Andrew Black) Winner of ACM SIGSOFT Distinguished Papers Award

Authors: Emerson Murphy-Hill, Chris Parnin, Andrew Black
Presenting at: TBD

Abstract: What we know about how programmers refactor comes largely from case studies. This is a problem because conclusions drawn from specific cases may not hold in general. We argue that putting refactoring-tool research on a sound scientific basis requires more than case studies: it requires empirical studies of large groups of programmers in the wild. In this paper we examine four data sets spanning more than 10000 developers, 60000 tool-assisted refactorings, 2500 developer hours, and 9000 source file revisions. Using this data, we cast doubt on several previously stated conclusions about how programmers refactor, while validating others. For example, we find that programmers frequently do not indicate refactoring activity in commit logs, which is at variance with the assumption made by several other researchers. However, our findings support other assumptions made in the literature, such as the assumption that programmers intersperse refactoring with other program changes.


How to Avoid Drastic Software Process Change (using Stochastic Stability) (Tim Menzies, Steve Williams, Oussama El-rawas, Barry Boehm, Jairus Hihn)

Link to poster and additional post-conference resources!

Authors: Tim Menzies, Steve Williams, Oussama El-rawas, Barry Boehm, Jairus Hihn
Presenting at: TBD

Abstract: Before performing "drastic changes" to a project, it is worthwhile to thoroughly explore the available options within the current structure of a project. An alternative to drastic change is "multiple minor changes" that adjust current options within a project. In this paper, we show that the effects of numerous minor changes can outweigh the effects of drastic changes. That is, the benefits of drastic change can often be achieved without disrupting a project. The key to our technique is a novel stochastic stability technique that:
  • considers a very large set of minor changes; and
  • carefully selects the right combination of effective minor changes.
Our results show project managers have more project improvement options than they currently realize.  This result should be welcome news to  managers struggling to maintain control and continuity over  their project in the face of multiple demands for drastic change.


Improving API Documentation Usability with Knowledge Pushing (Uri Dekel, James D. Herbsleb)

Authors: Uri Dekel, James D. Herbsleb
Presenting at: TBD

Abstract: While a crucial role of API documentation is to convey important usage directives to authors and maintainers of client code, some may be missed due to the narrative's focus on specifications. This risk is increased by high fan-outs and polymorphism, which make it impractical to thoroughly search every invocation target. A lack of awareness of these directives may result in errors and in maintenance difficulties. We address these concerns for Java developers with a framework that manages tagged directives and other artifact knowledge. Our Eclipse integration decorates method invocations whose targets contain directives in order to help readers identify methods that are more (or less) likely to warrant further investigation, which we aid by including the tagged knowledge in the JavaDoc hover. We annotated significant parts of several core APIs, and present results from a lab study demonstrating the potential benefits of our approach.


In-Field Healing of Integration Problems with COTS Components (Hervé Chang, Leonardo Mariani, Mauro Pezzè)

Link to poster and additional post-conference resources!

Authors: Hervé Chang, Leonardo Mariani, Mauro Pezzè
Presenting at: TBD

Abstract: Developers frequently integrate complex COTS frameworks and components into software applications. COTS products are often only partially documented, and developers may misuse the technology and introduce integration faults, as witnessed by the many entries in fault repositories. Once identified, common integration problems are usually documented in forums and fault repositories on the Web, but this does not prevent them from occurring in the field when COTS products are reused. In this paper, we propose a methodology and a self-healing technology that can reduce the occurrence of in-field failures caused by common integration problems that are identified and documented by COTS developers while COTS products are in use. Our methodology supports COTS developers in producing healing connectors for common misuses of COTS products. Our technology produces information that facilitates debugging and patching of applications that use COTS products. Application developers inject healing connectors into their systems to automatically repair problems caused by misuses of COTS products. Healing takes place at run-time, on-the-fly and in-the-field. The activity of healing connectors is traced in log files to facilitate debugging and patching of integration problems. Empirical experience with several applications and COTS products shows the feasibility of the approach and the efficiency of the technology.


Invariant-Based Automatic Testing of AJAX User Interfaces (Ali Mesbah, Arie van Deursen) Winner of ACM SIGSOFT Distinguished Papers Award

Authors: Ali Mesbah, Arie van Deursen
Presenting at: TBD

Abstract: AJAX-based Web 2.0 applications rely on asynchronous communication, a stateful client, and client-side run-time manipulation of the DOM tree. This not only makes them fundamentally different from traditional web applications, but also more error-prone and harder to test. We propose a method for testing AJAX applications automatically. It is based on a crawler to infer a flow graph for all (client-side) user interface states. We identify AJAX-specific faults that can occur in such states (related to DOM validity, error messages, discoverability, back-button compatibility, etc.) as well as DOM-tree invariants that serve as oracles to detect such faults. We implemented our approach in ATUSA, a tool offering generic invariant checking components, a plugin mechanism to add application-specific state validators, and generation of a test suite covering the paths obtained during crawling. We describe two studies evaluating the fault revealing capabilities, scalability, manual effort and level of automation of our approach.


Learning Operational Requirements from Goal Models (Dalal Alrajeh, Jeffery Kramer, Alessandra Russo, Sebastián Uchitel)

Authors: Dalal Alrajeh, Jeffery Kramer, Alessandra Russo, Sebastián Uchitel
Presenting at: TBD

Abstract: Goal-oriented methods have increasingly been recognised as an effective means for eliciting, elaborating, analysing and specifying software requirements. A key activity in these approaches is the derivation of a correct and complete set of operational requirements, in the form of pre- and trigger-conditions, from system-level goals. Few existing approaches support this crucial task, and those that do rely on significant effort and expertise from the engineer. In this paper we propose a tool-based approach which integrates model checking and machine learning for eliciting and elaborating a set of operational requirements, in the form of pre- and trigger-conditions, from goals, based on an iterative process in which the engineer is simply required to identify positive and negative scenarios and to pick from proposed requirements.


License Integration Patterns: Addressing License Mismatches in Component-Based Development (Daniel M. German, Ahmed E. Hassan)

Authors: Daniel M. German, Ahmed E. Hassan
Presenting at: TBD

Abstract: In this paper we address the problem of combining software components with different and possibly incompatible legal licenses to create a software application that does not violate any of these licenses while potentially having its own. We call this problem the "license mismatch" problem. The rapid growth and availability of Open Source Software (OSS) components with varying licenses, and the existence of more than 70 OSS licenses increases the complexity of this problem. Based on a study of 124 OSS software packages, we developed a model which describes the interconnection of components in these packages from a legal point of view. We used our model to document integration patterns that are commonly used to solve the license mismatch problem in practice when creating both proprietary and OSS applications. Software engineers with little legal expertise could use these documented patterns to understand and address the legal issues involved in reusing components with different and possibly conflicting licenses.


Lightweight Fault-Localization Using Multiple Coverage Types (Raul Santelices, James A. Jones, Yanbing Yu, Mary Jean Harrold)

Link to poster and additional post-conference resources!

Authors: Raul Santelices, James A. Jones, Yanbing Yu, Mary Jean Harrold
Presenting at: TBD

Abstract: Lightweight fault-localization techniques use program coverage to isolate the parts of the code that are most suspicious of being faulty. In this paper, we present the results of a study of three types of program coverage (statements, branches, and data dependencies) to compare their effectiveness in localizing faults. The study shows that no single coverage type performs best for all faults: different kinds of faults are best localized by different coverage types. Based on these results, we present a new coverage-based approach to fault localization that leverages the unique qualities of each coverage type by combining them. Because data dependencies are noticeably more expensive to monitor than branches, we also investigate the effects of replacing data-dependence coverage with an approximation inferred from branch coverage. Our empirical results show that (1) the cost of fault localization using combinations of coverage is less than using any individual coverage type and closer to the best case (without knowing in advance which kinds of faults are present), and (2) using inferred data-dependence coverage retains most of the benefits of combinations.
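
For concreteness, a small sketch under two stated assumptions: suspiciousness is computed with the Tarantula formula (from earlier work by some of the authors), and scores from different coverage types are combined by taking the maximum; the paper's actual combination strategy may differ.

    public class SuspiciousnessSketch {
        // Tarantula suspiciousness of one covered entity (a statement,
        // branch, or data dependence), given how many failing and passing
        // runs cover it.
        static double tarantula(int failCov, int totalFail, int passCov, int totalPass) {
            double failRate = (double) failCov / totalFail;
            double passRate = (double) passCov / totalPass;
            return failRate + passRate == 0 ? 0 : failRate / (failRate + passRate);
        }

        public static void main(String[] args) {
            // The same program point viewed through two coverage types.
            double byStatement = tarantula(9, 10, 40, 100);
            double byBranch    = tarantula(9, 10,  5, 100);
            double combined    = Math.max(byStatement, byBranch);  // assumed combiner
            System.out.printf("stmt=%.2f branch=%.2f combined=%.2f%n",
                              byStatement, byBranch, combined);
        }
    }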


Listening to Programmers - Taxonomies and Characteristics of Comments in Operating System Code (Yoann Padioleau, Lin Tan, YuanYuan Zhou)

Authors: Yoann Padioleau, Lin Tan, YuanYuan Zhou
Presenting at: TBD

Abstract: To avoid or detect bugs, innovations from multiple directions have been proposed. Unfortunately, many of these innovations are not fully exploited by programmers. To bridge the gap, this paper proposes a new approach to "listen" to thousands of programmers: studying their programming comments, which can provide guidance (1) for language/tool designers on where they should develop new techniques or enhance the usability of existing ones, and (2) for programmers on what problems are most important so that they should take initiatives to adopt some existing tools. We studied 1050 random comments from the latest versions of Linux, FreeBSD, and OpenSolaris. We found that 52.6% of these comments could be leveraged by existing or to-be-proposed tools for improving reliability. Our findings include: (1) programmers abuse integers and macros, (2) many comments describe code relationships and code evolutions, and (3) many comments about concurrency issues are not well supported by annotation languages.


Locating Need-to-Translate Constant Strings for Software Internationalization (Xiaoyin Wang, Lu Zhang, Tao Xie, Hong Mei, Jiasu Sun)

Authors: Xiaoyin Wang, Lu Zhang, Tao Xie, Hong Mei, Jiasu Sun
Presenting at: TBD

Abstract: Modern software requires internationalization to be distributed to different regions of the world. To internationalize an existing software application, developers need to externalize some hard-coded constant strings to resource files, so that translators can translate the software to a local language without revising its source code. Since not all the constant strings require externalization, locating need-to-translate constant strings is a necessary task that developers must complete. In this paper, we present an approach to automatically locating these strings. Our approach first collects a list of GUI-related API methods, and then searches for need-to-translate constant strings from these API methods. We implemented our approach as an Eclipse plug-in, and evaluated our approach on four real-world open source applications: Megamek, ArtOfIllusion, RText and Risk. The results show that our approach can effectively locate most of the need-to-translate constant strings in all the four applications.


MINTS: A General Framework and Tool for Supporting Test-Suite Minimization (Hwa-You Hsu, Alessandro Orso)

Link to poster and additional post-conference resources!

Authors: Hwa-You Hsu, Alessandro Orso
Presenting at: TBD

Abstract: Test-suite minimization techniques aim to eliminate redundant test cases from a test suite based on some criteria, such as coverage or fault-detection capability. Most existing test-suite minimization techniques have two main limitations: they perform minimization based on a single criterion and produce suboptimal solutions. In this paper, we propose a test-suite minimization framework that overcomes these limitations by allowing testers to (1) easily encode a wide spectrum of test-suite minimization problems, (2) handle problems that involve any number of criteria, and (3) compute optimal solutions by leveraging modern integer linear programming solvers. We implemented our framework in a tool, called MINTS, that is freely available and can be interfaced with a number of different state-of-the-art solvers. Our empirical evaluation shows that MINTS can be used to instantiate a number of different test-suite minimization problems and efficiently find an optimal solution for such problems using different solvers.
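
A toy covering instance shows the shape of the problem. MINTS encodes such problems (possibly with several criteria) as integer linear programs for an external solver; this sketch merely enumerates all subsets of a five-test suite.

    import java.util.BitSet;

    public class MinimizeSketch {
        public static void main(String[] args) {
            // coverage[t] lists the requirements covered by test t.
            int[][] coverage = { {0, 1}, {1, 2}, {0, 2}, {3}, {0, 1, 2, 3} };
            int nReq = 4, best = -1;
            for (int subset = 0; subset < (1 << coverage.length); subset++) {
                BitSet covered = new BitSet(nReq);
                for (int t = 0; t < coverage.length; t++)
                    if ((subset & (1 << t)) != 0)
                        for (int r : coverage[t]) covered.set(r);
                if (covered.cardinality() == nReq
                        && (best < 0 || Integer.bitCount(subset) < Integer.bitCount(best)))
                    best = subset;
            }
            // Prints 10000: test 4 alone covers every requirement.
            System.out.println("minimal suite: " + Integer.toBinaryString(best));
        }
    }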


Maintaining and Evolving GUI-Directed Test Scripts (Mark Grechanik, Qing Xie, Chen Fu)

Authors: Mark Grechanik, Qing Xie, Chen Fu
Presenting at: TBD

Abstract: Since manual black-box testing of GUI-based APplications (GAPs) is tedious and laborious, test engineers create test scripts to automate the testing process. These test scripts interact with GAPs by performing actions on their GUI objects. An extra effort that test engineers put in writing test scripts is paid off when these scripts are run repeatedly. Unfortunately, releasing new versions of GAPs with modified GUIs breaks their corresponding test scripts thereby obliterating benefits of test automation. We offer a novel approach for maintaining and evolving test scripts so that they can test new versions of their respective GAPs. We implemented our approach and conducted a case study with 45 professional programmers and testers to evaluate it. The results show with strong statistical significance that users find more failures and report fewer false positives (p<0.02) in test scripts with our approach than with a flagship industry product and a baseline manual approach.


Mining Exception-Handling Rules as Sequence Association Rules (Suresh Thummalapenta, Tao Xie)

Link to poster and additional post-conference resources!

Authors: Suresh Thummalapenta, Tao Xie
Presenting at: TBD

Abstract: Programming languages such as Java and C++ provide exception-handling constructs for dealing with exception conditions. Applications are expected to handle these exception conditions and take necessary recovery actions. In this paper, we propose an approach that mines exception-handling rules, which describe expected behavior when exceptions occur. Existing mining approaches mine association rules of the form "FCA -> FCB", which describe that "FCA" should be followed by "FCB" in all paths. We develop a novel mining algorithm, the first to mine conditional association rules of the form "(FCC1...FCCn) & FCA -> (FCE1...FCEn)", which describe that "FCA" should be followed by "FCE1...FCEn" only when preceded by "FCC1...FCCn". This form of rule is required to characterize common exception-handling rules. Our evaluation results show that our approach mines 294 real exception-handling rules in five benchmark applications and also detects 87 defects that are not found by a previous related approach.
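
A concrete (invented) instance of such a conditional rule in Java: when Connection.setAutoCommit(false) has been called (the condition FCC1), a failing Statement.executeUpdate (FCA) should be followed by Connection.rollback (FCE1) on the exception path; code that omits the rollback would be flagged as a defect.

    import java.sql.*;

    public class TransferDao {
        static void transfer(Connection c, long from, long to, long amount)
                throws SQLException {
            c.setAutoCommit(false);              // FCC1: condition call
            try (Statement s = c.createStatement()) {
                s.executeUpdate("UPDATE accounts SET bal = bal - " + amount
                                + " WHERE id = " + from);   // FCA
                s.executeUpdate("UPDATE accounts SET bal = bal + " + amount
                                + " WHERE id = " + to);
                c.commit();
            } catch (SQLException e) {
                c.rollback();                    // FCE1: expected recovery action
                throw e;
            }
        }
    }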


Model Evolution by Runtime Adaptation (Ilenia Epifani, Carlo Ghezzi, Raffaela Mirandola, Giordano Tamburrelli)

Link to poster and additional post-conference resources!

Authors: Ilenia Epifani, Carlo Ghezzi, Raffaela Mirandola, Giordano Tamburrelli
Presenting at: TBD

Abstract: Models help software engineers reason about design-time decisions before implementing a system. This paper focuses on models that deal with non-functional properties, such as reliability. To build such models, one must rely on numerical estimates of various parameters provided, for example, by domain experts. Unfortunately, estimates are seldom correct. Moreover, in dynamic environments, the value of parameters changes over time. We discuss an approach that addresses this issue by updating models at runtime through a Bayesian estimator that exploits data collected from the running system. Updated models provide an increasingly better representation of the system. By analyzing them at runtime, it is possible to detect or predict if a property is, or will be, violated. Requirement violations trigger reconfigurations aimed at guaranteeing the desired goals. We illustrate a framework supporting our methodology and a case study in which a Web service composition is modeled through a Discrete Time Markov Chain.


Modular String-Sensitive Permission Analysis with Demand-Driven Precision (Emmanuel Geay, Marco Pistoia, Takaaki Tateishi, Barbara Ryder, Julian Dolby)

Authors: Emmanuel Geay, Marco Pistoia, Takaaki Tateishi, Barbara Ryder, Julian Dolby
Presenting at: TBD

Abstract: In modern software systems, programs are obtained by dynamically assembling components.  This has made it necessary to subject component providers to access-control restrictions.  What permissions should be granted to each component?  Too few permissions may cause run-time authorization failures, too many constitute a security hole.  We have designed and implemented a composite algorithm for precise static permission analysis for Java and the CLR.  Unlike previous work, the analysis is modular and fully integrated with a novel slicing-based string analysis that is used to statically compute the string values defining a permission and disambiguate permission propagation paths. The results of our research prototype on production-level Java code support the effectiveness, practicality, and precision of our techniques, and show outstanding improvement over previous work.


Predicting Build Failures Using Social Network Analysis on Developer Communication (Timo Wolf, Adrian Schröter, Daniela Damian, Thanh Nguyen)

Link to poster and additional post-conference resources!

Authors: Timo Wolf, Adrian Schröter, Daniela Damian, Thanh Nguyen
Presenting at: TBD

Abstract: A critical factor in work group coordination, communication has been studied extensively. Yet, we are missing objective evidence of the relationship between successful coordination outcomes and communication structures. Using data from IBM's Jazz project, we study communication structures of development teams with high coordination needs. We conceptualize coordination outcome by the result of their code integration build processes (successful or failed) and study team communication structures with social network measures. Our results indicate that developer communication plays an important role in the quality of software integrations. Although we found that no individual measure could indicate whether a build will fail or succeed, we leveraged the combination of communication structure measures into a predictive model that indicates whether an integration will fail. When used for five project teams, our predictive model yielded recall values between 55% and 75%, and precision values between 50% and 76%.


Predicting Faults Using the Complexity of Code Changes (Ahmed E. Hassan)

Authors: Ahmed E. Hassan
Presenting at: TBD

Abstract: Predicting the incidence of faults in code has been commonly associated with measuring complexity. In this paper, we propose complexity metrics that are based on the code change process instead of on the code. We conjecture that a complex change process negatively affects its product, i.e., the software system. We validate our hypothesis empirically through a case study using data derived from the change history for six large open source projects. Our case study shows that our change complexity metrics are better predictors of fault potential in comparison to other well-known historical predictors of faults, i.e., prior modifications and prior faults.


Reasoning About Edits to Feature Models (Thomas Thüm, Don Batory, Christian Kästner)

Link to poster and additional post-conference resources!

Authors: Thomas Thüm, Don Batory, Christian Kästner
Presenting at: TBD

Abstract: Features express the variabilities and commonalities among programs in a software product line (SPL). A feature model defines the valid combinations of features, where each combination corresponds to a program in an SPL. SPLs and their feature models evolve over time. We classify the evolution of a feature model via modifications as refactorings, specializations, generalizations, or arbitrary edits. We present an algorithm to reason about feature model edits to help designers determine how the program membership of an SPL has changed. Our algorithm takes two feature models as input (before and after edit versions), where the sets of features in the two models are not necessarily the same, and it determines the change classification. Our algorithm efficiently determines classifications of edits to models that have hundreds or thousands of features.
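
A sketch of the classification step for a two-feature toy example: enumerate the products of both models and compare the resulting sets. The paper does this symbolically with a reasoner so it scales to thousands of features; exhaustive enumeration, as here, only works for tiny models.

    import java.util.HashSet;
    import java.util.Set;
    import java.util.function.Predicate;

    public class EditClassifier {
        // A model is a predicate over feature bitmasks; its products are the
        // masks it accepts.
        static Set<Integer> products(Predicate<Integer> model, int nFeatures) {
            Set<Integer> s = new HashSet<>();
            for (int p = 0; p < (1 << nFeatures); p++)
                if (model.test(p)) s.add(p);
            return s;
        }

        public static void main(String[] args) {
            // Two features: bit 0 = A, bit 1 = B.
            Predicate<Integer> before = p -> (p & 1) != 0;                  // A mandatory
            Predicate<Integer> after  = p -> (p & 1) != 0 && (p & 2) != 0; // A, B mandatory
            Set<Integer> sb = products(before, 2), sa = products(after, 2);
            String kind = sa.equals(sb)      ? "refactoring"
                        : sb.containsAll(sa) ? "specialization"
                        : sa.containsAll(sb) ? "generalization"
                        :                      "arbitrary edit";
            System.out.println(kind);   // "specialization": products were removed
        }
    }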


Refactoring Sequential Java Code for Concurrency (Danny Dig, John Marrero, Michael Ernst)

Authors: Danny Dig, John Marrero, Michael Ernst
Presenting at: TBD

Abstract: Parallelizing existing sequential programs to run efficiently on multicores is hard. Java 5's java.util.concurrent (j.u.c.) package supports writing concurrent programs: the complexity of writing thread-safe and scalable programs is hidden in the library. To use this package, programmers need to reengineer existing code. This is tedious because it requires changing many lines of code, is error-prone because programmers can use the wrong APIs, and is omission-prone because programmers can miss opportunities to use the enhanced APIs. This paper presents our tool, Concurrencer, which enables programmers to refactor sequential code into parallel code that uses the j.u.c. package. Concurrencer does not require any program annotations, although the transformations are involved: they span several program statements and use custom program analysis. A find-and-replace tool cannot perform such transformations. Empirical evaluation shows that Concurrencer is effective: Concurrencer correctly applies transformations that open-source developers overlooked, and the converted code exhibits good speedup.
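
One such transformation, converting an int field to a j.u.c. AtomicInteger, looks roughly like this before/after pair (our minimal reconstruction, not tool output):

    import java.util.concurrent.atomic.AtomicInteger;

    // Before: correctness depends on every access staying synchronized.
    class SequentialIdGenerator {
        private int next;
        synchronized int nextId() { return next++; }
    }

    // After: the j.u.c. class makes the update atomic and lock-free.
    class ConcurrentIdGenerator {
        private final AtomicInteger next = new AtomicInteger();
        int nextId() { return next.getAndIncrement(); }
    }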


Safe-Commit Analysis to Facilitate Team Software Development (Jan Wloka, Barbara Ryder, Frank Tip, Xiaoxia Ren)

Link to poster and additional post-conference resources!

Authors: Jan Wloka, Barbara Ryder, Frank Tip, Xiaoxia Ren
Presenting at: TBD

Abstract: Software development teams exchange source code in shared repositories. These repositories are kept consistent by having developers follow a commit policy, such as "Program edits can be committed only if all available tests succeed". Such policies may result in long intervals between commits, increasing the likelihood of duplicative development and merge conflicts. Furthermore, commit policies are generally not automatically enforceable. We present an analysis-based algorithm to identify committable changes that can be released early, without causing failures of existing tests, even in the presence of failing tests in a developer's local workspace. The algorithm can support relaxed commit policies that allow early release of changes, reducing the potential for merge conflicts. In experiments using several versions of Daikon with failing tests, three newly enabled commit policies were shown to allow a significant percentage of changes to be committed.


Semantics-Based Code Search (Steven P. Reiss)

Link to poster and additional post-conference resources!

Authors: Steven P. Reiss
Presenting at: TBD

Abstract: Our goal is to use available open source code to generate code that meets a user's specifications. The key words here are specifications and generate. Our framework lets users precisely specify what they are searching for using keywords, signatures, test cases, contracts, and security constraints. It uses an open set of program transformations to map the original code into exactly what the user wants. These transforms range from the simple, for example changing the name of a method to match the name given by the user, to the complex, for example finding a subset of the statements of a method that compute a value of the desired target type given only values of the parameter types. This approach is implemented in a prototype system for Java with a web interface.


Succession: Measuring Transfer of Code and Developer Productivity (Audris Mockus)

Authors: Audris Mockus
Presenting at: TBD

Abstract: Code ownership transfer or succession is a crucial ingredient in open source code reuse and in offshoring projects. Measuring succession can help understand factors that affect the success of such transfers and suggest ways to make them more efficient. We propose and evaluate several methods to measure succession based on the chronology and traces of developer activities. Based on ten instances of offshoring succession identified through interviews, we find that the best succession measure can accurately pinpoint the most likely mentors. We model the productivity ratio of more than 1000 developer pairs involved in succession to test conjectures based on organizational socialization theory, and find the ratio decreases for instances of offshoring and for mentors who have worked primarily on a single project or have transferred ownership of their non-primary project code, thus supporting theory-based conjectures and providing practical suggestions on how to improve succession.


Synthesizing Intensional Behavior Models by Graph Transformation (Carlo Ghezzi, Andrea Mocci, Mattia Monga)

Link to poster and additional post-conference resources!

Authors: Carlo Ghezzi, Andrea Mocci, Mattia Monga
Presenting at: TBD

Abstract: This paper describes an approach (SPY) to recovering the specification of a software component from the observation of its run-time behavior. It focuses on components that behave as data abstractions. Components are assumed to be black boxes that do not allow any implementation inspection. The inferred description may help understand what the component does when no formal specification is available. SPY works in two main stages. First, it builds a deterministic finite-state machine that models the partial behavior of instances of the data abstraction. This is then generalized via graph transformation rules. The rules can generate a possibly infinite number of behavior models, which generalize the description of the data abstraction under an assumption of "regularity" with respect to the observed behavior. The rules can be viewed as a likely specification of the data abstraction.  We illustrate how SPY works on relevant examples and we compare it with competing methods.


Taint-Based Directed Whitebox Fuzzing (Vijay Ganesh, Tim Leek, Martin Rinard)

Authors: Vijay Ganesh, Tim Leek, Martin Rinard
Presenting at: TBD

Abstract: We present a new automated white box fuzzing technique and a tool, BuzzFuzz, that implements this technique. Unlike standard fuzzing techniques, which randomly change parts of the input file with little or no information about the underlying syntactic structure of the file, BuzzFuzz uses dynamic taint tracing to automatically locate regions of original seed input files that influence values used at key program attack points (points where the program may contain an error). BuzzFuzz then automatically generates new fuzzed test input files by fuzzing these identified regions of the original seed input files. Because these new test files typically preserve the underlying syntactic structure of the original seed input files, they make it past the initial input parsing components to exercise code deep within the semantic core of the computation. We have used BuzzFuzz to automatically find errors in two open-source applications: Swfdec (an Adobe Flash player) and MuPDF (a PDF viewer). Our results indicate that our new directed fuzzing technique can effectively expose errors located deep within large applications. Because the directed fuzzing technique uses the taint information to automatically discover and exploit information about the input file format, it is especially appropriate for testing applications that have complex, highly structured input file formats.
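
A sketch of only the final generation step, assuming the set of tainted byte offsets has already been produced by the dynamic taint trace (which this sketch does not implement): mutating just those offsets preserves the rest of the seed file's syntactic structure.

    import java.util.Arrays;
    import java.util.Random;

    public class DirectedFuzz {
        // Mutate only the byte offsets that taint tracing linked to an
        // attack point; every other byte keeps the seed's structure intact.
        static byte[] fuzz(byte[] seed, int[] taintedOffsets, Random rnd) {
            byte[] out = seed.clone();
            for (int off : taintedOffsets)
                out[off] = (byte) rnd.nextInt(256);
            return out;
        }

        public static void main(String[] args) {
            byte[] seed = { 'F', 'W', 'S', 5, 0x10, 0x00, 0x00, 0x00 };  // toy header
            System.out.println(Arrays.toString(
                fuzz(seed, new int[] { 4, 5 }, new Random(42))));
        }
    }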


Taming Coincidental Correctness: Coverage Refinement with Context Patterns to Improve Fault Localization (Xinming Wang, S.C. Cheung, W.K. Chan, Zhenyu Zhang)

Authors: Xinming Wang, S.C. Cheung, W.K. Chan, Zhenyu Zhang
Presenting at: TBD

Abstract: Recent techniques for fault localization leverage code coverage to address the high cost problem of debugging. These techniques exploit the correlations between program failures and the coverage of program entities as the clue in locating faults. Experimental evidence shows that the effectiveness of these techniques can be affected adversely by coincidental correctness, which occurs when a fault is executed but no failure is detected. In this paper, we propose an approach to address this problem. We refine code coverage of test runs using control- and data-flow patterns prescribed by different fault types. We conjecture that this extra information, which we call context patterns, can strengthen the correlations between program failures and the coverage of faulty program entities, making it easier for fault localization techniques to locate the faults. To evaluate the proposed approach, we have conducted a mutation analysis on three real world programs and cross-validated the results with real faults. The experimental results consistently show that coverage refinement is effective in easing the coincidental correctness problem in fault localization techniques.


Taming Dynamically Adaptive Systems Using Models and Aspects (Brice Morin, Olivier Barais, Gregory Nain, Jean-Marc Jézéquel)

Link to poster and additional post-conference resources!

Authors: Brice Morin, Olivier Barais, Gregory Nain, Jean-Marc Jézéquel
Presenting at: TBD

Abstract: Since software systems need to be continuously available under varying conditions, their ability to evolve at runtime is increasingly seen as one key issue. Modern programming frameworks already provide support for dynamic adaptations. However, the high variability of features in Dynamic Adaptive Systems (DAS) introduces an explosion of possible runtime system configurations (often called modes) and mode transitions. Designing these configurations and their transitions is tedious and error-prone, making the system feature evolution difficult. While Aspect-Oriented Modeling (AOM) was introduced to improve the modularity of software, this paper presents how an AOM approach can be used to tame the combinatorial explosion of DAS modes. Using AOM techniques, we derive a wide range of modes by weaving aspects into an explicit model reflecting the runtime system. We use these generated modes to automatically adapt the system. We validate our approach on a schizophrenic middleware for home automation currently deployed in the Rennes metropolis.


Tesseract: Interactive Visual Exploration of Socio-Technical Relationships in Software Development (Anita Sarma, Larry Maccherone, Patrick Wagstrom, James Herbsleb)

Link to poster and additional post-conference resources!

Authors: Anita Sarma, Larry Maccherone, Patrick Wagstrom, James Herbsleb
Presenting at: TBD

Abstract: Software developers have long known that project success requires a robust understanding of both technical and social linkages. However, research has largely considered these independently. Research on networks of technical artifacts focuses on techniques like code analysis or mining project archives. Social network analysis has been used to capture information about relations among people. Yet, each type of information is often far more useful when combined, as when the “goodness” of social networks is judged by the patterns of dependencies in the technical artifacts. To bring such information together, we have developed Tesseract, a socio-technical dependency browser that utilizes cross-linked displays to enable exploration of the myriad relationships between artifacts, developers, bugs, and communications. We evaluated Tesseract by (1) demonstrating its feasibility with GNOME project data, (2) assessing its usability via informal user evaluations, and (3) verifying its suitability for the open source community via semi-structured interviews.


The Impact of Process Choice in High Maturity Environments: An Empirical Analysis (Narayan Ramasubbu, Rajesh Balan)

Authors: Narayan Ramasubbu, Rajesh Balan
Presenting at: TBD

Abstract: We present the results of a three-year field study of the software development process choices made by project teams at two leading offshore vendors. In particular, we focus on the performance implications of project teams that chose to augment structured, plan-driven processes to implement the CMM level-5 Key Process Areas (KPAs) with agile methods. Our analysis of 112 software projects reveals that the decision to augment the firm-recommended, plan-driven approach with improvised, agile methods was significantly affected by the extent of client knowledge and involvement, newness of technology, and the project size. Furthermore, this decision had a significant and mostly positive impact on project performance indicators such as reuse, rework, defect density, and productivity.


The Road Not Taken: Estimating Path Execution Frequency Statically (Raymond P.L. Buse, Westley R. Weimer)

Link to poster and additional post-conference resources!

Authors: Raymond P.L. Buse, Westley R. Weimer
Presenting at: TBD

Abstract: A variety of compilers, static analyses, and testing frameworks rely heavily on path frequency information. Uses for such information range from optimizing transformations to bug finding. Path frequencies are typically obtained through profiling, but that approach is severely restricted: it requires running programs in an indicative environment, and on indicative test inputs. We present a descriptive statistical model of path frequency based on features that can be readily obtained from a program's source code.  Our model is over 90% accurate with respect to several benchmarks, and is sufficient for selecting the 5% of paths that account for over half of a program's total runtime. We demonstrate our technique's robustness by measuring its performance as a static branch predictor, finding it to be more accurate than previous approaches on average. Finally, our qualitative analysis of the model provides insight into which source-level features indicate "hot paths."


The Secret Life of Bugs: Going Past the Errors and Omissions in Software Repositories (Jorge Aranda, Gina Venolia)

Authors: Jorge Aranda, Gina Venolia
Presenting at: TBD

Abstract: Every bug has a story behind it. The people that discover and resolve it need to coordinate, to get information from documents, tools, or other people, and to navigate through issues of accountability, ownership, and organizational structure. This paper reports on a field study of coordination activities around bug fixing that used a combination of case study research and a survey of software professionals. Results show that the histories of even simple bugs are strongly dependent on social, organizational, and technical knowledge that cannot be solely extracted through automation of electronic repositories, and that such automation provides incomplete and often erroneous accounts of coordination. The paper uses rich bug histories and survey results to identify common bug fixing coordination patterns and to provide implications for tool designers and researchers of coordination in software development.


Using Quantitative Analysis to Implement Autonomic IT Systems (Radu Calinescu, Marta Kwiatkowska)

Link to poster and additional post-conference resources!

Authors: Radu Calinescu, Marta Kwiatkowska
Presenting at: TBD

Abstract: The software underpinning today's IT systems needs to adapt dynamically and predictably to rapid changes in system workload, environment and objectives. We describe a software framework that achieves such adaptiveness for IT systems whose components can be modelled as Markov chains. The framework comprises (i) an autonomic architecture that uses Markov-chain quantitative analysis to dynamically adjust the parameters of an IT system in line with its state, environment and objectives; and (ii) a method for developing instances of this architecture for real-world systems. Two case studies are presented that use the framework successfully for the dynamic power management of disk drives, and for the adaptive management of cluster availability within data centres, respectively.
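
A hedged miniature of the style of quantitative question involved, computed here by naive value iteration; the actual framework performs its Markov-chain analyses with a proper quantitative engine (e.g., a probabilistic model checker) and uses the answers to drive reconfiguration. States and probabilities below are invented.

    public class DtmcSketch {
        public static void main(String[] args) {
            // States: 0 = trying, 1 = done (absorbing), 2 = failed (absorbing).
            double[][] P = { { 0.2, 0.7, 0.1 }, { 0, 1, 0 }, { 0, 0, 1 } };
            double[] reach = { 0, 1, 0 };        // Pr(eventually reach "done")
            for (int i = 0; i < 1000; i++) {     // value iteration to a fixpoint
                double[] next = new double[3];
                for (int s = 0; s < 3; s++)
                    for (int t = 0; t < 3; t++)
                        next[s] += P[s][t] * reach[t];
                reach = next;
            }
            System.out.printf("Pr(done | trying) = %.4f%n", reach[0]);  // 0.8750
        }
    }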


Validation of Contracts using Enabledness Preserving Finite State Abstractions (Guido de Caso, Víctor Braberman, Diego Garbervetsky, Sebastián Uchitel)

Link to poster and additional post-conference resources!

Authors: Guido de Caso, Víctor Braberman, Diego Garbervetsky, Sebastián Uchitel
Presenting at: TBD

Abstract: Pre/post condition-based specifications are commonplace in a variety of software engineering activities that range from requirements through to design and implementation. The fragmented nature of these specifications can hinder validation, as it is difficult to understand whether the specifications for the various operations fit together well. In this paper we propose a novel technique for automatically constructing abstractions in the form of behaviour models from pre/post condition-based specifications. The level of abstraction at which such models are constructed preserves enabledness of sets of operations, resulting in a finite model that is intuitive to validate and which facilitates tracing back to the specification for debugging. The paper also reports on the application of the approach to an industrial-strength protocol specification in which concerns were identified.


WISE: Automated Test Generation for Worst-Case Complexity (Jacob Burnim, Sudeep Juvekar, Koushik Sen)

Authors: Jacob Burnim, Sudeep Juvekar, Koushik Sen
Presenting at: TBD

Abstract: Program analysis and automated test generation have primarily been used to find correctness bugs. We present complexity testing, a novel automated test generation technique to find performance bugs. Our complexity testing algorithm, which we call WISE (Worst-case Inputs from Symbolic Execution), operates on a program accepting inputs of arbitrary size. For each input size, WISE attempts to construct an input which exhibits the worst-case computational complexity of the program. WISE uses exhaustive test generation for small input sizes and generalizes the result of executing the program on those inputs into an "input generator." The generator is subsequently used to efficiently generate worst-case inputs for larger input sizes. We have performed experiments to demonstrate the utility of our approach on a set of standard data structures and algorithms. Our results show that WISE can effectively generate worst-case inputs for several of these benchmarks.
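
A toy Java version of the first phase under an assumed cost model (comparisons during unbalanced binary-search-tree insertion): exhaustively run every permutation of a small input and keep the costliest. The witness's pattern, sorted order, is what the generalization phase would turn into a generator for larger sizes.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class WorstCaseSearch {
        static final class Node { int v; Node l, r; Node(int v) { this.v = v; } }
        static int cost;                          // comparisons performed

        static Node insert(Node t, int v) {
            cost++;                               // one comparison per node visited
            if (t == null) return new Node(v);
            if (v < t.v) t.l = insert(t.l, v); else t.r = insert(t.r, v);
            return t;
        }

        static int run(List<Integer> input) {
            cost = 0;
            Node root = null;
            for (int v : input) root = insert(root, v);
            return cost;
        }

        static List<List<Integer>> perms(List<Integer> xs) {
            List<List<Integer>> out = new ArrayList<>();
            if (xs.isEmpty()) { out.add(new ArrayList<>()); return out; }
            for (int i = 0; i < xs.size(); i++) {
                List<Integer> rest = new ArrayList<>(xs);
                Integer head = rest.remove(i);
                for (List<Integer> p : perms(rest)) { p.add(0, head); out.add(p); }
            }
            return out;
        }

        public static void main(String[] args) {
            List<Integer> worst = null;
            int worstCost = -1;
            for (List<Integer> perm : perms(Arrays.asList(1, 2, 3, 4))) {
                int c = run(perm);
                if (c > worstCost) { worstCost = c; worst = perm; }
            }
            System.out.println(worst + " -> " + worstCost);  // [1, 2, 3, 4] -> 10
        }
    }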
