Stanford Software Seminar

The Stanford Software Seminar is usually held on Mondays in various rooms in the Gates building. Talks are open to anyone.

To subscribe to the seminar mailing list, send email to software-research-join@lists.stanford.edu from the email address you wish to subscribe. Likewise, to unsubscribe, send an email to software-research-leave@lists.stanford.edu from the subscribed email address. In either case, the subject and body of your email will be ignored.

Upcoming Talks
Date Friday March 7, 2:00-3:00
Place Gates 392
Speaker Mooly Sagiv, Tel Aviv University
Title VeriCon: Towards Verifying Controller Programs in Software-Defined Networks
Abstract Software-defined networking (SDN) is a new paradigm for operating and managing computer networks. SDN enables logically-centralized control over network devices through a software "controller" program that operates independently from the network hardware.

We present VeriCon, the first system for verifying network-wide invariants of SDN controller programs. VeriCon uses first-order logic to define admissible network topologies and desired network-wide invariants (e.g., routing correctness, correct access control, and consistency of the controller's data structures).

VeriCon either confirms the correctness of the controller program on ALL admissible network topologies or outputs a concrete example that violates an invariant, and so can be used for debugging controller code.

We show that VeriCon, which implements classical Floyd-Hoare-Dijkstra deductive verification, is practical for a large repertoire of controller programs. In addition, as VeriCon is compositional, in the sense that it checks the correctness of each network event independently against the specified invariants, it can scale to handle complex systems.

We view VeriCon as a first step en route to practical mechanisms for verifying network-wide invariants of controller code. This is joint work with Thomas Ball and Nikolaj Bjorner (MSR), Aaron Gember (Wisc), Shachar Itzhaky (TAU), Aleksandr Karbyshev (TUM), and Michael Schapira and Asaf Valdarsky (HUJI).
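As a rough illustration of VeriCon's interface, though not its method (VeriCon performs Floyd-Hoare-Dijkstra-style deductive verification rather than enumeration), the sketch below bounded-checks a deliberately buggy toy controller against the invariant that rules only forward along physical links, and returns a concrete counterexample topology. The controller, invariant, and all names here are hypothetical.

```python
from itertools import combinations, product

def admissible_topologies(max_nodes):
    """Enumerate all small undirected topologies up to max_nodes.
    Here every graph is admissible; VeriCon instead constrains
    admissibility with first-order formulas."""
    for n in range(1, max_nodes + 1):
        nodes = list(range(n))
        possible = list(combinations(nodes, 2))
        for bits in product([False, True], repeat=len(possible)):
            yield nodes, {e for e, b in zip(possible, bits) if b}

def controller(nodes, edges):
    """Toy controller: one forwarding rule per physical link, plus a
    deliberately buggy rule from node 0 to the last node."""
    rules = {(a, b) for a, b in edges} | {(b, a) for a, b in edges}
    if len(nodes) > 1:
        rules.add((0, nodes[-1]))   # bug: target may not be a neighbor
    return rules

def verify(max_nodes):
    """Return None if the invariant ('rules forward only along links')
    holds on every admissible topology up to the bound, else a
    concrete counterexample (nodes, edges, offending rule)."""
    for nodes, edges in admissible_topologies(max_nodes):
        rules = controller(nodes, edges)
        links = {(a, b) for a, b in edges} | {(b, a) for a, b in edges}
        for r in rules:
            if r not in links:
                return nodes, edges, r
    return None
```

The smallest counterexample found is the two-node topology with no links, on which the buggy rule (0, 1) forwards to a non-neighbor.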

Speaker Bio
Past Talks
Date Friday Jan 17, 3-4pm
Place Gates 498
Speaker Stavros Aronis, Uppsala University
Title Optimal Dynamic Partial Order Reduction
Abstract Stateless model checking is a powerful technique for program verification, which, however, suffers from an exponential growth in the number of explored executions. A successful technique for reducing this number, while still maintaining complete coverage, is Dynamic Partial Order Reduction (DPOR). We present a new DPOR algorithm, which is the first to be provably optimal in that it always explores the minimal number of executions. It is based on a novel class of sets, called source sets, which replace the role of persistent sets in previous algorithms. First, we show how to modify an existing DPOR algorithm to work with source sets, resulting in an efficient and simple-to-implement algorithm. Second, we extend this algorithm with a novel mechanism, called wakeup trees, that allows it to achieve optimality. We have implemented both algorithms in a stateless model checking tool for Erlang programs. Experiments show that source sets significantly increase the performance and that wakeup trees incur only a small overhead in both time and space.
Speaker Bio
Date 3:00-4:00, Monday, November 18
Place Gates 415
Speaker Junfeng Yang, Columbia University
Title Determinism Is Not Enough: Making Parallel Programs Reliable with Stable Multithreading
Abstract Our accelerating computational demand and the rise of multicore hardware have made parallel programs, especially shared-memory multithreaded programs, increasingly pervasive and critical. Yet, these programs remain extremely difficult to write, test, analyze, debug, and verify. Conventional wisdom has attributed these difficulties to nondeterminism (i.e., repeated executions of the same program on the same input may show different behaviors), and researchers have recently dedicated much effort to bringing determinism into multithreading. In this talk, I argue that determinism is not as useful as commonly perceived: it is neither sufficient nor necessary for reliability. We present our view on why multithreaded programs are difficult to get right, describe a promising approach we call stable multithreading to dramatically improve reliability, and summarize the last four years of our research on building and applying stable multithreading systems. (More details are at http://www.cs.columbia.edu/~junfeng/.)
Speaker Bio
Date 3:00-4:00, Monday, December 2
Place Gates 415
Speaker Todd Millstein, UCLA
Title Toward a "Safe" Semantics for Multithreaded Programming Languages
Abstract "Safe" programming languages enforce fundamental language abstractions, thereby providing strong guarantees for all program executions and obviating large classes of subtle and dangerous errors. Modern languages have embraced the compelling programmability benefits of (memory and type) safety despite the additional run-time overhead. Unfortunately, recent work to standardize multithreading semantics in mainstream programming languages is reversing this trend. While a significant improvement over prior informally-specified semantics, the current standards allow a small programming error or omission to violate program safety in ways that are difficult to understand, detect, and correct.

In this talk I will argue that a safe multithreaded programming language should support the simple interleaving semantics of threads known as sequential consistency (SC). I'll then describe the results of our research over the past few years, which challenges the idea that the SC semantics is inconsistent with high performance. Perhaps surprisingly, restricting a modern compiler's optimizations to respect SC introduces minimal runtime overhead. While modern hardware relies upon important optimizations that potentially violate SC, a small extension to such hardware can preserve the SC semantics while retaining the lion's share of the benefit of these optimizations. Further, various factors will conspire to lower the cost of SC on stock hardware in the coming years.

This work is joint with Dan Marino (Symantec Research Labs, formerly UCLA), Abhay Singh (U Michigan, Ann Arbor), Madan Musuvathi (Microsoft Research, Redmond), and Satish Narayanasamy (U Michigan, Ann Arbor).
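The classic "store buffering" litmus test makes the talk's notion of SC concrete: under sequential consistency no interleaving of the two threads below can yield r1 = r2 = 0, even though relaxed compilers and hardware can produce exactly that outcome. The enumeration sketch below is illustrative only and is not from the talk.

```python
from itertools import permutations

# Each thread is a list of (kind, var) ops; 'st' stores 1, 'ld' loads.
THREAD_A = [('st', 'x'), ('ld', 'y')]   # x = 1; r1 = y
THREAD_B = [('st', 'y'), ('ld', 'x')]   # y = 1; r2 = x

def sc_outcomes():
    """Run every sequentially consistent interleaving of the two
    threads and collect the resulting (r1, r2) pairs."""
    outcomes = set()
    # An SC interleaving is a merge of the two programs that
    # preserves each thread's own order.
    for order in set(permutations('AABB')):
        mem = {'x': 0, 'y': 0}
        regs, idx = {}, {'A': 0, 'B': 0}
        for tid in order:
            prog = THREAD_A if tid == 'A' else THREAD_B
            kind, var = prog[idx[tid]]
            idx[tid] += 1
            if kind == 'st':
                mem[var] = 1
            else:
                regs[tid] = mem[var]
        outcomes.add((regs['A'], regs['B']))
    return outcomes
```

All six SC interleavings give (0, 1), (1, 0), or (1, 1); the "both loads see 0" outcome that relaxed memory models permit never appears.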

Speaker Bio Todd Millstein is an Associate Professor in the Computer Science Department at the University of California, Los Angeles. Todd received his Ph.D. and M.S. from the University of Washington and his A.B. from Brown University, all in Computer Science. Todd received an NSF CAREER award in 2006, an IBM Faculty Award in 2008, an ACM SIGPLAN Most Influential PLDI Paper Award in 2011, and an IEEE Micro Top Picks selection in 2012.
Date 2:00-3:00, Monday, September 16
Place 104 Gates
Speaker Mohsen Lesani, UCLA
Title On Testing and Verification of Transactional Memory Algorithms
Abstract A transactional memory (TM) is an object composed of a set of base objects such as registers and locks. The safety of the TM algorithm is the result of the safety of the composed base objects and the logic of the algorithm. We define a language that captures the type of the base objects and the algorithm. We define a history semantics for the language that characterizes the set of histories that a program can result in. Based on the history semantics, we propose techniques for both testing and verification of TM algorithms. First, we identify two problems that lead to violation of opacity, a safety criterion for TM. We present an automatic tool that, given a violating history, finds program traces that result in that history. We show that the well-known TM algorithms DSTM and McRT don't satisfy opacity. DSTM suffers from a write-skew anomaly, while McRT suffers from a write-exposure anomaly. Second, we present a program logic with novel propositions about execution order and linearization orders of base objects. We prove that our inference rules are sound, i.e., if we can derive that a program satisfies a property, then every history of the program satisfies the property. Our logic is composable as it can be augmented with new inference rules to support reasoning about new object types. We have used our logic to prove that the TL2 TM algorithm satisfies opacity. We are formalizing our logic and proofs in PVS.
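The write-skew anomaly mentioned in the abstract can be illustrated with a minimal simulation (a generic snapshot-isolation pattern, not the actual DSTM algorithm): two transactions read the same snapshot, write disjoint locations, and jointly break an invariant that each preserves in isolation.

```python
def snapshot_isolation_write_skew():
    """Simulate two transactions under a snapshot-isolation-like TM.
    Invariant: x + y >= 1 ('at least one doctor stays on call').
    Interleaving: both take their snapshot before either commits."""
    store = {'x': 1, 'y': 1}

    # Both transactions read from the same initial snapshot...
    snap1 = dict(store)
    snap2 = dict(store)

    # ...and each concludes, from its stale view, that it may go off call.
    ws1 = {'x': 0} if snap1['x'] + snap1['y'] >= 2 else {}
    ws2 = {'y': 0} if snap2['x'] + snap2['y'] >= 2 else {}

    # Snapshot isolation aborts only on write-write conflicts; the write
    # sets {'x'} and {'y'} are disjoint, so both commits succeed.
    store.update(ws1)
    store.update(ws2)
    return store
```

The committed state is x = y = 0, violating the invariant; an opaque (serializable) TM would forbid this interleaving.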
Speaker Bio Mohsen Lesani is a PhD candidate at UCLA advised by Professor Palsberg. He has research experience with IBM Research, Oracle (Sun) Labs, HP Labs and EPFL. He is interested in the design, implementation, testing and verification of synchronization algorithms.
Date 4:00-5:00, Tuesday, May 28
Place 415 Gates
Speaker Vijay Ganesh
Title SMT Solvers for Software Reliability and Security
Abstract SMT solvers increasingly play a central role in the construction of reliable and secure software, regardless of whether such reliability/security is ensured through formal methods, program analysis or testing. This dramatic influence of SMT solvers on software engineering as a discipline is a recent phenomenon, largely attributable to impressive gains in solver efficiency and expressive power.

In my talk, I will motivate the need for SMT solvers, sketch out their research story thus far, and then describe my contributions to solver research. Specifically, I will talk about two SMT solvers that I designed and implemented, namely, STP and HAMPI, currently being used in 100+ research projects. I will talk about real-world applications enabled by my solvers, and the techniques I developed that helped make them efficient.

Time permitting, I will also talk about some theoretical results in the context of SMT solving.

Speaker Bio Vijay Ganesh has been an assistant professor at the University of Waterloo, Canada, since September 2012. Prior to that he was a Research Scientist at MIT, and he completed his PhD in computer science at Stanford University.
Date 2:00-3:00, Monday, May 20
Place Gates 104
Speaker Cristian Cadar, Imperial College London
Title Safe Software Updates via Multi-version Execution
Abstract Software systems are constantly evolving, with new versions and patches being released on a continuous basis. Unfortunately, software updates present a high risk, with many releases introducing new bugs and security vulnerabilities. We tackle this problem using a simple but effective multi-version based approach. Whenever a new update becomes available, instead of upgrading the software to the new version, we run the new version in parallel with the old one; by carefully coordinating their executions and selecting the behavior of the more reliable version when they diverge, we create a more secure and dependable multi-version application. We implemented this technique in Mx, a system targeting Linux applications running on multi-core processors, and show that it can be applied successfully to several real applications, such as Lighttpd and Redis.
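The coordination idea can be sketched in a few lines (Mx itself synchronizes real Linux processes at the system-call level; the function names and the simple fallback policy below are invented for illustration):

```python
def multi_version_call(old_version, new_version, args):
    """Run both versions on the same input and prefer the more reliable
    behavior on divergence: if the new version crashes or disagrees,
    fall back to the old version's result."""
    old_result = old_version(*args)
    try:
        new_result = new_version(*args)
    except Exception:
        return old_result          # new version crashed: survive via old
    return new_result if new_result == old_result else old_result

# Hypothetical update that introduces a regression on empty input.
def parse_v1(s):
    return s.split(',') if s else []

def parse_v2(s):
    return s.split(',')            # bug: returns [''] for empty input
```

On ordinary inputs the versions agree and the new result is used; on the regressing input the multi-version application silently keeps the old behavior.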
Date 3:00-4:00, Monday, May 6
Place Gates 392
Speaker Rupak Majumdar, Max Planck Institute
Title Static Provenance Verification for Message-Passing Programs
Abstract Provenance information records the source and ownership history of an object. We study the problem of static provenance tracking in concurrent programs in which several principals execute concurrent processes and exchange messages over unbounded but unordered channels. The provenance of a message, roughly, is a function of the sequence of principals that have transmitted the message in the past. The provenance verification problem is to statically decide, given a message passing program and a set of allowed provenances, whether the provenance of all messages in all possible program executions belongs to the allowed set. We formalize the provenance verification problem abstractly in terms of well-structured provenance domains, and show a general decidability result for it. In particular, we show that if the provenance of a message is a sequence of principals who have sent the message, and a provenance query asks if the provenance lies in a regular set, the problem is decidable and EXPSPACE-complete. We describe an implementation of our technique to check provenances of messages in Firefox extensions. (Joint work with Roland Meyer and Zilong Wang)
Speaker Bio
Date 2:30-3:30, Thurs., Dec. 6
Place 392 Gates
Speaker Christoph Kirsch, University of Salzburg
Title Distributed Queues: Faster Pools and Better Queues
Abstract Designing and implementing high-performance concurrent data structures whose access performance scales on multicore hardware is difficult. An emerging remedy to scalability issues is to relax the sequential semantics of the data structure and exploit the resulting potential for parallel access in relaxed implementations. However, a major obstacle in the adoption of relaxed implementations is the belief that their behavior becomes unpredictable. We therefore aim at relaxing existing implementations systematically for better scalability and performance without incurring a cost in predictability. We present distributed queues (DQ), a new family of relaxed concurrent queue implementations. DQ implement bounded or unbounded out-of-order relaxed queues with a strict (i.e. linearizable) emptiness check. Our comparison of DQ against existing pool, and strict and relaxed queue implementations reveals that DQ outperform and outscale the state-of-the-art implementations. We also empirically show that the shorter execution time of queue operations of fast but relaxed implementations such as DQ (i.e. the degree of reordering through overlapping operations) may offset the effect of semantic relaxations (i.e. the degree of reordering through relaxation), making them appear to behave as FIFO as, or sometimes even more FIFO than, strict but slow implementations.

This is joint work with A. Haas, T.A. Henzinger, M. Lippautz, H. Payer, A. Sezgin, and A. Sokolova.
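The idea of an out-of-order relaxed queue with a strict emptiness check can be sketched as follows (a sequential toy, not the concurrent DQ implementation; the segment count and random selection policy are invented for illustration):

```python
import random
from collections import deque

class DistributedQueue:
    """k independent FIFO segments: enqueue and dequeue each pick a
    segment at random, so global FIFO order is relaxed, but no element
    is ever lost and the emptiness check is strict, because a dequeue
    reports empty only after consulting every segment."""
    def __init__(self, k, seed=0):
        self.segments = [deque() for _ in range(k)]
        self.rng = random.Random(seed)

    def enqueue(self, item):
        self.rng.choice(self.segments).append(item)

    def dequeue(self):
        k = len(self.segments)
        start = self.rng.randrange(k)
        for i in range(k):                 # scan all segments before
            seg = self.segments[(start + i) % k]   # declaring emptiness
            if seg:
                return seg.popleft()
        return None                        # empty only if ALL are empty
```

Elements may come out in a relaxed order, but every enqueued element is eventually dequeued exactly once.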

Speaker Bio Christoph Kirsch is a full professor and holds a chair at the Department of Computer Sciences of the University of Salzburg, Austria. Since 2008 he has also been a visiting scholar at the Department of Civil and Environmental Engineering of the University of California, Berkeley. He received his Dr.Ing. degree from Saarland University, Saarbruecken, Germany, in 1999 while at the Max Planck Institute for Computer Science. From 1999 to 2004 he worked as Postdoctoral Researcher at the Department of Electrical Engineering and Computer Sciences of the University of California, Berkeley. His research interests are in concurrent programming and systems, virtual execution environments, and embedded software. Dr. Kirsch co-invented the Giotto and HTL languages, and leads the JAviator UAV project for which he received an IBM faculty award in 2007. He co-founded the International Conference on Embedded Software (EMSOFT), was elected ACM SIGBED chair in 2011, and is currently associate editor of ACM TODAES.
Date 2:00-3:00, July 16, 2012
Place Gates 260
Speaker Jesse Tov, Harvard
Title Practical Programming with Substructural Types
Abstract Substructural logics remove from classical logic rules for reordering, duplication, or dropping of assumptions. Because propositions in such a logic may no longer be freely copied or ignored, this suggests understanding propositions in substructural logics as representing resources rather than truth. For the programming language designer, substructural logics thus provide a framework for considering type systems that can track the changing states of logical and physical resources.

While several substructural type systems have been proposed and implemented, many of these have targeted substructural types at a particular purpose, rather than offering them as a general facility. The more general substructural type systems have been theoretical in nature and too unwieldy for practical use. This talk presents the design of a general-purpose language with substructural types, and discusses several language design problems that had to be solved in order to make substructural types useful in practice.

Speaker Bio
Date 2:00-3:00, July 18, 2012
Place Gates 260
Speaker Aditya Thakur, U. Wisconsin
Title A Deductive Algorithm for Symbolic Abstraction with Applications to SMT
Abstract This talk presents connections between logic and abstract interpretation. In particular, I will present a new algorithm for the problem of "symbolic abstraction": Given a formula \phi in a logic L and an abstract domain A, the symbolic abstraction of \phi is the best abstract value in A that over-approximates the meaning of \phi. When \phi represents a concrete transformer, algorithms for symbolic abstraction can be used to automatically synthesize the corresponding abstract transformer. Furthermore, if the symbolic abstraction of \phi is bottom, then \phi is proved unsatisfiable.

The bottom line is that our algorithm is "dual-use": (i) it can be used by an abstract interpreter to compute abstract transformers, and (ii) it can be used in an SMT (Satisfiability Modulo Theories) solver to determine whether a formula is satisfiable.

The key insight behind the algorithm is that Staalmarck's method for satisfiability checking of propositional-logic formulas can be explained using concepts from the field of abstract interpretation. This insight then led to the discovery of the connection between Staalmarck's method and symbolic abstraction, and the extension of Staalmarck's method to richer logics, such as quantifier-free linear real arithmetic.

This is joint work with Prof. Thomas Reps.
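A brute-force caricature of symbolic abstraction over an interval domain can make the definition concrete (the talk's actual algorithm is deductive, in the style of Staalmarck's method, not enumerative, and handles richer logics): the tightest interval covering all models of phi is its symbolic abstraction, and an empty model set plays the role of bottom, i.e. unsatisfiability.

```python
def alpha_interval(phi, lo=-16, hi=16):
    """Best interval over-approximation of a one-variable formula phi
    (given as a predicate on ints), computed by brute force over the
    finite range [lo, hi]. None plays the role of bottom: phi has no
    model in the range, i.e. it is unsatisfiable there."""
    models = [x for x in range(lo, hi + 1) if phi(x)]
    return (min(models), max(models)) if models else None
```

For example, x*x == 4 abstracts to the interval [-2, 2] (the best interval, though not an exact description of the two models), while x*x < 0 abstracts to bottom, proving unsatisfiability, which is the "dual-use" point of the abstract.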

Speaker Bio
Date 3:00-4:00, May 24, 2012
Place Gates 260
Speaker Nataliya Guts, University of Maryland
Title Polymonads: Reasoning and Inference
Abstract Many useful programming constructions can be expressed as monads. Examples include probabilistic modeling, functional reactive programming, parsing, and information flow tracking, not to mention effectful functionality like state and I/O. In our previous work [SGLH11], we presented a type-based rewriting algorithm to make programming with arbitrary monads as easy as built-in support for state and I/O. Developers write programs using monadic values of type m t as if they were of type t, and our algorithm inserts the necessary binds, units, and monad-to-monad morphisms so that the program typechecks.

A number of other programming idioms resemble monads but deviate from the standard monad binding mechanism. Examples include parameterized monads, monads for effects, and information-flow state tracking. Our present work aims to provide support for formal reasoning and lightweight programming for such constructs. We present a new expressive paradigm, polymonads, including the equivalent of monad and morphism laws. Polymonads subsume conventional monads and all other examples mentioned above. On the practical side, we provide an extension of our type inference rewriting algorithm to support lightweight programming with polymonads.

[SGLH11] N. Swamy, N. Guts, D. Leijen, M. Hicks. Lightweight Monadic Programming in ML. In ICFP, 2011.
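The flavor of the rewriting can be shown with an option/Maybe monad sketch (a Python stand-in; the encoding and names are invented): the programmer writes y = x + 1 as if x were an int, and the algorithm inserts the bind and unit.

```python
def unit(v):
    """Maybe monad: 'm t' is either None (failure) or ('Just', v)."""
    return ('Just', v)

def bind(m, f):
    """Sequence a Maybe computation: propagate failure, else apply f."""
    return f(m[1]) if m is not None else None

# Source the programmer writes (as if x had type int):
#     y = x + 1
# Elaborated form the rewriting algorithm would produce, with the
# bind and unit inserted automatically so the program typechecks:
def incr(x_maybe):
    return bind(x_maybe, lambda x: unit(x + 1))
```

The elaborated function both computes on the underlying value and threads the monadic structure: incr maps a present value to its successor and propagates failure untouched.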

Speaker Bio
Date 3:00-4:00, November 7, 2011
Place Gates 260
Speaker Anupam Datta, CMU
Title Policy Auditing over Incomplete Logs
Abstract We present the design, implementation and evaluation of an algorithm that checks audit logs for compliance with privacy and security policies. The algorithm, which we name reduce, addresses two fundamental challenges in compliance checking that arise in practice. First, in order to be applicable to realistic policies, reduce operates on policies expressed in a first-order logic that allows restricted quantification over infinite domains. We build on ideas from logic programming to identify the restricted form of quantified formulas. The resulting logic is more expressive than prior logics used for compliance-checking, including propositional temporal logics and metric first-order temporal logic, and, in contrast to these logics, can express all 84 disclosure-related clauses in the HIPAA Privacy Rule. Second, since audit logs are inherently incomplete (they may not contain sufficient information to determine whether a policy is violated or not), reduce proceeds iteratively: in each iteration, it provably checks as much of the policy as possible over the current log and outputs a residual policy that can only be checked when the log is extended with additional information. We prove correctness, termination, time and space complexity results for reduce. We implement reduce and evaluate it by checking simulated audit logs for compliance with the HIPAA Privacy Rule. Our experimental results demonstrate that the algorithm is fast enough to be used in practice.
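The iterate-and-emit-a-residual idea can be sketched in miniature (vastly simplified: the real reduce works on quantified first-order formulas, not a flat list of atomic obligations, and the obligation names below are invented):

```python
def reduce_check(obligations, log):
    """One iteration of an iterative audit: check each atomic obligation
    against the current, possibly incomplete, log. The log maps an
    obligation to True, False, or (if absent) unknown. Obligations the
    log cannot yet decide are returned as the residual policy to be
    checked on the next, extended log."""
    violations, residual = [], []
    for ob in obligations:
        verdict = log.get(ob)      # True / False / None (unknown)
        if verdict is False:
            violations.append(ob)
        elif verdict is None:
            residual.append(ob)
    return violations, residual
```

A first pass over an incomplete log reports the decidable violations and defers the rest; once the log is extended, a second pass over the residual finishes the check.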
Speaker Bio Anupam Datta is an Assistant Research Professor at Carnegie Mellon University where he is affiliated with CyLab, the Electrical and Computer Engineering Department, and (by courtesy) the Computer Science Department. His research focuses on foundations of security and privacy. One area of focus has been on programming language methods for compositional security. His work on Protocol Composition Logic and the Logic of Secure Systems has uncovered new principles for compositional security and has been applied successfully to find attacks in and verify properties of a number of practical cryptographic protocols and secure systems. A second area of focus has been on formalizing and enforcing privacy policies. He has worked on a Logic of Privacy that formalizes concepts from contextual integrity --- a philosophical theory of privacy as a right to appropriate flows of personal information. His group has produced the first complete formalization of the HIPAA Privacy Rule using this logic and developed principled audit mechanisms for enforcing policies expressed in the logic.

Dr. Datta has co-authored a book and over 30 publications in conferences and journals on these topics. He serves on the Steering Committee of the IEEE Computer Security Foundations Symposium. He has served as Program Co-Chair of the 2011 Formal Aspects of Security and Trust Workshop and the 2008 Formal and Computational Cryptography Workshop. Dr. Datta obtained MS and PhD degrees from Stanford University and a BTech from IIT Kharagpur, all in Computer Science.

Date 3:00-4:00, Oct 4, 2011
Place 498 Gates
Speaker David Basin, ETH Zurich
Title Policy Monitoring in First-order Temporal Logic
Abstract In security and compliance, it is often necessary to ensure that agents and systems comply with complex policies. An example from financial reporting is the requirement that every transaction t of a customer c, who has within the last 30 days been involved in a suspicious transaction t', must be reported as suspicious within 2 days. We present an approach to monitoring such policies formulated in an expressive fragment of metric first-order temporal logic. We also report on case studies in security and compliance monitoring and use these to evaluate both the suitability of this fragment for expressing complex, realistic policies and the efficiency of our monitoring algorithm.
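The financial-reporting example can be turned into a small offline monitor sketch (an illustrative brute-force check over a finite trace, not the talk's monitoring algorithm; the event encoding is invented):

```python
def monitor(trace):
    """Check each transaction against the example policy: if the
    customer was involved in a suspicious transaction within the
    previous 30 days, this transaction must be reported as suspicious
    within 2 days. Events are tuples (day, kind, customer, txid) with
    kind in {'trans', 'suspicious', 'report'}. Returns violating txids."""
    violations = []
    for day, kind, cust, tx in trace:
        if kind != 'trans':
            continue
        flagged = any(k == 'suspicious' and c == cust and 0 <= day - d <= 30
                      for d, k, c, _ in trace)
        reported = any(k == 'report' and t == tx and 0 <= d - day <= 2
                       for d, k, _, t in trace)
        if flagged and not reported:
            violations.append(tx)
    return violations
```

On a trace where a flagged customer's first transaction is reported the next day but the second never is, only the second transaction is flagged as a violation.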
Speaker Bio David Basin has been a full professor, holding the chair for Information Security in the Department of Computer Science at ETH Zurich, since 2003. He is also the director of the ZISC, the Zurich Information Security Center.

He received his bachelor's degree in mathematics from Reed College in 1984, his Ph.D. from Cornell University in 1989, and his Habilitation from the University of Saarbrücken in 1996. His appointments include a postdoctoral research position at the University of Edinburgh (1990 - 1991), and afterwards he led a subgroup, within the programming logics research group, at the Max-Planck-Institut für Informatik (1992 - 1997). From 1997 to 2002 he was a full professor at the University of Freiburg where he held the chair for software engineering.

His research focuses on information security, in particular methods and tools for modeling, building, and validating secure and reliable systems.

Date 3:00-4:00 pm, June 10
Place Gates 104
Speaker Koushik Sen, UC Berkeley
Title Specifying and Checking Correctness of Parallel Programs
Abstract The spread of multicore processors and manycore graphics processing units has greatly increased the need for parallel correctness tools. Reasoning about parallel multi-threaded programs is significantly more difficult than for sequential programs due to non-determinism. We believe that the only way to tackle this complexity is to separate reasoning about parallelism correctness (i.e., that a parallel program gives the same outcome despite thread interleavings) from reasoning about functional correctness (i.e., that the program produces the correct outcome on a thread interleaving). In this talk, I will describe two fundamental techniques for separating the parallelization correctness aspect of a program from its functional correctness. The first idea consists of extending programming languages with constructs for writing specifications, called bridge assertions, that focus on relating outcomes of two parallel executions differing only in thread-interleavings. The second idea consists of allowing a programmer to use a non-deterministic sequential program as the specification of a parallel one. For functional correctness, it is then enough to check the sequential program. For parallelization correctness, it is sufficient to check the deterministic behavior of the parallel program with respect to the non-deterministic sequential program. To check parallel correctness, we have developed a new scalable automated method for testing and debugging, called active testing. Active testing combines the power of imprecise program analysis with the precision of software testing to quickly discover concurrency bugs and to reproduce discovered bugs on demand.
Speaker Bio Koushik Sen is an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research interests lie in Software Engineering, Programming Languages, and Formal Methods. He is interested in developing software tools and methodologies that improve programmer productivity and software quality. He is best known for his work on directed automated random testing and concolic testing. He has received an NSF CAREER Award in 2008, a Haifa Verification Conference (HVC) Award in 2009, an IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice in 2010, and a Sloan Foundation Fellowship in 2011. He has won three ACM SIGSOFT Distinguished Paper Awards. He received the C.L. and Jane W-S. Liu Award in 2004, the C. W. Gear Outstanding Graduate Award in 2005, and the David J. Kuck Outstanding Ph.D. Thesis Award in 2007 from the UIUC Department of Computer Science. He holds a B.Tech from the Indian Institute of Technology, Kanpur, and M.S. and Ph.D. degrees in CS from the University of Illinois at Urbana-Champaign.
Date 1:00-2:00, Friday June 3
Place 463a Gates
Speaker Andreas Zeller, Saarland University
Title Mining Precise Specifications
Abstract Recent advances in software validation and verification make it possible to widely automate the check whether a specification is satisfied. This progress is hampered, though, by the persistent difficulty of writing specifications. Are we facing a "specification crisis"? By mining specifications from existing systems, we can alleviate this burden, reusing and extending the knowledge of 60 years of programming, and bridging the gap between formal methods and real-world software. But mining specifications has its challenges: We need good usage examples to learn expected behavior; we need to cope with the approximations of static and dynamic analysis; and we need specifications that are readable and relevant to users. In this talk, I present the state of the art in specification mining, its challenges, and its potential, up to a vision of seamless integration of specification and programming.
Speaker Bio Andreas Zeller is a full professor for Software Engineering at Saarland University in Saarbrücken, Germany. His research concerns the analysis of large software systems and their development process; his students are funded by companies like Google, Microsoft, or SAP. In June 2011, Zeller will be inducted as Fellow of the ACM for his contributions to automated debugging and mining software archives.
Date 2:00-3:00, Monday, Feb. 7
Place Gates 104
Speaker Michael Franz, UC Irvine
Title Recent Advances In Compiler Research - Firefox's TraceMonkey and Beyond
Abstract Common to practically all compilers built over the past 50 years has been the concept of the "control flow graph", a model of a program that a compiler builds and then traverses while generating target code. Even just-in-time and embedded compilers use such control flow graphs, although they tend to make the unit of compilation smaller than traditional batch compilers (e.g., one method at a time rather than one class at a time). Trace Compilation, to which we have made significant contributions, represents a radical departure from this long established convention. A novel intermediate representation, the Trace Tree, is constructed lazily on-demand while the program is simultaneously executed, incrementally compiled, and optimized. The advantage of this technique is that the compiler doesn't expend any resources on parts of the program that are not frequently executed; traditional compilers construct control-flow graphs for unimportant and even for unreachable parts of a program and need to prune such graphs later. Our specific approach to trace compilation is now in the process of being adopted widely across and beyond academia. Working with the Mozilla foundation, we incorporated our technique into the Firefox browser, starting with version 3.5. By incorporating our invention, Mozilla was able to raise Firefox's JavaScript performance by a surprising factor of 7. Our Trace Compilation technique is now being used daily by several hundred million users around the globe. Other groups of researchers that are now using trace compilation include Oracle, Adobe, Google, and Microsoft, and we are collaborating with several of these projects. In a second project, we are investigating compiler-generated software diversity as a defense mechanism against software attacks. 
Our solution is centered on an "App Store" containing a diversification engine (a "multicompiler") that automatically generates a unique version of every program each time that a downloader requests it. All the different versions of the same program behave in exactly the same way from the perspective of the end-user, but they implement their functionality in subtly different ways. As a result, any specific attack will succeed only on a small fraction of targets. An attacker would require a large number of different attacks and would have no way of knowing a priori which specific attack will succeed on which specific target. Equally importantly, our approach makes it much more difficult for an attacker to generate attack vectors by way of reverse engineering of security patches.
Speaker Bio Prof. Michael Franz is a Professor of Computer Science in UCI's Donald Bren School of Information and Computer Sciences, a Professor of Electrical Engineering and Computer Science (by courtesy) in UCI's Henry Samueli School of Engineering, and the director of UCI's Secure Systems and Software Laboratory. He is currently also a visiting Professor of Informatics at ETH Zurich, the Swiss Federal Institute of Technology, from which he previously received the Dr. sc. techn. (advisor: Niklaus Wirth) and the Dipl. Informatik-Ing. ETH degrees.
Date 11:00-12:00, Tuesday, December 7th
Place Gates 463a
Speaker Hongseok Yang, University of London
Title Automatic Program Analysis of Overlaid Data Structures
Abstract We call a data structure overlaid if a node in the structure includes links for multiple data structures and these links are intended to be used at the same time. These overlaid data structures are frequently used in systems code, when implementing multiple types of indexing structures over the same set of nodes. For instance, the deadline IO scheduler of Linux has a queue whose node has links for a doubly-linked list as well as those for a red-black tree. The doubly-linked list here is used to record the order that nodes are inserted in the queue, and the red-black tree provides an efficient indexing structure on the sector fields of the nodes.

In this talk, I will describe an automatic program analysis of these overlaid data structures. The focus of the talk will be on two main issues: to represent such data structures effectively and to build an efficient yet precise program analyser, which can prove the memory safety of realistic examples, such as the Linux deadline IO scheduler. During the talk, I will explain how we addressed the first issue by the combination of standard classical conjunction and separating conjunction from separation logic. Also, I will describe how we used a meta-analysis and the dynamic insertion of ghost instructions in solving the second issue. If time permits, I will give a demo of the tool.

This is joint work with Oukseh Lee and Rasmus Petersen.

Speaker Bio
Date 2:00-3:00, October 25
Place Gates 104
Speaker Patrick Eugster, Purdue
Title Distributed Event-based Programming in Java
Abstract The abstraction of "event" has been used for years to reason about concurrent and distributed programs, and is increasingly used as a programming paradigm. Developing distributed event-based applications is currently challenging for programmers, however, as it involves integrating a number of technologies and dealing with an abstraction that cuts across more traditional programming paradigms.

EventJava is an extension of the mainstream Java language aimed at simplifying the development of a wide range of event-based applications. In this talk, we first provide an overview of select features of the EventJava language framework and its implementation. Then we present a performance evaluation from different viewpoints. We conclude with an outlook on future work.

Speaker Bio
Date 1:00-2:00, Wednesday, October 13
Place Gates 219
Speaker Erik Meijer, Microsoft
Title Fundamentalist Functional Programming
Abstract In 1984, John Hughes wrote a seminal paper titled "Why Functional Programming Matters", in which he eloquently explained the value of pure and lazy functional programming. Due to the increasing importance of the Web and the advent of many-core machines, in the quarter of a century since the paper was written, the problems associated with imperative languages and their side effects have become increasingly evident.

This talk argues that fundamentalist functional programming, that is, radically eliminating all side effects from programming languages, including strict evaluation, is what it takes to conquer the concurrency and parallelism dragon. Programmers must embrace pure, lazy functional programming, with all effects apparent in the type system of the host language using monads.

A radical paradigm shift is the answer, but does that mean that all current programmers will be lost along the way? Fortunately not! By design, LINQ is based on monadic principles, and the success of LINQ proves that the world does not fear the monads.
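To make the "effects apparent in the type system" point concrete, here is a minimal Maybe monad sketch in Python. The names `Maybe`, `bind`, `parse_int`, and `recip` are illustrative; `bind` plays the role LINQ assigns to SelectMany:

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

@dataclass(frozen=True)
class Maybe(Generic[T]):
    """Possible failure made explicit in the return type,
    instead of hidden behind exceptions or null."""
    value: Optional[T]

    def bind(self, f: "Callable[[T], Maybe[U]]") -> "Maybe[U]":
        # The monadic composition operator (LINQ: SelectMany).
        return Maybe(None) if self.value is None else f(self.value)

def parse_int(s: str) -> Maybe[int]:
    return Maybe(int(s)) if s.lstrip("-").isdigit() else Maybe(None)

def recip(n: int) -> Maybe[float]:
    return Maybe(None) if n == 0 else Maybe(1.0 / n)

# Effects compose without any try/except plumbing:
assert parse_int("4").bind(recip) == Maybe(0.25)
assert parse_int("zero").bind(recip) == Maybe(None)  # parse failure propagates
assert parse_int("0").bind(recip) == Maybe(None)     # division failure propagates
```

The same bind discipline, generalized over effect types, is what monadic typing makes statically visible in a language like Haskell.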

Speaker Bio Erik Meijer is an accomplished programming-language designer who has worked on a wide range of languages, including Haskell, Mondrian, X#, C#, and Visual Basic. He runs the Cloud Languages Team in the Business Platform Division at Microsoft, where his primary focus has been to remove the impedance mismatch between databases and programming languages in the context of the Cloud. One of the fruits of these efforts is LINQ, which not only adds a native querying syntax to .NET languages, such as C# and Visual Basic, but also allows developers to query data sources other than tables, such as objects or XML. Most recently, Erik has been working on and preaching the virtues of fundamentalist functional programming in the new age of concurrency and many-core. Some people might recognize him from his brief stint as the "Head in the Box" on Microsoft VBTV. These days, you can regularly watch Erik's interviews on the "Expert to Expert" and "Going Deep" series on Channel 9.
Date 2:00-3:00, September 27
Place Gates 104
Speaker Sorin Lerner, UC San Diego
Title Strategies for Building Correct Optimizations
Abstract Program analyses and optimizations are at the core of many optimizing compilers, and their correctness in this setting is critically important because it affects the correctness of any code that is compiled. However, designing correct analyses and optimizations is difficult, error prone and time consuming. This talk will present several inter-related approaches for building correct analyses and optimizations. I will start with an approach based on a domain-specific language for expressing optimizations, combined with a checker for verifying the correctness of optimizations written in this language. This first approach still requires the programmer to write down the optimizations in full detail. I will then move on to several techniques which instead attempt to synthesize correct optimizations from a higher-level description. In particular, I will present an approach that discovers correct optimization opportunities by exploring the application of equality axioms on the program being optimized. Finally, I will present a technique that synthesizes generally applicable and correct optimization rules from concrete examples of code before and after some transformations have been performed.
Speaker Bio Sorin Lerner is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. He received his PhD in 2006 from the University of Washington, under the supervision of Craig Chambers. Before that, he received an undergraduate degree in Computer Engineering from McGill University, Montreal. His research interests lie in programming language and analysis techniques for making software systems easier to write, maintain and understand, including static program analysis, domain specific languages, compilation, formal methods and automated theorem proving. Lerner works actively at the interface of Programming Languages and Software Engineering, and frequently publishes at POPL/PLDI and ICSE/FSE. Sorin Lerner was the co-chair of the 2010 ACM SIGPLAN-SIGSOFT PASTE workshop, and is the recipient of an NSF Career Award (2007), and of the 2003 PLDI Best paper award.
Date 2:00-3:00, Wednesday, September 15
Place Gates 392
Speaker Madan Musuvathi, Microsoft Research
Title A Probabilistic Algorithm for Finding Concurrency Errors
(or How to Crash Your Program in the First Few Runs)
Abstract Unexpected thread interleavings can lead to concurrency errors that are hard to find, reproduce, and debug. In this talk, I will present a probabilistic algorithm for finding such errors. The algorithm works by randomly perturbing the timing of threads and event handlers at runtime. Every run of the algorithm finds every concurrency bug in the program with some (reasonably large) probability. Repeated runs can be used to reduce the chance of missing bugs to any desired amount. The algorithm scales to large programs and, in many cases, finds bugs in the first few runs of a program. A tool implementing this algorithm has been used to improve concurrency testing at Microsoft for over a year.

I will also describe the relationship between this algorithm and the dimension theory of partial-orders, and how results from this field can be used to further improve the algorithm.

Speaker Bio Madan Musuvathi is a researcher at Microsoft Research interested in software verification, program analysis, and systems. Recently, he has focused on the scalable analysis of concurrent systems. He received his Ph.D. at Stanford University in 2004 and has been at Microsoft Research since.
Date 1:00-2:00, August 24
Place Gates 104
Speaker Noam Rinetzky, Queen Mary University of London
Title Verifying Linearizability with Hindsight
Abstract We present a proof of safety and linearizability of a highly-concurrent optimistic set algorithm. The key step in our proof is the Hindsight Lemma, which allows a thread to infer the existence of a global state in which its operation can be linearized based on limited local atomic observations about the shared state. The Hindsight Lemma allows us to avoid one of the most complex and non-intuitive steps in reasoning about highly concurrent algorithms: considering the linearization point of an operation to be in a different thread than the one executing it.

The Hindsight Lemma assumes that the algorithm maintains certain simple invariants which are resilient to interference, and which can themselves be verified using purely thread-local proofs. As a consequence, the lemma allows us to unlock a perhaps-surprising intuition: a high degree of interference makes non-trivial highly-concurrent algorithms in some cases much easier to verify than less concurrent ones.

Joint work with Peter W. O'Hearn (Queen Mary University of London), Martin T. Vechev (IBM T.J. Watson Research Center), Eran Yahav (IBM T.J. Watson Research Center), and Greta Yorsh (IBM T.J. Watson Research Center).

Presented in the 29th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC'10).

Link to paper: http://www.eecs.qmul.ac.uk/~maon/pubs/PODC10-hindsight.pdf

Speaker Bio
Date 2:00-3:00, Monday, August 16
Place Gates 104
Speaker Greta Yorsh, IBM Research
Title Specializing Memory Management for Concurrent Data Structures
Abstract Memory reclamation plays a central role in the design of concurrent data structures. The main challenge is to equip a particular concurrent data structure with its own custom memory reclamation, in a way that is both correct and efficient. This problem arises frequently in environments that do not support automatic memory management, but it is also relevant when we want to obtain a more efficient concurrent data structure. Unfortunately, despite various proposals, the most prevalent methodologies today, such as hazard pointers, are not well understood, and applying them is still an ad-hoc, error-prone, difficult, and time-consuming manual process.

We propose a systematic approach to specialization of memory reclamation to a particular concurrent data structure. We start with a concurrent algorithm that is proven to behave correctly assuming automatic memory reclamation. We apply a sequence of correctness-preserving transformations to both the memory reclamation scheme and the algorithm. These transformations rely on invariants of the algorithm, computed by standard analyses, and clearly illustrate why a given transformation is applied and under what conditions it can be applied safely. We demonstrate our approach by systematically deriving correct and efficient custom memory reclamation for state-of-the-art concurrent data structure algorithms, including several variations of concurrent stack, queue, and set algorithms.

(joint work in progress with Martin Vechev and Eran Yahav)

Speaker Bio
Date CHANGED: 1:00-2:00 Friday, August 6
Place CHANGED: Gates 463a
Speaker Cindy Rubio Gonzalez, University of Wisconsin
Title Error Propagation Analysis for File Systems
Abstract Unchecked errors are especially pernicious in operating system file management code. Transient or permanent hardware failures are inevitable, and error-management bugs at the file system layer can cause silent, unrecoverable data corruption. Furthermore, even when developers have the best of intentions, inaccurate documentation can mislead programmers and cause software to fail in unexpected ways.

We propose an interprocedural static analysis that tracks errors as they propagate through file system code. Our implementation detects overwritten, out-of-scope, and unsaved unchecked errors. Analysis of four widely-used Linux file system implementations (CIFS, ext3, IBM JFS and ReiserFS), a relatively new file system implementation (ext4), and shared virtual file system (VFS) code uncovers 312 confirmed error propagation bugs. Our flow- and context-sensitive approach produces more precise results than related techniques while providing better diagnostic information, including possible execution paths that demonstrate each bug found.
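A drastically simplified version of the idea: walk straight-line "code" and flag error codes that are overwritten before they are ever checked. The toy IR and opcode names below are invented for illustration; the real analysis is interprocedural, flow- and context-sensitive:

```python
# Error codes that must be checked before being discarded.
ERROR_CODES = {"-EIO", "-ENOMEM"}

def find_overwritten_errors(ops):
    """ops: list of ("assign", var, value) or ("check", var, None) tuples.
    Returns (var, error_line, overwrite_line) for each dropped error."""
    unchecked = {}          # var -> line where it last received an error
    bugs = []
    for line, (op, var, value) in enumerate(ops, start=1):
        if op == "assign":
            if var in unchecked:
                bugs.append((var, unchecked.pop(var), line))  # overwritten!
            if value in ERROR_CODES:
                unchecked[var] = line
        elif op == "check":
            unchecked.pop(var, None)  # the error was tested: handled

    return bugs

code = [
    ("assign", "rc", "-EIO"),      # line 1: rc holds an error...
    ("assign", "rc", "0"),         # line 2: ...silently clobbered, unchecked
    ("assign", "err", "-ENOMEM"),  # line 3
    ("check",  "err", None),       # line 4: this one is handled properly
]
assert find_overwritten_errors(code) == [("rc", 1, 2)]
```

The diagnostic triple here mirrors the paper's emphasis on reporting the path from where an error value arises to where it is lost.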

Additionally, we use our error-propagation analysis framework to identify the error codes returned by system calls across 52 Linux file systems. We examine mismatches between documented and actual error codes returned by 42 Linux file-related system calls. Comparing analysis results with Linux manual pages reveals over 1,700 undocumented error-code instances affecting all file systems and system calls examined.

Speaker Bio
Date 2:00-3:00, Thursday, July 22
Place Gates 104
Speaker Byron Cook, Microsoft Research, Cambridge
Title New methods for proving temporal properties of infinite-state systems
Abstract I will describe some new methods of proving temporal properties of infinite-state programs. Our approach takes advantage of the fact that linear-temporal properties can often be proved more efficiently using proof techniques usually associated with the branching-time logic CTL. The caveat is that, in certain instances, nondeterminism in the system's transition relation can cause CTL methods to report counterexamples that are spurious in LTL. To address this problem we describe an algorithm that, as it attempts to apply CTL proof methods, finds and then removes problematic nondeterminism via an analysis of the spurious counterexamples. This strategy also requires us to develop CTL symbolic model checking tools for infinite-state systems.
Speaker Bio Dr. Byron Cook is a Principal Researcher at Microsoft Research in Cambridge, UK as well as Professor of Computer Science at Queen Mary, University of London. He is one of the developers of the Terminator program termination proving tool, as well as the SLAM software model checker. Before joining Microsoft Research he was a developer in the Windows OS kernel group. See research.microsoft.com/~bycook/ for more information.
Date 1:30-2:30, Tuesday, June 1
Place Gates 463a
Speaker Sanjit Seshia, UC Berkeley
Title Integrating Induction and Deduction for Verification and Synthesis
Abstract Even with impressive advances in formal methods over the last few decades, some problems in automatic verification and synthesis remain challenging. Examples include the verification of quantitative properties of software such as execution time, and certain program synthesis problems. In this talk, I will present a new approach to automatic verification and synthesis based on a combination of inductive methods (learning from examples), and deductive methods (based on logical inference and constraint solving).

Our approach integrates verification techniques such as satisfiability solving and theorem proving (SAT/SMT), numerical simulation, and fixpoint computation with inductive inference methods including game-theoretic online learning, learning Boolean functions and learning polyhedra. My talk will illustrate this combination of inductive and deductive reasoning for three problems: (i) program synthesis applied to malware deobfuscation; (ii) the verification of execution time properties of embedded software, and (briefly) (iii) the synthesis of switching logic for hybrid systems.

Speaker Bio Sanjit A. Seshia is an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received an M.S. and a Ph.D. in Computer Science from Carnegie Mellon University, and a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Bombay. His research interests are in dependable computing and computational logic, with a current focus on applying automated formal methods to problems in embedded systems, computer security, and electronic design automation. He has received a Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, and the School of Computer Science Distinguished Dissertation Award at Carnegie Mellon University.
Date 2:00-3:00, Tuesday, May 11th
Place Gates 104
Speaker Mark Wegman, IBM Research
Title Managing Businesses that Design
Abstract Software development is fundamentally a design process. The quality of the eventual outcome depends on how well people can come together to create a pleasing design, and different organizations may be vastly better or worse than others at how they create designs. Understanding what an organization does well is similar in many ways to debugging a program: you instrument it, subject to concerns about privacy. The instrumentation can be done via the tools used to build software, as those are what people in the organization use. Given the needs of the organization to change, those tools may also need to change, and the analysis they do on the software artifacts may change as well. This is new work that attempts to define a vision of a potential new science of the management of design. It should be noted that we are not advocating that the best management is more intrusive management; sometimes the best management recognizes that, to accomplish what is needed, people need to take more risks up front.
Speaker Bio
Date 1:30-2:30, Wednesday, May 12
Place Gates 104
Speaker John Field, IBM Research
Title The Thorn Programming Language: Robust Distributed Scripting
Abstract Scripting languages enjoy great popularity due to their support for rapid and exploratory development. They typically have lightweight syntax, weak data privacy, dynamic typing, and powerful aggregate data types. The price of these features comes later in the software life cycle: scripts are hard to evolve and compose, and often slow. An additional weakness of most scripting languages is lack of support for distributed computing, even though distribution is required for scalability and for interacting with remote services. Thorn, developed jointly by IBM Research and Purdue University, is a modern scripting language addressing these issues. It enjoys most of the advantages of scripting languages, but provides support for software evolution and robustification, e.g., an expressive module system and type annotation facilities. It has distributed computing built into the core language.

This is joint work with Bard Bloom, Brian Burg, Nate Nystrom, Johan Östlund, Gregor Richards, Rok Strniša, Jan Vitek, and Tobias Wrigstad.

Speaker Bio
Date 3:00-4:00, Monday, April 26th
Place Gates 104
Speaker Ken McMillan, Cadence
Title Relevance Heuristics for Program Analysis
Abstract Relevance heuristics allow us to tailor a program analysis to a particular property to be verified. This in turn makes it possible to improve the precision of the analysis where needed, while maintaining scalability. In this talk I will discuss the principles by which SAT solvers and other decision procedures decide what information is relevant to a given proof. Then we will see how these ideas can be exploited in program verification using the method of Craig interpolation. The result is an analysis that is finely tuned to prove a given property of a program. At the end of the talk, I will cover some recent research in this area, including the use of interpolants for verifying heap-manipulating programs, generating procedure summaries, and generating program tests.
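Craig interpolation can be illustrated at toy scale: given propositional formulas A and B with A ∧ B unsatisfiable, an interpolant I over their shared variables satisfies A ⊨ I while I ∧ B stays unsatisfiable. Brute force over truth assignments stands in here for what real provers derive from refutation proofs:

```python
from itertools import product

def interpolant(A, B, vars_a, vars_b):
    """Return (shared_vars, I) where I is the strongest interpolant:
    the projection of A's models onto the shared variables."""
    shared = sorted(set(vars_a) & set(vars_b))
    models = set()
    for bits in product([False, True], repeat=len(vars_a)):
        env = dict(zip(vars_a, bits))
        if A(env):
            models.add(tuple(env[v] for v in shared))
    def I(env):
        return tuple(env[v] for v in shared) in models
    return shared, I

# A = p AND q,  B = (NOT q) AND r:  A AND B is unsatisfiable; shared var is q.
A = lambda e: e["p"] and e["q"]
B = lambda e: (not e["q"]) and e["r"]
shared, I = interpolant(A, B, ["p", "q"], ["q", "r"])

assert shared == ["q"]
# A entails I:
assert all(I({"p": p, "q": q})
           for p, q in product([False, True], repeat=2)
           if A({"p": p, "q": q}))
# I AND B is unsatisfiable:
assert not any(I({"q": q}) and B({"q": q, "r": r})
               for q, r in product([False, True], repeat=2))
```

In program verification the interpolant plays the role of an inductive fact at a program point: strong enough to block the error, stated only over the variables both halves of the proof share, which is exactly the relevance filtering the talk describes.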
Speaker Bio
Date 10:00-11:00, Monday, April 12th
Place Gates 104
Speaker Sorav Bansal, IIT Delhi
Title An Optimizing Virtualization Engine
Abstract I will talk about our early experiences developing a high-performance binary translator to perform hardware virtualization. Our binary translator is capable of performing dynamic runtime optimizations for the OS and its applications, is transparent to the user, can run an unmodified OS, and can be installed incrementally on an existing system. I will present our experiences with development and some early results.
Speaker Bio

Date 4:00-5:00, Tuesday, March 9
Place Gates 463a
Speaker Harry Mairson, Brandeis University
Title Linear Logic and the Complexity of Control Flow Analysis
Abstract Static program analysis is about predicting the future: it's what compilers do at compile-time to predict and optimize what happens at run-time. What is the tradeoff between the efficiency and the precision of the computed prediction? Control flow analysis (CFA) is a canonical form of program analysis that answers questions like "can call site X ever call procedure P?" or "can procedure P ever be called with argument A?" Such questions can be answered exactly by Girard's geometry of interaction (GoI), but in the interest of efficiency and time-bounded computation, control flow analysis computes a crude approximation to GoI, possibly including false positives.

Different versions of CFA are parameterized by their sensitivity to calling contexts, as represented by a contour, a sequence of k labels representing these contexts, analogous to Lévy's labelled lambda calculus. CFA with larger contours is designed to make the approximation more precise. A naive calculation shows that 0CFA (i.e., with k=0) can be solved in polynomial time, and kCFA (k>0, a constant) can be solved in exponential time. We show that these bounds are exact: the decision problem for 0CFA is PTIME-hard, and for kCFA is EXPTIME-hard. Each proof depends on fundamental insights about the linearity of programs. In both the linear and nonlinear case, contrasting simulations of the linear logic exponential are used in the analysis. The key idea is to take the approximation mechanism inherent in CFA and exploit its crudeness to do real computation.
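A minimal fixpoint-based 0CFA for the pure lambda calculus shows the monovariant (k = 0) analysis these bounds are about. The term encoding and helper names are invented for this sketch:

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", fun, arg).
# 0CFA computes, for each variable and expression, the set of lambdas that
# may flow there, merging all calling contexts (k = 0).
def cfa0(root):
    subs = []
    def collect(t):
        subs.append(t)
        if t[0] == "lam":
            collect(t[2])
        elif t[0] == "app":
            collect(t[1])
            collect(t[2])
    collect(root)

    flow = {id(t): set() for t in subs}   # expression -> lambdas it may yield
    var = {}                              # variable   -> lambdas it may bind

    changed = True
    while changed:                        # chaotic iteration to a fixpoint
        changed = False
        def add(s, items):
            nonlocal changed
            before = len(s)
            s.update(items)
            changed |= len(s) != before
        for t in subs:
            if t[0] == "lam":
                add(flow[id(t)], [t])
            elif t[0] == "var":
                add(flow[id(t)], var.get(t[1], set()))
            else:                         # application
                _, f, a = t
                for lam in list(flow[id(f)]):
                    _, x, lam_body = lam
                    add(var.setdefault(x, set()), flow[id(a)])
                    add(flow[id(t)], flow[id(lam_body)])
    return var

identity_y = ("lam", "y", ("var", "y"))
identity_x = ("lam", "x", ("var", "x"))
body = ("app", ("var", "f"), identity_y)
# ((lambda f. f (lambda y. y)) (lambda x. x))
prog = ("app", ("lam", "f", body), identity_x)

var = cfa0(prog)
assert var["f"] == {identity_x}
assert var["x"] == {identity_y}   # lambda x. x is applied to lambda y. y
```

Because all call sites of a function are merged into one binding set, the analysis runs in polynomial time, and, per the talk, that merging is also crude enough to be exploited as a computation in the hardness proof.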

This is joint work with David Van Horn (Brandeis University), presented at the 2008 ACM International Conference on Functional Programming.

Speaker Bio

Date 11:00-12:00, Wednesday, March 3
Place Gates 463a
Speaker Ganesan Ramalingam, Microsoft Research
Title Safe Parallelism
Abstract In this talk I will address two problems.

In the first part, we consider the use of speculation, by programmers, as an algorithmic paradigm to parallelize seemingly sequential code. Execution order constraints imposed by dependences can serialize computation, preventing parallelization of code and algorithms. Speculating on the value(s) carried by dependences is one way to break such critical dependences. We present language constructs that enable programmers to declaratively express speculative parallelism in programs. In general, speculation requires a runtime mechanism to undo the effects of speculative computation in the case of mispredictions. We describe a set of conditions under which such rollback can be avoided. We utilize a static analysis to check if a given program satisfies these conditions, enabling safe use of these constructs without the overheads required for rollback.

In the second part, we consider the problem of making a sequential library thread-safe for concurrent clients. We consider a sequential library annotated with assertions along with a proof that these assertions hold in a sequential execution. We show how we can use the proof to derive concurrency control that ensures that any execution of the library methods, when invoked by concurrent clients, satisfies the same assertions.
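Value speculation in miniature (invented stage names; a sequential simulation rather than real parallelism): the dependent stage runs against a predicted value, and the result is validated afterwards, with recomputation as the fallback the talk's static analysis tries to prove unnecessary:

```python
def stage1(x):
    return x % 7              # the value carried by the critical dependence

def stage2(v, x):
    return v * 100 + x        # work that normally waits for stage1's output

def speculative_pipeline(x, predict):
    guess = predict(x)
    speculative = stage2(guess, x)   # proceeds without waiting for stage1
    actual = stage1(x)               # (would run concurrently in reality)
    if guess == actual:
        return speculative           # prediction held: no rollback needed
    return stage2(actual, x)         # mispredicted: redo the dependent work

# Correct prediction: the speculative result is used as-is.
assert speculative_pipeline(21, predict=lambda x: 0) == stage2(0, 21)
# Misprediction: the result is recomputed from the actual value.
assert speculative_pipeline(10, predict=lambda x: 0) == stage2(3, 10)
```

The recompute branch is the rollback cost; the conditions in the talk characterize when speculative effects are benign enough that this machinery can be elided.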

Speaker Bio

Date 2:00-3:00, Friday, February 5, 2010
Place Gates 104
Speaker Eran Yahav, IBM Research
Title Automatic Inference of Memory Fences
Abstract This work addresses the problem of placing memory fences in a concurrent program running on a relaxed memory model. Modern architectures implement relaxed memory models in which memory operations may be reordered and executed non-atomically. Special instructions called memory fences are provided to the programmer, allowing control of this behavior. To ensure correctness of many algorithms, in particular of non-blocking ones, a programmer is often required to explicitly insert memory fences into her program. However, she must use as few fences as possible, or the benefits of running on a relaxed architecture will be lost. Placing memory fences is challenging and extremely error prone, as it requires subtle reasoning about the underlying memory model.

We present a framework for automatic inference of memory fences in concurrent programs, assisting the programmer with this complex task. Given a program, a specification, a description of the memory model, and a set of test cases, our framework computes a set of ordering constraints that guarantee the correctness of the program under the memory model for all provided test cases. The computed constraints are maximally permissive: removing any constraint from the solution would permit an execution violating the specification. Our framework then realizes the computed constraints as additional fences in the input program. We implemented our approach in a tool called FENDER and used it to infer correct and efficient placements of fences for several nontrivial algorithms, including practical concurrent data structures.
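The classic store-buffering litmus test illustrates why fence placement matters. The brute-force model below allows only store-to-load reordering within a thread (a TSO-style relaxation) and is a toy, not the talk's tool:

```python
from itertools import permutations

# x = y = 0 initially.
#   thread 0: x = 1; r0 = y        thread 1: y = 1; r1 = x
# Without a fence, each thread's store may be reordered after its own load
# of a different location. Enumerate allowed orders and all interleavings.
def outcomes(fence):
    t0 = [("store", "x", 1), ("load", "y", "r0")]
    t1 = [("store", "y", 1), ("load", "x", "r1")]

    def orders(t):
        yield t
        if not fence:                  # no fence: the store/load pair may swap
            yield [t[1], t[0]]

    results = set()
    for o0 in orders(t0):
        for o1 in orders(t1):
            for sched in set(permutations([0, 0, 1, 1])):
                mem, regs, idx = {"x": 0, "y": 0}, {}, [0, 0]
                seqs = (o0, o1)
                for tid in sched:
                    op, loc, val = seqs[tid][idx[tid]]
                    idx[tid] += 1
                    if op == "store":
                        mem[loc] = val
                    else:
                        regs[val] = mem[loc]
                results.add((regs["r0"], regs["r1"]))
    return results

assert (0, 0) in outcomes(fence=False)      # the "impossible" result appears
assert (0, 0) not in outcomes(fence=True)   # a fence between each pair rules it out
```

Inference frameworks like the talk's search such outcome sets symbolically: the fence is required exactly when removing it admits an execution that violates the specification.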

Joint work with Michael Kuperstein and Martin Vechev.

Date 4:00-5:00pm, Monday, February 1, 2010
Place Gates 463a
Speaker David Bacon, IBM Research
Title Liquid Metal: Eliminating the Boundary between Hardware and Software
Abstract This talk will present Liquid Metal, an end-to-end system from language design to co-execution on hardware and software. The goal of the Liquid Metal project at IBM Research is to allow hybrid systems to be programmed in a single dynamic high-level object-oriented language that maps well to CPUs and FPGAs (and the architectures in between) -- to "JIT the Hardware". While at first glance it may seem that these different systems have conflicting requirements in terms of programming features, it is our belief that many of the features turn out to be highly beneficial in both environments when they are provided at a sufficiently high level of abstraction. By using a single language we open up the opportunity to hide the complexity of crossing domains from software into hardware, and facilitate a fluid movement of computation back and forth between different types of computational devices, choosing to execute code where it is most efficient to do so.

I will describe the key features of the language design, describe our compilation, synthesis, and run-time environment, and present initial results from our prototype system.

Joint work with Joshua Auerbach, Rodric Rabbah, and Perry Cheng

Speaker Bio David F. Bacon is a Research Staff Member at IBM's T.J. Watson Research Center. He led the Metronome project, which pioneered hard real-time garbage collection, opening the use of high-level languages like Java for time-critical systems in financial trading, aerospace, defense, video gaming, and telecommunications. His program analysis and synchronization algorithms are included in most compilers and run-time systems for modern object-oriented languages.

Dr. Bacon received his Ph.D. in computer science from the University of California, Berkeley and his A.B. from Columbia University; in 2009 he was a Visiting Professor at Harvard University. He is a member of the IBM Academy of Technology and a Fellow of the ACM.

Date 11:00-12:00, Wednesday, January 27, 2010
Place Gates 463a
Speaker Jonathan Aldrich, CMU
Title Pragmatic Typestate Verification with Permissions
Abstract Object-oriented libraries often define usage protocols that clients must follow in order for the system to work properly. These protocols may be poorly documented and difficult to follow, causing errors and significant lost productivity in software development.

We are exploring a new approach to verifying object protocols using permissions. These permissions track not only the current "typestate" of an object in its protocol, but an abstraction of what operations other aliases might perform on the object and what invariants must remain true. Developers annotate their code with state and permission information, which can be automatically and soundly checked for consistency. Our approach is fully modular, yet allows substantial reasoning about objects even when they are aliased by multiple clients. I will discuss extensions to check protocols in concurrent systems, our practical experience with the system, and current work toward a new programming language based on typestate.
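Typestate can be made concrete with a file-handle protocol, open then read* then close. This sketch checks the protocol dynamically with invented names; the talk's approach verifies such protocols statically, with permissions accounting for aliases:

```python
class ProtocolError(Exception):
    pass

class Handle:
    """A handle whose usage protocol is an explicit state machine:
    closed --open--> opened --read--> opened --close--> closed."""
    TRANSITIONS = {("closed", "open"): "opened",
                   ("opened", "read"): "opened",
                   ("opened", "close"): "closed"}

    def __init__(self):
        self.state = "closed"

    def _step(self, op):
        nxt = self.TRANSITIONS.get((self.state, op))
        if nxt is None:
            raise ProtocolError(f"cannot {op} in state {self.state}")
        self.state = nxt

    def open(self):
        self._step("open")

    def read(self):
        self._step("read")
        return b""

    def close(self):
        self._step("close")

h = Handle()
h.open(); h.read(); h.close()          # legal protocol trace
try:
    h.read()                           # read-after-close: protocol violation
    raise AssertionError("violation not caught")
except ProtocolError:
    pass
```

A static typestate checker rejects the offending call at compile time instead of raising at runtime, and the permissions in the talk extend this to objects reachable through multiple aliases.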

Speaker Bio Jonathan Aldrich is Associate Professor of Computer Science at Carnegie Mellon University. He is the director of CMU's undergraduate minor program in Software Engineering, and teaches courses in programming languages, software engineering and program analysis. Dr. Aldrich joined the CMU faculty after completing a Ph.D. at the University of Washington and a B.S. at Caltech.

Aldrich's research contributions include verifying architectural structure and secure information flow, modular formal reasoning about code, and API protocol safety. For his work on architecture and information flow, Aldrich received a 2006 NSF CAREER award and the 2007 Dahl-Nygaard Junior Prize, given annually for a significant technical contribution to object-oriented programming.

Date 2:00-3:00, Thursday, November 19, 2009,
Place Gates 104
Speaker Sumit Gulwani, Microsoft Research
Title The Reachability-bound Problem
Abstract The "reachability-bound problem" is the problem of finding a symbolic worst-case bound on the number of times a given control location inside a procedure is visited in terms of the inputs to that procedure. This has applications in bounding resources consumed by a program such as time, memory, network-traffic, power, as well as estimating quantitative properties (as opposed to boolean properties) of data in programs, such as amount of information leakage or uncertainty propagation.

Our approach to solving the reachability-bound problem brings together three fundamentally different techniques for reasoning about loops in an effective manner. This includes (a) abstract-interpretation based iterative technique for computing precise disjunctive invariants to summarize nested loops, (b) arithmetic constraint solving based technique for computing ranking functions for individual paths inside loops, and (c) proof-rules based technique for appropriate composition of ranking functions for individual paths for precise loop bound computation.

We have implemented our solution to the reachability-bound problem in a tool called SPEED, which computes symbolic computational complexity bounds for procedures in .Net code-bases. The tool scales to large programs taking an average of around one second to analyze each procedure.
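A toy instance of a reachability bound: each iteration of the loop below decrements either x or y, so x0 + y0 is a sound symbolic bound on visits to the loop body, obtained by composing the ranking functions of the two paths. The bound here is hand-derived for illustration, not SPEED output:

```python
import random

def run_loop(x, y, rng):
    """Loop whose per-iteration path is data dependent."""
    visits = 0
    while x > 0 and y > 0:
        if rng.random() < 0.5:
            x -= 1          # path A: ranking function x
        else:
            y -= 1          # path B: ranking function y
        visits += 1         # the control location being bounded
    return visits

def bound(x0, y0):
    return x0 + y0          # composed symbolic worst-case bound

rng = random.Random(1)
for x0, y0 in [(3, 4), (10, 1), (7, 7)]:
    for _ in range(50):
        assert run_loop(x0, y0, rng) <= bound(x0, y0)
```

Testing cannot prove the bound, of course; the talk's contribution is deriving and proving such expressions automatically, including for nested loops where disjunctive invariants are needed.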

Speaker Bio

Date 3:00-4:00, Monday, October 5, 2009
Place Gates 392
Speaker Martin Rinard, MIT
Title Automatically Reducing Energy Consumption, Improving Performance, and Tolerating Failures With Good Quality of Service
Abstract Reducing energy consumption, improving performance, and tolerating failures are important goals in modern computing systems. We present two techniques for satisfying these goals. The first technique, loop perforation, finds the most time-consuming loops, then transforms the loops to execute fewer iterations. Our results show that this technique can reduce the computational resources required to execute the application by a factor of two to three (enabling corresponding improvements in energy consumption, performance, and fault tolerance) while delivering good quality of service.

The second technique, goal-directed parallelization, executes the most time-consuming loops in parallel, then (guided by memory profiling information) adds synchronization and replication as necessary to eliminate bottlenecks and enable the application to produce accurate output. Our results show that this approach makes it possible to effectively parallelize challenging applications without the use of complex static analysis.

Because traditional program transformations operate in the absence of any specification of acceptable program behavior, the transformed program must produce the identical result as the original program. In contrast, the two techniques presented in this talk exploit the availability of quality of service specifications to apply much more aggressive transformations that may change the result that the program produces (as long as the result satisfies the specified quality of service requirements). The success of these two techniques demonstrates the advantages of this approach.
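Loop perforation is easy to demonstrate in miniature: compute a mean over only every k-th sample, then check the answer against a quality-of-service threshold. The 5% threshold and the data below are invented for illustration:

```python
def mean(xs, perforate=1):
    """Perforated mean: execute only every `perforate`-th loop iteration,
    trading accuracy for a roughly perforate-fold reduction in work."""
    kept = xs[::perforate]
    return sum(kept) / len(kept)

data = [float(i % 100) for i in range(10_000)]
exact = mean(data)                 # full loop
fast = mean(data, perforate=4)     # ~4x fewer iterations

# The perforated result still satisfies a 5% quality-of-service requirement.
assert abs(fast - exact) / exact < 0.05
```

Real loop perforation transforms compiled loops and uses profiling to pick which loops tolerate skipping, but the accuracy-for-resources trade is exactly this one.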
Speaker Bio

Date 2:00-3:00, Monday, September 28
Place Gates 104
Speaker Westley Weimer, University of Virginia
Title A Genetic Programming Approach to Automated Software Repair
Abstract Automatic program repair has been a longstanding goal in software engineering, yet debugging remains a largely manual process. We introduce a fully automated method for repairing bugs in software. The approach works on off-the-shelf legacy applications and does not require formal specifications, program annotations or special coding practices. Once a program fault is discovered, an extended form of genetic programming is used to evolve program variants until one is found that both retains required functionality and also avoids the defect in question. Standard test cases are used to exercise the fault and to encode program requirements. After a successful repair has been discovered, it is minimized using structural differencing algorithms and delta debugging. We describe the proposed method and report experimental results demonstrating that it can successfully repair twenty programs totaling almost 200,000 lines of code in under 200 seconds, on average. In addition, we describe how to combine the automatic repair mechanism with anomaly intrusion detection to produce a closed-loop repair system, and empirically evaluate the resulting repair quality. Finally, we propose a test suite sampling technique for automated repair, allowing programs with hundreds of test cases to be repaired in minutes with no loss in repair quality.
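The repair loop described above can be sketched as a toy genetic search. This is a miniature for illustration only, not the actual tool (which operates on C abstract syntax trees and weights mutations by execution paths); the statement representation, mutation operators, and the injected defect are all invented:

```python
import random

# Toy "program": a list of statements, each a function on an environment.
# The intended behavior is abs(); the injected defect negates the result.
def load(env):
    env["r"] = env["x"]
    return env

def flip_if_neg(env):
    if env["r"] < 0:
        env["r"] = -env["r"]
    return env

def spurious_negate(env):   # the defect to be repaired away
    env["r"] = -env["r"]
    return env

buggy = [load, flip_if_neg, spurious_negate]
tests = [({"x": -3}, 3), ({"x": 5}, 5), ({"x": 0}, 0)]

def fitness(prog):
    """Number of tests the variant passes; crashes count as failures."""
    passed = 0
    for inp, want in tests:
        env = dict(inp)
        try:
            for stmt in prog:
                env = stmt(env)
            if env.get("r") == want:
                passed += 1
        except Exception:
            pass
    return passed

def mutate(prog):
    """Delete, swap, or duplicate a statement: edits reuse only code
    already present in the program, as in the GenProg-style approach."""
    p = list(prog)
    op = random.choice(["delete", "swap", "dup"])
    if op == "delete" and len(p) > 1:
        del p[random.randrange(len(p))]
    elif op == "swap" and len(p) > 1:
        i, j = random.sample(range(len(p)), 2)
        p[i], p[j] = p[j], p[i]
    else:
        p.insert(random.randrange(len(p) + 1), random.choice(p))
    return p

random.seed(1)
repaired = None
population = [mutate(buggy) for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(tests):
        repaired = population[0]   # a variant passing all test cases
        break
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(10)]
```

The test cases play both roles from the abstract: they exercise the fault and encode the required functionality. The minimization step (structural differencing plus delta debugging) is omitted from this sketch.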
Speaker Bio

Date 1:00-2:00, Wednesday, August 26
Place 120 Gates
Speaker Tom Reps, University of Wisconsin and GrammaTech, Inc.
Title WYSINWYX: What You See Is Not What You eXecute
Abstract Computers do not execute source-code programs; they execute machine-code programs that are generated from source code. Consequently, some of the elements relevant to understanding the program's capabilities and potential flaws may not be visible in a program's source code. This can be due to layout choices made by the compiler or optimizer, or because transformations have been applied subsequent to compilation (e.g., to make the code run faster or to insert software protections). We call this the WYSINWYX phenomenon (pronounced ``wiz-in-wicks''): What You See [in source code] Is Not What You eXecute.

Not only can this create a mismatch between what a programmer intends and what is actually executed by the processor, it can cause analyses that are performed on source code -- the approach followed by most program-analysis tools -- to fail to detect bugs and security vulnerabilities. To address this issue, we have developed methods to analyze machine code using a variety of dynamic, static, and symbolic techniques.

Joint work with G. Balakrishnan (NEC), J. Lim (UW), A. Lal (UW), A. Thakur (UW), D. Gopan (GrammaTech, Inc.), and T. Teitelbaum (Cornell and GrammaTech, Inc.).

Speaker Bio

Date 2:00-3:00, Thursday, June 4, 2009
Place Gates 104
Speaker Shaz Qadeer, Microsoft Research
Title Algorithmic verification of systems software using SMT solvers
Abstract Program verification is an undecidable problem; all program verifiers must make a tradeoff between precision and scalability. Over the past decade, a variety of scalable program analysis tools have been developed. These tools, based primarily on techniques such as type systems and dataflow analysis, scale to large and realistic programs. However, to achieve scalability they sacrifice precision, resulting in a significant number of false error reports and adversely affecting the usability of the tool.

In this talk, I will present a different approach to program verification realized in the HAVOC verifier for low-level systems software. HAVOC works directly on the operational semantics of C programs based on a physical model of memory that allows precise modeling of pointer arithmetic and other unsafe operations prevalent in low-level software. To achieve scalability, HAVOC performs modular verification using contracts in an expressive assertion language that includes propositional logic, arithmetic, and quantified type and data-structure invariants. The assertion logic is closed under weakest precondition, thereby guaranteeing precise verification for loop-free and call-free code fragments. To reduce the effort of writing contracts, HAVOC provides a mechanism to infer them automatically. It allows the user to populate the code with candidate contracts and then searches efficiently through the candidate set for a subset of consistent contracts.

The expressive contract language in HAVOC has two important benefits. First, it allows the documentation and verification of properties and invariants specific to a particular software system. Second, it allows a user to systematically achieve the ideal of precise verification (with no false alarms) by interacting with the verifier and providing key contracts that could not be inferred automatically.

HAVOC has been implemented using the Boogie verification-condition generator and the Z3 solver for Satisfiability-Modulo-Theories. I will describe the design and implementation of HAVOC and our experience applying it to verify typestate assertions on medium-sized device drivers with zero false alarms. I will conclude with a discussion of remaining challenges and directions for future work.

Speaker Bio

Date 2:00-3:00, Monday, May 18, 2009
Place Gates 104
Speaker Jeremy Condit, Microsoft Research
Title Unifying Type Checking and Property Checking for Low-Level Code
Abstract Type checking for low-level code is challenging because type safety often depends on complex, program-specific invariants that are difficult for traditional type checkers to express. Conversely, property checking (or assertion checking) for low-level code is challenging because it is difficult to write concise specifications that distinguish between locations in an untyped program's heap. In this talk, I will present a new technique that addresses both problems simultaneously by implementing a type checker for low-level code as part of a property checker. I will present a low-level formalization of a C program's heap and its types that can be checked with an SMT solver, and I will discuss several case studies that demonstrate the ability of this tool to express and check complex type invariants in low-level C code, including several small Windows device drivers. This is joint work with Shaz Qadeer, Shuvendu Lahiri, and Brian Hackett.
Speaker Bio

Date 3:30-4:30 pm, Monday, May 11, 2009
Place Gates 392
Speaker Terence Kelly, HP Labs
Title Eliminating Concurrency Bugs with Control Engineering
Abstract Concurrent programming is notoriously difficult and is becoming increasingly prevalent as multicore hardware compels performance-conscious developers to parallelize software. If we cannot enable the average programmer to write correct and efficient parallel software at reasonable cost, the computer industry's rate of value creation may decline substantially.

Our research addresses the challenges of concurrent programming by leveraging control engineering, a body of techniques that can constrain the behavior of complex systems, prevent runtime failures, and relieve human designers and operators of onerous responsibilities. In past decades, control theory made complex and potentially dangerous industrial processes safe and manageable and relieved human operators of tedious and error-prone chores. Today, Discrete Control Theory promises similar benefits for concurrent software. This talk describes an application of the control engineering paradigm to concurrent software: Gadara, which uses Discrete Control Theory to eliminate deadlocks in shared-memory multithreaded software.

Speaker Bio

Date 3:30-4:30 pm, Monday, April 27, 2009
Place Gates 392
Speaker Hans Boehm, HP Labs
Title The C++0x concurrency memory model and some of its implications
Abstract The upcoming revision of the C++ standard (historically referred to as C++0x) integrates support for threads and locks into the language. As a result, it can be much more precise about the semantics of shared variables than prior definitions of the language. Although the specification is intentionally similar to the corresponding Java rules, there are some significant, and not accidental, differences.

We describe and motivate the treatment of shared variables in C++0x. We explore its consequences on compilers, and speculate about its potential impact on hardware instruction sets and on transactional memory semantics.

Speaker Bio

Date 3:00-4:00 pm, Monday, April 20, 2009
Place Gates 100
Speaker Bill Pugh, University of Maryland and Google
Title The Cost of Static Analysis for Defect Detection
Abstract Static analysis tools can find programming mistakes and places where expected or desired safety properties are not provably guaranteed. In some environments, the cost of shipping or deploying software with defects such as a buffer overflow is huge. In such environments, it is relatively easy to justify the use of static analysis tools. But in other environments, the business case for using static analysis tools is harder to make.

Static analysis tools often find clear coding mistakes. But in the end, companies don't care about whether their code contains coding mistakes; they care about whether the software functions as intended and can be developed and delivered promptly. In this talk, I'll discuss experience with why some coding mistakes don't impact software functionality and ways to minimize the cost/benefit ratio for incorporating static analysis into the software development process.

Speaker Bio

Date 3:00-4:00 pm, Monday, February 9, 2009
Place 392 Gates
Speaker Tom Ball, Microsoft Research
Title Program Analysis 2.0
Abstract Microsoft Research's efforts in program analysis began with detecting defects in C and C++ programs with tools such as ESP, PREfast, PREfix, and SLAM, all of which have been widely deployed internally. The PREfast and SLAM technologies also were incorporated into shipping Microsoft products (Visual Studio and the Driver Development Kit, respectively). In the last few years, we have been working on program analysis tools for .NET, focusing on: (1) code contracts and verifying code against contracts; (2) automatic test generation; (3) checking for concurrency defects. These tools are based on advances in abstract interpretation over linear inequalities, symbolic execution of object-oriented programs, efficient and precise automatic theorem proving, and direct model checking of concurrent systems. All of the above tools are available for both academic and commercial use, in partnership with Visual Studio.
Speaker Bio Thomas Ball is Principal Researcher at Microsoft Research where he manages the Software Reliability Research group (http://research.microsoft.com/srr/). Tom received a Ph.D. from the University of Wisconsin-Madison in 1993, was with Bell Labs from 1993-1999, and has been at Microsoft Research since 1999. He is one of the originators of the SLAM project, a software model checking engine for C that forms the basis of the Static Driver Verifier tool. Tom's interests range from program analysis, model checking, testing and automated theorem proving to the problems of defining and measuring software quality.

Date 3:00-4:00 pm, Monday, February 23, 2009
Place Gates 392
Speaker Mooly Sagiv, Tel-Aviv University
Title TVLA: A system for inferring quantified invariants
Abstract The TVLA system was originally designed as a system for inferring shape properties. In this talk, I will present the traditional view, and I will also show that TVLA's abstractions amount to representing invariants in a limited form. This perspective allows TVLA to prove properties that are not usually viewed as shape properties, and it also sheds some light on the limitations of TVLA, in particular the state-space explosion. Moreover, I will present TVLA's operations as effective heuristics for reasoning about quantified invariants, including materialization, finite differencing, Kleene evaluation, and consequence finding. This is joint work with Tal Lev-Ami (Tel Aviv University), Roman Manevich (Tel Aviv University), G. Ramalingam (MSR), Tom Reps (University of Wisconsin), and Greta Yorsh (IBM Research).
Speaker Bio

Date 3:00-4:00 pm, Monday, February 2, 2009
Place 392 Gates
Speaker Ranjit Jhala, UC San Diego
Title Liquid Types
Abstract We present Logically Qualified Data Types, abbreviated to Liquid Types, a new static program verification technique which combines the complementary strengths of automated deduction (SMT solvers), model checking (predicate abstraction), and type systems (Hindley-Milner inference). We have implemented the technique in a tool that infers liquid types for OCaml programs. To demonstrate the utility of our approach, we show how liquid types reduce, by more than an order of magnitude, the manual annotations required to statically verify (1) the safety of array accesses on a diverse set of benchmarks, and (2) invariants like sortedness, balancedness, binary-search-ordering, variable ordering, set-implementation, heap-implementation, and acyclicity of data structure libraries for list-sorting, union-find, splay trees, AVL trees, red-black trees, heaps, associative maps, extensible vectors, and binary decision diagrams.
Speaker Bio

Date 3:00-4:00 pm, Monday, January 26, 2009
Place Gates 392
Speaker Greg Bronevetsky, Lawrence Livermore National Lab
Title Static Dataflow for Message Passing Applications
Abstract Message passing is a very popular style of parallel programming, used in a wide variety of applications and supported by many APIs, such as BSD sockets, MPI and PVM. Its importance has motivated significant amounts of research on optimization and debugging techniques for such applications. Although this work has produced impressive results, it has also failed to reach its full potential. The reason is that while prior work has focused on runtime techniques, there has been very little on compiler analyses that understand the properties of parallel message passing applications and use this information to improve application performance and the quality of debuggers. This talk presents a novel compiler analysis framework that extends dataflow to parallel message passing applications on arbitrary numbers of processes. It works on an extended control-flow graph that includes all the possible inter-process interactions of any number of processes. This enables dataflow analyses built on top of this framework to incorporate information about the application's parallel behavior and communication topology. The overall parallel dataflow framework can be instantiated with a variety of specific dataflow analyses, as well as abstractions that can tune the cost/accuracy tradeoff of detecting the application's communication topology. The proposed framework bridges the gap between prior work on parallel runtime systems and sequential dataflow analyses, enabling new transformations, runtime optimizations and bug detection tools that require knowledge of the application's communication topology. We instantiate this framework with two different symbolic analyses and show how these analyses can detect different types of communication patterns, which enables the use of dataflow analyses on a wide variety of real applications.
Speaker Bio

Date 3:00-4:00 pm, Monday, January 12, 2009
Place Gates 392
Speaker Cormac Flanagan, UC Santa Cruz
Title Velodrome: A Sound and Complete Dynamic Atomicity Checker For Multithreaded Programs
Abstract Atomicity is a fundamental correctness property in multithreaded programs, both because atomic code blocks are amenable to sequential reasoning (which significantly simplifies correctness arguments), and because atomicity violations often reveal defects in a program's synchronization structure. Unfortunately, all existing atomicity checkers are incomplete, in that they may yield false alarms even on correctly-synchronized programs, which significantly limits their usefulness. We present the first dynamic analysis for atomicity that is both sound and complete. The analysis reasons about the exact dependencies between operations in the observed trace, and so reports error messages if and only if the observed trace is not conflict-serializable. Despite this significant increase in precision, we show that the performance and coverage achieved by our analysis is competitive with earlier incomplete dynamic atomicity analyses.
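Conflict-serializability checking of an observed trace can be sketched directly from its definition: build the graph of conflict edges between transactions and test it for a cycle. The toy checker below is illustrative only (Velodrome itself works online and in near-linear time, not by this quadratic pairwise scan), but it exhibits the if-and-only-if property on two small traces:

```python
# Each trace entry is (transaction, operation, address). Two operations
# conflict if they touch the same address and at least one is a write;
# the trace is conflict-serializable iff the graph of conflict edges
# between distinct transactions (oriented by trace order) is acyclic.
def conflict_graph(trace):
    edges = set()
    for i, (t1, op1, a1) in enumerate(trace):
        for t2, op2, a2 in trace[i + 1:]:
            if t1 != t2 and a1 == a2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the conflict graph."""
    nodes = {n for e in edges for n in e}
    succ = {n: {b for a, b in edges if a == n} for n in nodes}
    state = {}
    def dfs(n):
        state[n] = "grey"
        for m in succ[n]:
            if state.get(m) == "grey" or (m not in state and dfs(m)):
                return True
        state[n] = "black"
        return False
    return any(n not in state and dfs(n) for n in nodes)

# Serializable: all of T2's accesses follow all of T1's.
ok = [("T1", "w", "x"), ("T1", "w", "y"), ("T2", "r", "x"), ("T2", "r", "y")]
# Not serializable: conflicts run in both directions between T1 and T2.
bad = [("T1", "w", "x"), ("T2", "w", "x"), ("T2", "w", "y"), ("T1", "r", "y")]

assert not has_cycle(conflict_graph(ok))
assert has_cycle(conflict_graph(bad))
```

Because an error is reported exactly when the observed trace is not conflict-serializable, a checker of this kind is both sound and complete with respect to the trace it sees, which is the precision claim of the talk.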
Speaker Bio

Date 2:00-3:00 pm, Friday, January 9, 2009
Place Gates 104
Speaker Mooly Sagiv, Tel-Aviv University
Title (Semi) Thread-Modular Shape Analysis
Abstract Thread-modular static analysis of concurrent systems abstracts away the correlations between the local variables (and program locations) of different threads. This idea reduces the exponential complexity due to thread interleaving and allows us to handle programs with an unbounded number of threads. Thread-modular static analyses face a major problem in simultaneously requiring a separation of the reasoning done for each thread, for efficiency purposes, and capturing relevant interactions between threads, which is often crucial to verify properties. Programs that manipulate the heap complicate thread-modular analysis. Naively treating the heap as part of the global state, accessible by all threads, has several disadvantages since it still admits exponential blow-ups in the heap and is not precise enough to capture things like ownership transfers of heap objects. An effective thread-modular analysis needs to determine which parts of the heap are owned by which threads to obtain a suitable thread-modular state abstraction. I will present new thread-modular analysis techniques and adaptations of thread-modular analysis for programs which manipulate the heap. It is shown that the precision of thread-modular analysis is improved by tracking some correlations between the local variables of different threads. I will also describe techniques for reducing the analysis time for common situations. A key observation for handling the heap is using notions of separation and, more generally, subheaps in order to abstract away correlations between the properties of subheaps. This is joint work with Josh Berdine (MSR), Byron Cook (MSR), Alexey Gotsman (Cambridge University), Tal Lev-Ami (Tel Aviv University), Roman Manevich (Tel Aviv University), G. Ramalingam (MSR), and Michal Segalov (Tel Aviv University).
Speaker Bio

Date 1:00-2:00, Friday, Dec. 5
Place Gates 392
Speaker George Candea, EPFL
Title Deadlock Immunity: Teaching Systems How To Defend Against Deadlocks
Abstract Deadlock immunity is a property by which programs, once afflicted by a given deadlock, develop resistance against future occurrences of that and similar deadlocks. We developed a technique that enables programs to automatically gain such immunity without assistance from programmers or users. We implemented it for both Java and POSIX threads and evaluated it with several real systems, including MySQL, JBoss, SQLite, Apache ActiveMQ, Limewire, and Java JDK. The results demonstrate effectiveness against real, reported deadlock bugs, while incurring modest performance overhead and scaling to 1024 threads. I will discuss how deadlock immunity can offer programmers and users an attractive tool for coping with elusive deadlocks, as well as present extensions of the immunity idea to other types of failures.
Speaker Bio

Date 2:00-3:00 pm, Monday, October 27, 2008
Place Gates 104
Speaker Mayur Naik, Intel Berkeley Research Lab
Title Effective Static Deadlock Detection
Abstract We present an effective static deadlock detection algorithm for Java. Our algorithm uses a novel combination of static analyses each of which approximates a different necessary condition for a deadlock. We have implemented the algorithm and report upon our experience applying it to a suite of multi-threaded Java programs. While neither sound nor complete, our approach is effective in practice, finding all known deadlocks as well as discovering previously unknown ones in our benchmarks with few false alarms.

Joint work with Chang-Seo Park (UC Berkeley), Koushik Sen (UC Berkeley), and David Gay (Intel Research, Berkeley)

Speaker Bio Mayur Naik is a researcher at Intel Research, Berkeley. His current research interests include languages and tools for helping programmers write parallel programs. He is involved in projects Ivy (http://ivy.cs.berkeley.edu/) and Chord (http://chord.stanford.edu/) which explore techniques for improving the reliability of multi-threaded programs written in C and Java, respectively. He received his Ph.D. in Computer Science from Stanford University in 2008 working on static race detection for Java.

Date 2:00-3:00 pm, Monday, October 13, 2008
Place Gates 104
Speaker Ras Bodik, UC Berkeley
Title Program Synthesis by Sketching
Abstract Software synthesis automatically derives programs that are efficient, even surprising, but it requires a domain theory, which is elusive for many applications. To make synthesis accessible, we cast the synthesizer as a programmer's assistant: the programmer writes a partial program that elides tricky code fragments, and the synthesizer completes the program to match a specification. Our hypothesis is that the partial program, called a sketch, communicates the programmer's insight to the synthesizer more naturally than a domain theory. On the algorithmic side, sketching exploits recent advances in automated decision procedures. This talk will show how we turned a program checker into a synthesizer with counterexample-guided inductive synthesis. I will also describe the SKETCH language and its linguistic support for synthesis, and show how we synthesized complex implementations of ciphers, scientific codes, and even concurrent lock-free data structures. Joint work with Armando Solar-Lezama, Chris Jones, Gilad Arnold, Lexin Shan, Satish Chandra and many others.
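The counterexample-guided inductive synthesis loop mentioned in the abstract can be sketched in miniature. Here the "sketch" is a polynomial with one integer hole, and brute-force search over finite domains stands in for the SAT/SMT-based synthesizer and verifier of the real system; the spec and hole space are invented for illustration:

```python
DOMAIN = range(-10, 11)       # finite input space for the toy verifier
HOLE_SPACE = range(-16, 17)   # finite search space for the hole

spec = lambda x: x * (x + 3)          # desired behavior
sketch = lambda c, x: x * x + c * x   # partial program; c is the hole

def cegis():
    """Counterexample-guided inductive synthesis: propose a hole value
    consistent with all counterexamples seen so far; the verifier either
    accepts it or supplies a fresh counterexample."""
    examples = []
    while True:
        # Synthesize: the first hole value matching every example so far.
        candidate = next(c for c in HOLE_SPACE
                         if all(sketch(c, x) == spec(x) for x in examples))
        # Verify: scan the whole input domain for a counterexample.
        cex = next((x for x in DOMAIN if sketch(candidate, x) != spec(x)),
                   None)
        if cex is None:
            return candidate        # candidate is correct on all inputs
        examples.append(cex)

assert cegis() == 3    # the synthesizer fills the hole with c = 3
```

The attraction of the loop is that the synthesizer never reasons about the whole input space at once: a handful of counterexamples usually suffices to pin down the hole.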
Speaker Bio

Date 2:00-3:00 pm, Monday, June 23, 2008
Place Gates 498
Speaker Michael Bond, UT Austin
Title Deployed Software: An Ideal Environment for Fixing Bugs?
Abstract Despite extensive in-house testing, deployed software still contains bugs. These bugs cause systems to fail, wasting billions of dollars and sometimes causing injury or death. Bugs in deployed software are hard to diagnose and fix since they are environment dependent and difficult for developers to reproduce. Furthermore, developers cannot use heavyweight approaches that would degrade performance. This talk makes a case for deployment being the ideal environment for fixing bugs. Solutions fall into two categories: helping developers diagnose and fix bugs, and automatically tolerating bugs instead of letting systems fail. The talk focuses on memory leaks in managed languages--the lone memory bug not eliminated by modern languages--and presents an approach for diagnosing leaks, called Bell, and an approach for tolerating leaks, called Melt. Bell encodes per-object program locations into a single bit and uses brute-force decoding to recover likely program locations causing leaks. Melt puts likely leaked memory on disk and keeps time and memory resources proportional to in-use (not leaked) memory even in the face of growing leaks. The talk concludes with future work and thoughts about how software and hardware trends will make bugs a bigger problem and make deployment-time bug detection and tolerance even more appealing.
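Bell's one-bit encoding and brute-force decoding can be illustrated in a few lines. The hash function, site names, and object counts below are invented for this sketch; the key property is that the true allocation site matches every leaked object's stored bit, while an unrelated site survives each object only with probability about one half:

```python
SITES = ["alloc_A", "alloc_B", "alloc_C", "alloc_D"]

def bit(site, obj_id):
    """One-bit encoding of (allocation site, object): a small
    deterministic hash standing in for Bell's encoding function."""
    code = 0
    for ch in site:
        code = (code * 131 + ord(ch)) & 0xFFFFFFFF
    h = ((obj_id ^ code) * 2654435761) & 0xFFFFFFFF
    return (h >> 16) & 1

# Simulated leak: 64 objects, all allocated at site "alloc_C"; the
# runtime stores just one bit per object.
leaked = [(oid, bit("alloc_C", oid)) for oid in range(64)]

def decode(leaked_objects):
    """Brute-force decoding: report every site whose encoding matches
    the stored bit of all leaked objects. The true site always matches;
    an unrelated site survives one object only with probability ~1/2,
    so with many leaked objects it is ruled out almost surely."""
    return [s for s in SITES
            if all(bit(s, oid) == b for oid, b in leaked_objects)]

plausible = decode(leaked)
assert "alloc_C" in plausible
```

The design choice this illustrates is the space/decoding tradeoff: per-object storage shrinks to a single bit, and the cost is moved into an offline brute-force scan over candidate sites.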
Speaker Bio

Date 4:00-5:00 pm, Thursday, June 26, 2008
Place Gates 498
Speaker Todd Millstein, UCLA
Title Enforcing and Validating User-Defined Programming Disciplines
Abstract One way that programmers manage the complexity of building and maintaining software systems is by adhering to "programming disciplines" of various sorts. For example, locking disciplines prevent concurrency errors, "ownership" disciplines control aliasing among pointers, and design patterns enforce architectural styles or constraints. However, these disciplines are typically only documented informally through comments, if they are documented at all, and can easily be forgotten or misused. Over the past few years, colleagues and I have developed frameworks that allow programmers to specify desired programming disciplines, enforce them statically on programs, and validate them against intended run-time invariants. In this talk, I will overview our approach and illustrate it through our JavaCOP framework for "pluggable type systems" in Java.
Speaker Bio Todd Millstein is an Assistant Professor in the Computer Science Department at the University of California, Los Angeles. Todd received his Ph.D. and M.S. from the University of Washington and his A.B. from Brown University, all in Computer Science. He received an NSF CAREER award in 2006 and an IBM Faculty Award in 2008.

Date 3:00-4:00 pm, Monday, May 14, 2007
Place Gates 104
Speaker Manuvir Das, Microsoft
Title Formal Specifications on Industrial-Strength Code - From Myth to Reality
Abstract The research community has long understood the value of formal specifications in building robust software. However, the adoption of any specifications beyond run-time assertions in industrial software has been limited. All of this has changed at Microsoft in the last few years. Today, formal specifications are a mandated part of the software development process in the largest Microsoft product groups. Millions of specifications have been added, and tens of thousands of bugs have been exposed and fixed in future versions of products under development. In addition, Windows public interfaces are formally specified and the Visual Studio compiler understands and enforces these specifications, meaning that programmers anywhere can now use formal specifications to make their software more robust.

The goal of this talk is to share the technical, social and practical story of how we were able to move organizations with thousands of programmers to an environment where the use of formal specifications is routine.
Speaker Bio Manuvir Das leads the Program Analysis research group in the Center for Software Excellence at Microsoft Corporation, and is an affiliate faculty member at the University of Washington. His research interests are in inventing and applying techniques from Programming Languages, Compilers, and Systems to the software engineering process. Manuvir holds a bachelor's degree in Computer Science from IIT Bombay, and a PhD in Computer Science from the University of Wisconsin-Madison.

Date 4:00-5:00 pm, Thursday, April 26, 2007
Place Gates 463a (Theory Lounge)
Speaker Philip Wadler, University of Edinburgh
Title Faith, Evolution, and Programming Languages
Abstract Faith and evolution provide complementary--and sometimes conflicting--models of the world, and they also can model the adoption of programming languages. Adherents of competing paradigms, such as functional and object-oriented programming, often appear motivated by faith. Families of related languages, such as C, C++, Java, and C#, may arise from pressures of evolution. As designers of languages, adoption rates provide us with scientific data, but the belief that elegant designs are better is a matter of faith. This talk traces one concept, second-order quantification, from its inception in the symbolic logic of Frege through to the generic features introduced in Java 5, touching on features of faith and evolution. The remarkable correspondence between natural deduction and functional programming informed the design of type classes in Haskell. Generics in Java evolved directly from Haskell type classes, and are designed to support evolution from legacy code to generic code. Links, a successor to Haskell aimed at AJAX-style three-tier web applications, aims to reconcile some of the conflict between dynamic and static approaches to typing.
Speaker Bio Philip Wadler is Professor of Theoretical Computer Science at the University of Edinburgh. He holds a Royal Society-Wolfson Research Merit Fellowship and is a Fellow of the Royal Society of Edinburgh. Previously, he worked or studied at Avaya Labs, Bell Labs, Glasgow, Chalmers, Oxford, CMU, Xerox Parc, and Stanford, and has visited as a guest professor in Paris, Sydney, and Copenhagen. Prof. Wadler appears at position 70 on Citeseer's list of most-cited authors in computer science; is a winner of the POPL Most Influential Paper Award; and sits on the ACM Sigplan Executive Committee. He contributed to the designs of Haskell, Java, and XQuery, and is a co-author of XQuery from the Experts (Addison Wesley, 2004) and Generics and Collections in Java (O'Reilly, 2006). He has delivered invited talks in locations ranging from Aizu to Zurich.

Date 3:00-4:00 pm, Monday, April 9, 2007
Place Gates 463A
Speaker Corina Pasareanu, NASA Ames
Title Abstract State Matching and Symbolic Execution for Software Model Checking
Abstract This talk describes abstraction based model checking techniques that compute under-approximations of the feasible behaviors of the system under analysis. The techniques perform model checking with explicit, or symbolic, execution and abstract state matching. Model checking explores the concrete program transitions while storing abstract versions of the explored states, as specified by an abstraction mapping. State matching determines whether an abstract state is being revisited, in which case the model checker backtracks. Applications include program error detection and test input generation.

This is joint work with Willem Visser (SEVEN Networks), Saswat Anand (Georgia Institute of Technology), and Radek Pelanek (Masaryk University).
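The explicit-search-with-abstract-matching idea above can be sketched compactly: explore concrete successors, but record and match only an abstraction of each visited state, backtracking on abstract revisits. In this toy version (the transition system and the mod-7 abstraction mapping are invented for illustration), the concrete state space is infinite yet the search terminates, and any error it reports is a real, feasible state, though errors can also be missed:

```python
def successors(x):
    # Transitions of a toy infinite-state system over the integers.
    return [x + 1, 2 * x]

def search(init, is_error, abstract):
    """Explicit-state search that stores only abstract versions of the
    states it has visited, backtracking when an abstract state recurs."""
    frontier = [init]
    seen = {abstract(init)}
    while frontier:
        x = frontier.pop()
        if is_error(x):
            return x        # feasible by construction: no false alarm
        for y in successors(x):
            a = abstract(y)
            if a not in seen:
                seen.add(a)
                frontier.append(y)
    return None             # no error found (errors may still be missed)

# The concrete state space {1, 2, 3, ...} is infinite, but matching on
# x mod 7 bounds the search to at most seven stored abstract states.
found = search(1, lambda x: x == 6, lambda x: x % 7)
assert found == 6
assert search(1, lambda x: False, lambda x: x % 7) is None
```

This is the under-approximation tradeoff the abstract describes: the abstraction mapping controls how much of the feasible behavior is explored, which suits error detection and test-input generation rather than full verification.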

Date 3:00-4:00 pm, Monday, March 5, 2007
Place Gates 104
Speaker Gary T. Leavens, Iowa State University
Title JML: Expressive Modular Reasoning for Java
Abstract The Java Modeling Language (JML) is used to specify, check, and verify detailed designs for Java classes and interfaces. JML is an open, international, collaborative effort among 20 research groups and projects. This talk briefly describes background on the JML effort, including its tool support for runtime assertion checking, extended static checking, static verification, unit testing, etc.

Subtyping and dynamic dispatch in object-oriented languages pose a challenge for modular reasoning, which is the key to practical tools. Moreover, a specification language must be clear and expressive to be used by practicing programmers. Modular reasoning is done by using supertype abstraction -- reasoning based on static type information. Expressiveness comes from a rich set of features for specifying methods. This talk describes how specification inheritance in JML forces behavioral subtyping, through a discussion of semantics and examples. Behavioral subtyping, together with a set of methodological restrictions, makes modular reasoning with supertype abstraction valid.

This work was supported in part by NSF grant CCF-0429567. The work on behavioral subtyping in particular is based on joint work with Prof. David A. Naumann of Stevens Institute of Technology. See http://jmlspecs.org for more information about JML.

Date 2:00-3:00 pm, Monday, February 26, 2007
Place Gates 104
Speaker David Monniaux, National Center for Scientific Research (France)
Title Verification of device drivers: asynchronous "intelligent" peripherals
Abstract It is common in current embedded systems to use powerful off-the-shelf technologies such as programmable I/O controllers that perform direct memory access, handle worklists, etc., for instance for USB buses. This poses the question of verifying properties of the driver, since the driver cannot be analyzed in isolation but must be asynchronously composed with a specification of the controller.

We'll see some results of the verification of properties of a simulation of a USB OHCI controller asynchronously composed with an industrial device driver, using the Astrée static analyzer. We'll then discuss future work and challenges.

Date 3:00-4:00 pm, Monday, February 5, 2007
Place Gates 104
Speaker Michael Hicks, University of Maryland
Title Modular Information Hiding and Type Safe Linking for C
Abstract The overarching goal of my research is to explore means to develop more flexible, reliable, and secure software systems. In this talk, I will present a brief overview of my work, and in particular focus on CMod, a novel tool that provides a sound module system for C.

CMod works by enforcing a set of four rules that are based on principles of modular reasoning and on current programming practice. CMod's rules flesh out the convention that .h header files are module interfaces and .c source files are module implementations. Although this convention is well known, developing CMod's rules revealed there are many subtleties in applying the basic pattern correctly. We have proven formally that CMod's rules enforce both information hiding and type-safe linking. We evaluated CMod on a number of benchmarks, and found that most programs obey CMod's rules, or can be made to with minimal effort, while rule violations reveal brittle coding practices including numerous information hiding violations and occasional type errors.

Date 3:00-4:00 pm, Monday, January 29, 2007
Place Gates 104
Speaker Radu Rugina, Cornell University
Title Practical Shape Analysis
Abstract Shape analyses are aimed at extracting invariants that describe the "shapes" of heap-allocated recursive data structures. Although existing shape analyses have been successful at verifying complex heap manipulations, they have had limited success at being practical for larger programs.

In this talk I will present two practical approaches to heap analysis and their application to error detection, program verification, and compiler transformations. First, I will present a reference counting shape analysis where the compiler uses local reasoning about individual heap cells, instead of global reasoning about the entire heap. Second, I will present a heap analysis by contradiction, where the analysis checks the absence of heap errors by disproving their presence. These techniques are both sufficiently precise to accurately analyze a large class of heap manipulation algorithms, and sufficiently lightweight to scale to larger programs.
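The local, per-cell flavor of the reference counting analysis can be illustrated concretely. The sketch below is an executable cartoon rather than the analysis itself (all names are hypothetical): each heap cell carries a count of its incoming references, so a pointer update touches only the two cells involved instead of requiring reasoning about the entire heap.

```python
class Cell:
    def __init__(self, val):
        self.val = val
        self.next = None
        self.refcount = 0      # incoming references, tracked locally

def assign(src, field, target):
    """Perform src.field = target, updating only the two affected counts."""
    old = getattr(src, field)
    if old is not None:
        old.refcount -= 1
    setattr(src, field, target)
    if target is not None:
        target.refcount += 1

def insert_after(node, fresh):
    assign(fresh, "next", node.next)   # splice fresh between node and node.next
    assign(node, "next", fresh)

a, b, c = Cell(1), Cell(2), Cell(3)
for var in (a, b, c):
    var.refcount += 1                  # one stack reference each
assign(a, "next", b)                   # list: a -> b
insert_after(a, c)                     # list: a -> c -> b

def incoming(cell, roots, cells):
    """Ground truth: actual number of references to cell."""
    return sum(1 for r in roots if r is cell) + \
           sum(1 for x in cells if x.next is cell)

# the locally-maintained counts agree with a global recount
for x in (a, b, c):
    assert x.refcount == incoming(x, [a, b, c], [a, b, c])
```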

Date 3:00-4:00 pm, Monday, January 8, 2007
Place Gates 104
Speaker Michael Ernst, MIT
Title Refactoring for parameterizing Java classes
Abstract The type safety and expressiveness of many existing Java libraries and their client applications would improve if the libraries were upgraded to define generic classes. Efficient and accurate tools exist to help client applications use generic libraries, but so far the libraries themselves must be parameterized manually, which is a tedious, time-consuming, and error-prone task. We present a type-constraint-based algorithm for converting non-generic libraries to add type parameters. The algorithm handles the full Java language and preserves backward compatibility, thus making it safe for existing clients. Among other features, it is capable of inferring wildcard types and introducing type parameters for mutually-dependent classes. We have implemented the algorithm as a fully automatic refactoring in Eclipse.

We evaluated our work in two ways. First, our tool parameterized code that was lacking type parameters. We contacted the developers of several of these applications, and in all cases where we received a response, they confirmed that the resulting parameterizations were correct and useful. Second, to better quantify its effectiveness, our tool parameterized classes from already-generic libraries, and we compared the results to those that were created by the libraries' authors. Our tool performed the refactoring accurately: in 87% of cases the results were as good as those created manually by a human expert, in 9% of cases the tool results were better, and in 4% of cases the tool results were worse.

Date 3:00-4:00 pm, Monday, December 4, 2006
Place Gates 104
Speaker Koushik Sen, University of California, Berkeley
Title Concolic Testing of Sequential and Concurrent Programs
Abstract Testing with manually generated test cases is the primary technique used in industry to improve reliability of software--in fact, such testing is reported to account for 50%-80% of the typical cost of software development. I will describe Concolic Testing, a systematic and efficient method which combines random and symbolic testing. Concolic testing enables automatic and systematic testing of large programs, avoids redundant test cases and does not generate false warnings. Experiments on real-world software show that concolic testing can be used to effectively catch generic errors such as assertion violations, memory leaks, uncaught exceptions, and segmentation faults. Combined with dynamic partial order reduction techniques and predictive analysis, concolic testing is effective in catching concurrency bugs such as data races and deadlocks as well as specification related bugs. I will describe my experience with building two concolic testing tools, CUTE for C programs and jCUTE for Java programs, and applying these tools to real-world software systems. I will briefly describe some ongoing projects to improve and extend concolic testing.
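The core search loop of concolic testing can be sketched in a few lines. The toy below is not CUTE or jCUTE: a real tool solves symbolic path constraints, and here a brute-force search over small inputs stands in for the constraint solver (a stated simplification). Each concrete run records its branch path, and branches along the path are flipped to steer later runs down unexplored paths.

```python
def program(x, trace):
    # program under test; each branch records its outcome in the trace
    if x > 10:
        trace.append(True)
        if x % 2 == 0:
            trace.append(True)
            return "big-even"
        trace.append(False)
        return "big-odd"
    trace.append(False)
    return "small"

def run(x):
    trace = []
    out = program(x, trace)
    return tuple(trace), out

def solve(prefix):
    """Stand-in solver: find any input whose path starts with `prefix`."""
    for x in range(-50, 51):
        path, _ = run(x)
        if path[:len(prefix)] == prefix:
            return x
    return None

def concolic_explore(seed):
    seen, worklist, tests = set(), [seed], []
    while worklist:
        x = worklist.pop()
        path, out = run(x)
        if path in seen:
            continue              # avoids redundant test cases
        seen.add(path)
        tests.append((x, out))
        for i in range(len(path)):            # flip each branch taken
            alt = solve(path[:i] + (not path[i],))
            if alt is not None:
                worklist.append(alt)
    return tests

tests = concolic_explore(0)
assert {out for _, out in tests} == {"small", "big-even", "big-odd"}
```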

Date 3:00-4:00 pm, Thursday, November 30, 2006
Place Gates 104
Speaker Rupak Majumdar, University of California, Los Angeles
Title What's Next for Software Verification?
Abstract Over the last few years, software verification based on predicate abstraction and counterexample-guided refinement has been a successful technique for performing automatic and precise static analysis of programs. Other than device driver protocols, though, software verifiers have so far mostly been applied to simple properties of systems. For these programs and properties, we show a simple syntactic algorithm to construct approximate program invariants that is surprisingly effective, suggesting perhaps that software verification tools are overkill for these applications. For deeper properties, we describe our recent attempts to verify a memory management subsystem and outline several challenges that must be met before the verification will succeed. We describe some partial progress. In particular, we describe a technique to infer predicates in the presence of data structures such as lists and sets, and a type system that extracts word-level relationships from code containing bitwise operations.

Date 3:00-4:00 pm, Monday, November 6, 2006
Place Gates 104
Speaker Ranjit Jhala, University of California, San Diego
Title Permissive Interfaces
Abstract A modular program analysis considers components independently and provides succinct summaries for each component, which can be used when checking the rest of the system. Consider a system comprising two components, a library and a client. A temporal summary, or interface, of the library specifies legal sequences of library calls. The interface is safe if no call sequence violates the library's internal invariants; the interface is permissive if it contains every such sequence. Modular program analysis requires full interfaces, which are both safe and permissive: the client does not cause errors in the library if and only if it makes only sequences of library calls that are allowed by the full interface of the library. Previous interface-based methods have focused on safe interfaces, which may be too restrictive and thus reject good clients. We present an algorithm for automatically synthesizing software interfaces that are both safe and permissive. The algorithm generates interfaces as graphs whose vertices are labeled with predicates over the library's internal state, and whose edges are labeled with library calls. The interface state is refined incrementally until the full interface is constructed. In other words, the algorithm automatically synthesizes a typestate system for the library, against which any client can be checked for compatibility. We present an implementation of the algorithm, based on the BLAST model checker, and evaluate it on several case studies.
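A full interface of the kind described above can be pictured as a small labeled graph. The sketch below uses a hypothetical file-like library (not BLAST output): a client call sequence is compatible exactly when it stays inside the graph, so the interface is safe because it rejects every violating sequence and permissive because it admits every legal one.

```python
# interface graph: state -> {library call: successor state}
INTERFACE = {
    "closed": {"open": "opened"},
    "opened": {"read": "opened", "close": "closed"},
}

def accepts(calls, start="closed"):
    """Check a client's call sequence against the interface graph."""
    state = start
    for call in calls:
        if call not in INTERFACE[state]:
            return False      # sequence would violate a library invariant
        state = INTERFACE[state][call]
    return True

assert accepts(["open", "read", "read", "close"])        # good client
assert not accepts(["read"])                             # safe: bad client rejected
assert accepts(["open", "close", "open", "close"])       # permissive: reopening allowed
```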

Joint work with Thomas A. Henzinger and Rupak Majumdar.

Date 2:30-3:30 pm, Tuesday, October 31, 2006
Place Gates 104
Speaker Guy L. Steele Jr., Sun Microsystems Laboratories
Title Parallel Programming and Code Selection in Fortress
Abstract As part of the DARPA program for High Productivity Computing Systems, the Programming Language Research Group at Sun Microsystems Laboratories is developing Fortress, a language intended to support large-scale scientific computation with the same level of portability that the Java programming language provided for multithreaded commercial applications. One of the design principles of Fortress is that parallelism be encouraged everywhere; for example, it is intentionally just a little bit harder to write a sequential loop than a parallel loop. Another is to have rich mechanisms for encapsulation and abstraction; the idea is to have a fairly complicated language for library writers that enables them to write libraries that present a relatively simple set of interfaces to the application programmer. Thus Fortress is as much a framework for language developers as it is a language for coding scientific applications. We will discuss ideas for using a rich parameterized polymorphic type system to organize multithreading and data distribution on large parallel machines. The net result is similar in some ways to data distribution facilities in other languages such as HPF and Chapel, but more open-ended, because in Fortress the facilities are defined by user-replaceable and -extendable libraries rather than wired into the compiler. A sufficiently rich type system can take the place of certain kinds of flow analysis to guide certain kinds of code selection and optimization, again moving policymaking out of the compiler and into libraries coded in the Fortress source language.

Date 3:00-4:00 pm, Monday, October 16, 2006
Place Gates 104
Speaker Zhendong Su, University of California, Davis
Title Scalable and Accurate Tree-based Detection of Code Clones
Abstract Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against code modifications. In this talk, I will present an efficient algorithm for identifying similar subtrees. The algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space R^n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called Deckard and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that Deckard is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
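The vector characterization can be sketched directly. The fragment below is a simplification: Deckard clusters vectors efficiently, whereas this sketch just compares pairwise Euclidean distances. Each fragment is summarized as counts of AST node kinds, so a clone that only renames identifiers maps to the identical vector.

```python
import ast
import math
from collections import Counter

def vector(code):
    """Summarize a code fragment as counts of AST node kinds."""
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(code)))

def distance(v1, v2):
    """Euclidean distance between two count vectors (missing kinds count 0)."""
    kinds = set(v1) | set(v2)
    return math.sqrt(sum((v1[k] - v2[k]) ** 2 for k in kinds))

a = vector("total = 0\nfor x in items:\n    total += x")
b = vector("s = 0\nfor y in values:\n    s += y")   # clone with renamed identifiers
c = vector("print('hello, world')")                 # unrelated fragment

assert distance(a, b) == 0.0   # renaming leaves the structural vector unchanged
assert distance(a, c) > 3      # unrelated code lands far away
```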

Joint work with Lingxiao Jiang, Ghassan Misherghi, and Stephane Glondu.

Date 3:00-4:00 pm, Thursday, June 8, 2006
Place Gates 104
Speaker Bowen Alpern, IBM T. J. Watson Research Center
Title Early Ruminations on Anticipating Demand for Software Delivered as a Stream
Abstract This is not, at least not primarily, a talk about the economic viability of a particular model of software distribution. Rather, it reports on some preliminary investigations into whether predictive caching techniques might be able to mitigate the first-time performance penalty incurred when software executes as it is being delivered. The context for this work is the PDS (Progressive Deployment System) project at IBM Research which uses application virtualization to deliver an application and its non-operating-system dependencies as a stream in order to avoid dependency problems that can occur when multiple applications are installed on the same machine. This is joint work with the PDS group.

Date 2:00-3:00 pm, Thursday, May 18, 2006
Place Gates 104
Speaker Danny Dig
Title Automated Detection of Refactorings in Evolving Components
Abstract One of the costs of reusing software components is upgrading applications to use the new version of the components. Upgrading an application can be error-prone, tedious, and disruptive of the development process. An important kind of change in OO components is a refactoring. Refactorings are program transformations that improve the internal design without changing the observable behavior (e.g., renamings, moving methods between classes, splitting/merging of classes). Our previous study showed that more than 80% of the disruptive changes in five different components were caused by refactorings. If the refactorings that happened between two versions of a component could be automatically detected, a refactoring tool could replay them on applications.

I will present an algorithm that detects refactorings performed during component evolution. Our algorithm uses a combination of a fast syntactic analysis to detect refactoring candidates and a more expensive semantic analysis to refine the results. The experiments on components ranging from 17 KLOC to 352 KLOC show that our algorithm detects refactorings in real-world components with accuracy over 85%.
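The two-phase structure of the algorithm can be sketched on a toy rename detector (the names, bodies, and threshold below are hypothetical): a cheap syntactic pass proposes candidates by body similarity, and a stand-in for the more expensive semantic analysis confirms them.

```python
def tokens(body):
    return body.split()

def similarity(b1, b2):
    """Fast syntactic measure: Jaccard similarity of token sets."""
    t1, t2 = set(tokens(b1)), set(tokens(b2))
    return len(t1 & t2) / len(t1 | t2)

def detect_renames(old, new, threshold=0.8):
    """old, new: method name -> body, for two versions of a component."""
    removed = {n: b for n, b in old.items() if n not in new}
    added = {n: b for n, b in new.items() if n not in old}
    renames = []
    for on, ob in removed.items():
        for nn, nb in added.items():
            if similarity(ob, nb) >= threshold:   # syntactic candidate
                if tokens(ob) == tokens(nb):      # semantic refinement (stand-in)
                    renames.append((on, nn))
    return renames

old = {"getSize": "return this . count", "clear": "this . count = 0"}
new = {"size": "return this . count", "clear": "this . count = 0"}
assert detect_renames(old, new) == [("getSize", "size")]
```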

Joint work with Can Comertoglu, Darko Marinov and Ralph Johnson (UIUC).

Date 3:00-4:00 pm, Monday, May 15, 2006
Place Gates 104
Speaker Stephen Freund, Williams College
Title Practical Hybrid Type Checking
Abstract Software systems typically contain large APIs that are only informally and imprecisely specified and hence easily misused. Practical mechanisms for documenting and verifying precise specifications would significantly improve software reliability.

The Sage programming language is designed to provide high-coverage checking of expressive specifications. The Sage type language is a synthesis of (unrestricted) refinement types and Pure Type Systems. Since type checking for this language is not statically decidable, Sage uses hybrid type checking, which extends static type checking with dynamic contract checking, automatic theorem proving, and a database of refuted subtype judgments.

In this talk, I present the key ideas behind hybrid type checking, the Sage language, and preliminary experimental results suggesting that hybrid type checking of precise specifications is a promising approach for the development of reliable software. I will also discuss more recent work on extending Sage to include mutable objects.
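The interplay of static and dynamic checking can be sketched as follows (a hypothetical model, not Sage's actual API): a tiny stand-in prover answers yes, no, or unknown for a subtyping question between refinements, and only the unknown cases pay for a runtime contract check.

```python
def prover(pred_from, pred_to):
    """Stand-in theorem prover: True, False, or None (undecided)."""
    known = {("pos", "nonneg"): True, ("nonneg", "pos"): False}
    return known.get((pred_from, pred_to))

PREDS = {"pos": lambda x: x > 0,
         "nonneg": lambda x: x >= 0,
         "even": lambda x: x % 2 == 0}

def cast(value, pred_from, pred_to):
    verdict = prover(pred_from, pred_to)
    if verdict is True:
        return value                    # proved statically: no runtime check
    if verdict is False:
        raise TypeError("static refutation")
    # undecided: fall back to a dynamic contract check
    if not PREDS[pred_to](value):
        raise ValueError("dynamic contract violation")
    return value

assert cast(3, "pos", "nonneg") == 3    # statically verified
assert cast(4, "pos", "even") == 4      # checked dynamically
failed = False
try:
    cast(3, "pos", "even")              # dynamic check catches the violation
except ValueError:
    failed = True
assert failed
```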

This is joint work with Cormac Flanagan, Jessica Gronski, Kenn Knowles, and Aaron Tomb at University of California, Santa Cruz.

Date 3:00-4:00 pm, Monday, April 24, 2006
Place Gates 104
Speaker Kathy Yelick, UC Berkeley and Lawrence Berkeley National Lab
Title Compilation Technology for Computational Science
Abstract The emergence of multicore processors marks the end of an era in computing: whereas hardware developers were largely responsible for exponential performance gains in the past decade, software developers will now be equally responsible if these gains are to continue. The notoriously difficult problem of writing parallel software will now be commonplace. Much of the experience using parallelism for performance resides in the scientific computing community, and while that group has focused on numerical simulations and very large scale parallelism, surely some lessons can be learned. In this talk, I will describe the class of partitioned global address space languages, which have recently received support from the scientific computing community as a possible alternative to message passing and threads. One of these languages, Titanium, is a modest extension of Java with domain-specific extensions for scientific computing and large-scale parallelism.

Titanium has proven to be significantly more expressive than message passing and has been used for significant scientific problems, including a parallel simulation of blood flow in the heart and an elliptic solver based on adaptive mesh refinement. Several interesting computer science challenges have arisen from this work, including the need for program analysis techniques specialized to parallel languages, machine-independent optimization strategies, and runtime support for latency hiding. I will describe some recent work in compilation of parallel languages, including thread-aware pointer analysis, sequential consistency enforcement, and model-driven communication optimizations, and give an overview of some open questions in the field.

Date 3:00-4:00 pm, Monday, April 17, 2006
Place Gates 104
Speaker Brad Chamberlain, Cray Inc.
Title Chapel: Cray Cascade's High Productivity Language
Abstract In 2002, DARPA launched the High Productivity Computing Systems (HPCS) program, with the goal of improving user productivity on High-End Computing systems for the year 2010. As part of Cray's research efforts in this program, we have been developing a new parallel language named Chapel, designed to:
1. support a global view of parallel programming with the ability to tune for locality,
2. support general parallel programming including data- and task-parallel codes, as well as nested parallelism, and
3. help narrow the gulf between mainstream and parallel languages.
In this talk I will introduce the motivations and foundations for Chapel, describe several core language concepts, and show some sample computations written in Chapel.

Date 2:15-3:15 pm, Friday, March 10, 2006
Place Gates B8
Speaker Stephen Fink, IBM T. J. Watson Research Center
Title Effective Typestate Verification in the Presence of Aliasing
Abstract We describe a novel framework for verification of typestate properties, including several new techniques to precisely treat aliases without undue performance costs. In particular, we present a flow-sensitive, context-sensitive, integrated verifier that utilizes a parametric abstract domain combining typestate and aliasing information. To scale to real programs without compromising precision, we present a staged verification system in which faster verifiers run as early stages which reduce the workload for later, more precise, stages.

We have evaluated our framework on a number of real Java programs, checking correct API usage for various Java standard libraries. The results show that our approach scales to hundreds of thousands of lines of code, and verifies correctness for over 95% of the potential points of failure.

Date 4:00-5:15 pm, Thursday, February 23, 2006
Place Gates 104
Speaker Vivek Sarkar, IBM T. J. Watson Research Center
Title X10: An Object-Oriented Approach to Non-Uniform Cluster Computing
Abstract It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multiprocessor structure for the future is a Non-Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current OO language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes.

We have designed a modern object-oriented programming language, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places; lightweight activities embodied in async, future, foreach, and ateach constructs; a construct for termination detection (finish); the use of lock-free synchronization (atomic blocks); and the manipulation of cluster-wide global data structures. We present an overview of the X10 programming model and language, experience with our reference implementation, and results from some initial productivity comparisons between the X10 and Java(TM) languages.

This is joint work with other members of the X10 core team --- Vijay Saraswat, Raj Barik, Philippe Charles, Christopher Donawa, Christian Grothoff, Allan Kielstra, Igor Peshansky, and Christoph von Praun.

Date 3:00-4:00 pm, Monday, February 6, 2006
Place Gates 104
Speaker Kathleen Fisher, AT&T Labs Research
Title PADS: Processing Arbitrary Data Sources
Abstract Many high-volume data sources exist that can be mined very profitably, for example: call detail records, web server logs, network packets, network configuration and log files, provisioning records, credit card records, stock market data, etc. Unfortunately, many such data sources are in formats over which data consumers have no control. A significant effort is required to understand such a data source and write a parser for the data, a process that is both tedious and error-prone. Often, the hard-won understanding of the data ends up embedded in parsing code, making both sharing the understanding and maintaining the parser difficult. Typically, such parsers are incomplete, failing to specify how to handle situations where the data does not conform to the expected format.

In this talk, I will describe the PADS project, which aims to provide languages and tools for simplifying the analysis of ad hoc data. We have designed a declarative data-description language, PADS/C, expressive enough to describe the data sources we see in practice at AT&T, including ASCII, binary, EBCDIC (Cobol), and mixed formats. From PADS/C we generate a C library with functions for parsing, manipulating, summarizing, querying, and writing the data.
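The generative idea can be sketched in miniature (this is not PADS/C syntax; the record layout and field names below are invented for illustration): a format is described once as a list of typed fields, and a parser that also reports nonconforming data is derived from the description rather than hand-written.

```python
DESCRIPTION = [              # a tiny call-detail-style record (hypothetical)
    ("caller", int),
    ("callee", int),
    ("duration_s", float),
    ("status", str),
]

def make_parser(description, sep="|"):
    """Derive a parser from a declarative field description."""
    def parse(line):
        parts = line.rstrip("\n").split(sep)
        if len(parts) != len(description):
            raise ValueError("record does not conform to the description")
        rec = {}
        for (name, ty), raw in zip(description, parts):
            try:
                rec[name] = ty(raw)
            except ValueError:
                # nonconforming data is reported, not silently dropped
                raise ValueError(f"field {name!r}: bad {ty.__name__}: {raw!r}")
        return rec
    return parse

parse = make_parser(DESCRIPTION)
rec = parse("6505551234|6505556789|12.5|OK")
assert rec == {"caller": 6505551234, "callee": 6505556789,
               "duration_s": 12.5, "status": "OK"}
```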

This work is joint with Bob Gruber, Mary Fernandez, David Walker, Yitzhak Mandelbaum, and Mark Daly.

Date 4:15-5:30 pm, Tuesday, November 29, 2005
Place Gates 104
Speaker Sanjit Seshia, University of California, Berkeley
Title SAT-Based Decision Procedures and Malware Detection
Abstract SAT-based decision procedures operate by performing a satisfiability-preserving encoding of their input to a Boolean satisfiability (SAT) problem, on which a SAT solver is invoked. In this talk I will present UCLID, a verification tool based on SAT-based decision procedures, and describe an application to detecting malware (e.g., viruses and worms). UCLID's SAT-based decision procedures are for quantifier-free first-order logics involving arithmetic. These have been used within a malware detector that shows greater resilience to malware obfuscations than commercial tools. I will describe the notion of a "semantic signature," the detection algorithm, and experimental results.

Date 3:30-4:30 pm, Tuesday, November 15, 2005
Place Gates 104
Speaker Norman Ramsey, Harvard University
Title A Low-level Approach to Reuse for Programming-Language Infrastructure
Abstract New ideas in programming languages are best evaluated experimentally. But experimental evaluation is helpful only if there is an implementation that is efficient enough to encourage programmers to use the new features. Ideally, language researchers would build efficient implementations by reusing existing infrastructure, but existing infrastructures do not serve researchers well: in high-level infrastructures, many high-level features are built in and can't be changed, and in low-level infrastructures, it is hard to support important *run-time* services such as garbage collection, exceptions, and so on.

I am proposing a different approach: for reuse with many languages, an infrastructure needs *two* low-level interfaces: a compile-time interface and a *run-time* interface. If both interfaces provide appropriate mechanisms, the mechanisms can be composed to build many high-level abstractions, leaving the semantics and cost model up to the client.

In this talk, I will illustrate these ideas with examples drawn from two parts of the C-- language infrastructure: exception dispatch and procedure calls. I will focus on the mechanisms that make it possible for you to choose the semantics and cost model you want. For exceptions, these mechanisms are drawn from both compile-time and run-time interfaces, and together they enable you to duplicate all the established techniques for implementing exceptions. For procedure calls, the mechanisms are quite different; rather than provide low-level mechanisms that combine to form different kinds of procedure calls, I have found it necessary to extend the compile-time interface to enable direct control of the semantics and cost of procedure calls. I will also sketch some important unsolved problems regarding mechanisms to support advanced control features such as threads and first-class continuations.

Date 4:15-5:30 pm, Tuesday, November 8, 2005
Place Gates 104
Speaker Terence Parr, University of San Francisco
Title ANTLR and Computer Language Implementation
Abstract While parsing has been well understood for decades and a number of decent parser generators exist, anyone who has built an interpreter, translator, or compiler of consequence will tell you that the overall problem of supporting language development has not been adequately solved. Yes, parsing has been solved in theory but many of the strongest parsing strategies and systems are cumbersome in practice. Moreover, parsing is only one component of language implementation and the other pieces have either been studied purely from a compiler point of view or have resulted in powerful but inaccessible solutions that programmers in the trenches are unable or unwilling to use.

My goal in this talk is twofold: (1) to convince you that there is still work to be done and interesting problems to solve in the realm of language tools such as IDEs tailored to language development, grammar reuse strategies, automatic grammar construction from sample inputs, and transformation systems that are accessible to the average programmer; (2) to demonstrate a few items from my ANTLR research program such as the LL(*) parsing algorithm, tree grammars, grammar rewrite rules, and ANTLRWorks grammar development environment.

Date 4:15-5:30 pm, Tuesday, November 1, 2005
Place Gates 104
Speaker George Necula, University of California, Berkeley
Title Data Structure Specifications via Local Equality Axioms
Abstract We describe a program verification methodology for specifying global shape properties of data structures by means of axioms involving arbitrary predicates on scalar fields and pointer equalities in the neighborhood of a memory cell. We show that such local invariants are both natural and sufficient for describing a large class of data structures. We describe a complete decision procedure for such a class of axioms. The decision procedure is not only simpler and faster than in other similar systems, but has the advantage that it can be extended easily with reasoning for any decidable theory of scalar fields.
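The flavor of such local axioms is easy to show on a doubly-linked list: every cell x must satisfy x.next.prev == x and x.prev.next == x, a condition phrased entirely over x's neighborhood. The executable sketch below (an illustration of the invariant, not the decision procedure) checks the axiom and shows a buggy update violating it.

```python
class Node:
    def __init__(self, val):
        self.val, self.prev, self.next = val, None, None

def link(a, b):
    """Make b follow a, maintaining both directions."""
    a.next, b.prev = b, a

def check_local_axiom(nodes):
    # each check mentions only one cell and its immediate neighbors
    for x in nodes:
        if x.next is not None:
            assert x.next.prev is x
        if x.prev is not None:
            assert x.prev.next is x

a, b, c = Node(1), Node(2), Node(3)
link(a, b)
link(b, c)
check_local_axiom([a, b, c])        # holds after correct linking

b.next = a                          # buggy one-sided update
failed = False
try:
    check_local_axiom([a, b, c])    # the axiom fails at cell b
except AssertionError:
    failed = True
assert failed
```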

Date 4:15-5:30 pm, Tuesday, October 18, 2005
Place Gates 104
Speaker Ulfar Erlingsson, Microsoft Research
Title Principles and Applications of Software Control-Flow Integrity
Abstract Current software attacks often build on exploits that subvert machine-code execution. The enforcement of a basic safety property, Control-Flow Integrity (CFI), can prevent such attacks from arbitrarily controlling program behavior. CFI enforcement is simple, and its guarantees can be established formally, even with respect to powerful adversaries. Moreover, CFI enforcement is practical: it is compatible with existing software and can be done efficiently using software rewriting in commodity systems. Finally, CFI provides a useful foundation for enforcing further security policies.

This talk describes CFI and an x86 implementation of CFI enforcement, assesses its security benefits against real-world attacks, and shows how the CFI guarantees can enable efficient software implementations of a protected shadow call stack and of access control for memory regions.
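The label check at the heart of CFI enforcement can be modeled in a few lines (a toy simulation, not the x86 machine-code rewriting): every legitimate indirect-call target carries a label, and each indirect call verifies the label before transferring control, so a corrupted function pointer cannot reach arbitrary code.

```python
CFI_LABEL = 0x12345678      # label placed at every valid indirect-call target

def cfi_target(fn):
    """Mark fn as a legitimate indirect-call target."""
    fn.cfi_label = CFI_LABEL
    return fn

def indirect_call(fn, *args):
    # the inserted check: verify the label before jumping
    if getattr(fn, "cfi_label", None) != CFI_LABEL:
        raise RuntimeError("CFI violation: bad indirect-call target")
    return fn(*args)

@cfi_target
def handler(x):
    return x + 1

def attacker_gadget():       # code the attacker wants to reach; no label
    return "owned"

assert indirect_call(handler, 41) == 42
violated = False
try:
    indirect_call(attacker_gadget)   # as if the pointer were overwritten
except RuntimeError:
    violated = True
assert violated
```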

This is joint work with Martin Abadi, Mihai Budiu, and Jay Ligatti. More information about the work can be found at http://research.microsoft.com/research/sv/gleipnir/

Date 2:30-3:45 pm, Monday, October 10, 2005
Place Gates 104
Speaker Dan Grossman, University of Washington
Title Strong Atomicity for Today's Programming Languages
Abstract The data races and deadlocks that riddle threaded code are an ever-greater impediment to reliable and responsive desktop applications. The lock-based approach to shared-memory concurrency (i.e., the approach taken in current programming languages such as Java) has software-engineering shortcomings compared to atomicity. An atomic construct is a concurrency primitive that executes code as though no other thread has interleaved execution. To ensure correctness, fair scheduling, and reasonable performance, we advocate a logging-and-rollback approach to implementing atomic. Moreover, we believe we can implement atomic well enough on today's commodity hardware to utilize atomicity and further investigate its usefulness.

This talk will describe ongoing work designing and implementing languages with atomicity. After describing the advantages of atomicity, we will describe our experience with two prototypes: (1) AtomCaml is a working prototype that extends the mostly-functional language OCaml with atomicity. OCaml does not support true parallelism (it essentially assumes a uniprocessor), which lets us perform key optimizations for this common case. In particular, non-atomic code can run unchanged. (2) AtomJava is a Java extension currently under development. It implements atomicity in terms of locks (which are not visible to programmers) without potential deadlock. The AtomJava compiler produces Java source code that can run on any Java implementation.
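The logging-and-rollback idea behind atomic can be sketched as an undo log (a hypothetical API, not AtomCaml or AtomJava): writes inside the block record the old value, and an abort replays the log in reverse to restore every location the block touched.

```python
class Atomic:
    """Minimal undo-log transaction over a dict-shaped store."""
    def __init__(self, store):
        self.store = store
        self.undo_log = []

    def write(self, key, value):
        # log the old value first (None meaning absent -- a simplification)
        self.undo_log.append((key, self.store.get(key)))
        self.store[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:        # abort: roll back, newest write first
            for key, old in reversed(self.undo_log):
                if old is None:
                    self.store.pop(key, None)
                else:
                    self.store[key] = old
        return False                    # re-raise any exception

store = {"balance": 100}
with Atomic(store) as tx:               # committed transaction
    tx.write("balance", 80)
    tx.write("fee", 5)
assert store == {"balance": 80, "fee": 5}

aborted = False
try:
    with Atomic(store) as tx:           # aborted transaction rolls back
        tx.write("balance", 0)
        raise RuntimeError("conflict detected")
except RuntimeError:
    aborted = True
assert aborted and store["balance"] == 80
```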