The Stanford Software Seminar is usually held on Mondays in various rooms in the Gates building. Talks are open to anyone.
To subscribe to the seminar mailing list, send email to email@example.com from the email address you wish to subscribe. Likewise, to unsubscribe, send an email to firstname.lastname@example.org from the subscribed email address. In either case, the subject and body of your email will be ignored.
|Date||4:00-5:00, Tuesday, May 28|
|Speaker||Vijay Ganesh, University of Waterloo|
|Title||SMT Solvers for Software Reliability and Security|
SMT solvers increasingly play a central role in the construction of reliable and secure software, regardless of whether such reliability/security is ensured through formal methods, program analysis or testing. This dramatic influence of SMT solvers on software engineering as a discipline is a recent phenomenon, largely attributable to impressive gains in solver efficiency and expressive power.
In my talk, I will motivate the need for SMT solvers, sketch out their research story thus far, and then describe my contributions to solver research. Specifically, I will talk about two SMT solvers that I designed and implemented, namely, STP and HAMPI, currently being used in 100+ research projects. I will talk about real-world applications enabled by my solvers, and the techniques I developed that helped make them efficient.
Time permitting, I will also talk about some theoretical results in the context of SMT solving.
|Speaker Bio||Vijay Ganesh has been an assistant professor at the University of Waterloo, Canada, since September 2012. Prior to that he was a research scientist at MIT, and he received his PhD in computer science from Stanford University.|
|Date||2:00-3:00, Monday, May 20|
|Speaker||Cristian Cadar, Imperial College London|
|Title||Safe Software Updates via Multi-version Execution|
|Abstract||Software systems are constantly evolving, with new versions and patches being released on a continuous basis. Unfortunately, software updates present a high risk, with many releases introducing new bugs and security vulnerabilities. We tackle this problem using a simple but effective multi-version based approach. Whenever a new update becomes available, instead of upgrading the software to the new version, we run the new version in parallel with the old one; by carefully coordinating their executions and selecting the behavior of the more reliable version when they diverge, we create a more secure and dependable multi-version application. We implemented this technique in Mx, a system targeting Linux applications running on multi-core processors, and show that it can be applied successfully to several real applications, such as Lighttpd and Redis.|
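The coordination idea can be sketched in a few lines (a toy of my own; Mx itself coordinates whole Linux processes at the system-call level): run both versions on the same input, return the shared result when they agree, and fall back to the more reliable version when they diverge or the new version crashes.

```python
# Toy sketch of multi-version execution (illustrative only; Mx
# coordinates real processes at the system-call level).

def multi_version(old_version, new_version):
    """Wrap two versions of a function into one 'multi-version' function."""
    def wrapper(*args):
        try:
            new_result = new_version(*args)
        except Exception:
            # The new version crashed: survive by using the old version.
            return old_version(*args)
        old_result = old_version(*args)
        # On divergence, select the behavior of the more reliable
        # (here: old) version.
        return old_result if old_result != new_result else new_result
    return wrapper

# Hypothetical example: the new release introduces a regression for "0".
def parse_old(s):
    return int(s)

def parse_new(s):
    return int(s) if s != "0" else None  # the new bug

parse = multi_version(parse_old, parse_new)
print(parse("42"))  # versions agree: 42
print(parse("0"))   # versions diverge: the old version's answer, 0
```

The names `parse_old`/`parse_new` and the "prefer the old version" policy are illustrative assumptions; Mx decides which version's behavior to keep per divergence.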
|Date||3:00-4:00, Monday, May 6|
|Speaker||Rupak Majumdar, Max Planck Institute|
|Title||Static Provenance Verification for Message-Passing Programs|
|Abstract||Provenance information records the source and ownership history of an object. We study the problem of static provenance tracking in concurrent programs in which several principals execute concurrent processes and exchange messages over unbounded but unordered channels. The provenance of a message, roughly, is a function of the sequence of principals that have transmitted the message in the past. The provenance verification problem is to statically decide, given a message passing program and a set of allowed provenances, whether the provenance of all messages in all possible program executions, belongs to the allowed set. We formalize the provenance verification problem abstractly in terms of well-structured provenance domains, and show a general decidability result for it. In particular, we show that if the provenance of a message is a sequence of principals who have sent the message, and a provenance query asks if the provenance lies in a regular set, the problem is decidable and EXPSPACE-complete. We describe an implementation of our technique to check provenances of messages in Firefox extensions. (Joint work with Roland Meyer and Zilong Wang)|
|Date||2:30-3:30, Thurs., Dec. 6|
|Speaker||Christoph Kirsch, University of Salzburg|
|Title||Distributed Queues: Faster Pools and Better Queues|
Designing and implementing high-performance concurrent data structures whose access performance scales on multicore hardware is difficult. An emerging remedy to scalability problems is to relax the sequential semantics of the data structure and exploit the resulting potential for parallel access in relaxed implementations. A major obstacle to the adoption of relaxed implementations, however, is the belief that their behavior becomes unpredictable. We therefore aim at relaxing existing implementations systematically for better scalability and performance without paying for it in predictability.

We present distributed queues (DQ), a new family of relaxed concurrent queue implementations. DQ implement bounded or unbounded out-of-order relaxed queues with a strict (i.e., linearizable) emptiness check. Our comparison of DQ against existing pool implementations, and against strict and relaxed queue implementations, reveals that DQ outperform and outscale the state-of-the-art implementations. We also show empirically that the shorter execution time of queue operations in fast but relaxed implementations such as DQ (the degree of reordering caused by overlapping operations) may offset the effect of semantic relaxation (the degree of reordering permitted by the specification), making them appear as FIFO as, or sometimes even more FIFO than, strict but slow implementations.
This is joint work with A. Haas, T.A. Henzinger, M. Lippautz, H. Payer, A. Sezgin, and A. Sokolova.
|Speaker Bio||Christoph Kirsch is a full professor and holds a chair at the Department of Computer Sciences of the University of Salzburg, Austria. Since 2008 he has also been a visiting scholar at the Department of Civil and Environmental Engineering of the University of California, Berkeley. He received his Dr.Ing. degree from Saarland University, Saarbruecken, Germany, in 1999 while at the Max Planck Institute for Computer Science. From 1999 to 2004 he worked as a postdoctoral researcher at the Department of Electrical Engineering and Computer Sciences of the University of California, Berkeley. His research interests are in concurrent programming and systems, virtual execution environments, and embedded software. Dr. Kirsch co-invented the Giotto and HTL languages, and leads the JAviator UAV project, for which he received an IBM faculty award in 2007. He co-founded the International Conference on Embedded Software (EMSOFT), was elected ACM SIGBED chair in 2011, and is currently an associate editor of ACM TODAES.|
|Date||2:00-3:00, July 16, 2012|
|Speaker||Jesse Tov, Harvard|
|Title||Practical Programming with Substructural Types|
Substructural logics remove from classical logic the rules for reordering,
duplicating, or dropping assumptions. Because propositions in such a
logic may no longer be freely copied or ignored, it is natural to understand
propositions in substructural logics as representing resources rather than
truths. For the programming language designer, substructural logics thus
provide a framework for type systems that can track the changing
states of logical and physical resources.
While several substructural type systems have been proposed and implemented, many of these have targeted substructural types at a particular purpose, rather than offering them as a general facility. The more general substructural type systems have been theoretical in nature and too unwieldy for practical use. This talk presents the design of a general purpose language with substructural types, and discusses several language design problems that had to be solved in order to make substructural types useful in practice.
|Date||2:00-3:00, July 18, 2012|
|Speaker||Aditya Thakur, U. Wisconsin|
|Title||A Deductive Algorithm for Symbolic Abstraction with Applications to SMT|
This talk presents connections between logic and abstract
interpretation. In particular, I will present a new algorithm for the
problem of "symbolic abstraction": Given a formula \phi in a logic L
and an abstract domain A, the symbolic abstraction of \phi is the best
abstract value in A that over-approximates the meaning of \phi. When
\phi represents a concrete transformer, algorithms for symbolic
abstraction can be used to automatically synthesize the corresponding
abstract transformer. Furthermore, if the symbolic abstraction of
\phi is bottom, then \phi is proved unsatisfiable.
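As a toy illustration of the problem statement above (my own sketch, not the deductive algorithm from the talk): over the interval abstract domain, the symbolic abstraction of \phi is the smallest interval containing all of \phi's models. With a small finite domain we can compute it by brute-force enumeration, whereas the point of the actual work is to compute it symbolically.

```python
# Toy: best interval over-approximation of a formula's models, computed
# by enumeration over a small finite domain (the real algorithm works
# symbolically, driven by a solver).

BOTTOM = None  # the empty interval: phi has no models, i.e. phi is unsat

def alpha_hat(phi, domain):
    """Symbolic abstraction of phi in the interval domain: the smallest
    interval containing every model of phi over the given domain."""
    models = [x for x in domain if phi(x)]
    if not models:
        return BOTTOM
    return (min(models), max(models))

domain = range(-50, 51)
phi = lambda x: x * x <= 9 and x != 0      # models: -3..-1 and 1..3
print(alpha_hat(phi, domain))              # (-3, 3): 0 is over-approximated in
print(alpha_hat(lambda x: x < x, domain))  # None: bottom, so phi is unsatisfiable
```

The second call shows the "dual use" mentioned below: a bottom result doubles as an unsatisfiability proof.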
The bottom line is that our algorithm is "dual-use": (i) it can be used by an abstract interpreter to compute abstract transformers, and (ii) it can be used in an SMT (Satisfiability Modulo Theories) solver to determine whether a formula is satisfiable.
The key insight behind the algorithm is that Stålmarck's method for satisfiability checking of propositional-logic formulas can be explained using concepts from the field of abstract interpretation. This insight then led to the discovery of the connection between Stålmarck's method and symbolic abstraction, and to the extension of Stålmarck's method to richer logics, such as quantifier-free linear real arithmetic.
This is joint work with Prof. Thomas Reps.
|Date||3:00-4:00, May 24, 2012|
|Speaker||Nataliya Guts, University of Maryland|
|Title||Polymonads: Reasoning and Inference|
Many useful programming constructions can be expressed as monads. Examples
include probabilistic modeling, functional reactive programming, parsing, and
information flow tracking, not to mention effectful functionality like state
and I/O. In our previous work [SGLH11], we presented a type-based rewriting
algorithm that makes programming with arbitrary monads as easy as programming
with built-in support for state and I/O. Developers write programs using monadic
values of type m t as if they were of type t, and our algorithm inserts the
necessary binds, units, and monad-to-monad morphisms so that the resulting
program type-checks.
A number of other programming idioms resemble monads but deviate from the standard monadic bind. Examples include parameterized monads, monads for effects, and information-flow state tracking. Our present work aims to provide formal reasoning support and lightweight programming for such constructs. We present a new, expressive paradigm, polymonads, together with the equivalent of the monad and morphism laws. Polymonads subsume conventional monads and all the other examples mentioned above. On the practical side, we extend our type-inference-based rewriting algorithm to support lightweight programming with polymonads.
[SGLH11] N. Swamy, N. Guts, D. Leijen, M. Hicks. Lightweight Monadic Programming in ML. In ICFP, 2011.
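A minimal sketch (mine, in Python rather than the ML setting of [SGLH11]) of what the elaboration produces: with a Maybe-style monad encoded as "a value or NOTHING", the programmer would like to write `div(6, 2) + div(8, 4)` directly, and the rewriting inserts the binds and units.

```python
# Toy Maybe monad: a computation is either an ordinary value or NOTHING.
# (Illustrative encoding only; the paper's algorithm works on typed ML.)

NOTHING = object()

def unit(x):
    # Inject a pure value into the monad (identity in this encoding).
    return x

def bind(m, f):
    # Propagate failure; otherwise feed the value to the continuation.
    return NOTHING if m is NOTHING else f(m)

def div(a, b):  # a partial function returning a monadic int
    return NOTHING if b == 0 else unit(a // b)

# What the rewriting would generate for the surface program
# `div(6, 2) + div(8, 4)`:
elaborated = bind(div(6, 2), lambda x:
             bind(div(8, 4), lambda y:
             unit(x + y)))
print(elaborated)                                          # 5
print(bind(div(1, 0), lambda x: unit(x + 1)) is NOTHING)   # True
```

The programmer never writes the binds; the type-directed elaboration threads them in automatically.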
|Date||3:00-4:00, November 7, 2011|
|Speaker||Anupam Datta, CMU|
|Title||Policy Auditing over Incomplete Logs|
|Abstract||We present the design, implementation and evaluation of an algorithm that checks audit logs for compliance with privacy and security policies. The algorithm, which we name reduce, addresses two fundamental challenges in compliance checking that arise in practice. First, in order to be applicable to realistic policies, reduce operates on policies expressed in a first-order logic that allows restricted quantification over infinite domains. We build on ideas from logic programming to identify the restricted form of quantified formulas. The resulting logic is more expressive than prior logics used for compliance checking, including propositional temporal logics and metric first-order temporal logic, and, in contrast to these logics, can express all 84 disclosure-related clauses in the HIPAA Privacy Rule. Second, since audit logs are inherently incomplete (they may not contain sufficient information to determine whether a policy is violated), reduce proceeds iteratively: in each iteration, it provably checks as much of the policy as possible over the current log and outputs a residual policy that can only be checked when the log is extended with additional information. We prove correctness, termination, and time and space complexity results for reduce. We implement reduce and evaluate it by checking simulated audit logs for compliance with the HIPAA Privacy Rule. Our experimental results demonstrate that the algorithm is fast enough to be used in practice.|
Anupam Datta is an Assistant Research Professor at Carnegie Mellon University,
where he is affiliated with CyLab, the Electrical and Computer Engineering
Department, and (by courtesy) the Computer Science Department. His research
focuses on the foundations of security and privacy. One area of focus has
been on programming language methods for compositional security. His work on
Protocol Composition Logic and the Logic of Secure Systems has uncovered new
principles for compositional security and has been applied successfully to find
attacks in and verify properties of a number of practical cryptographic
protocols and secure systems. A second area of focus has been on formalizing
and enforcing privacy policies. He has worked on a Logic of Privacy that
formalizes concepts from contextual integrity --- a philosophical theory of
privacy as a right to appropriate flows of personal information. His group has
produced the first complete formalization of the HIPAA Privacy Rule using this
logic and developed principled audit mechanisms for enforcing policies
expressed in the logic.
Dr. Datta has co-authored a book and over 30 publications in conferences and journals on these topics. He serves on the Steering Committee of the IEEE Computer Security Foundations Symposium. He has served as Program Co-Chair of the 2011 Formal Aspects of Security and Trust Workshop and the 2008 Formal and Computational Cryptography Workshop. Dr. Datta obtained MS and PhD degrees from Stanford University and a BTech from IIT Kharagpur, all in Computer Science.
|Date||3:00-4:00, Oct 4, 2011|
|Speaker||David Basin, ETH Zurich|
|Title||Policy Monitoring in First-order Temporal Logic|
|Abstract||In security and compliance, it is often necessary to ensure that agents and systems comply to complex policies. An example from financial reporting is the requirement that every transaction t of a customer c, who has within the last 30 days been involved in a suspicious transaction t', must be reported as suspicious within 2 days. We present an approach to monitoring such policies formulated in an expressive fragment of metric first-order temporal logic. We also report on case studies in security and compliance monitoring and use these to evaluate both the suitability of this fragment for expressing complex, realistic policies and the efficiency of our monitoring algorithm.|
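The example policy above can be made concrete with a toy offline checker (my own sketch; the talk's monitor processes metric first-order temporal logic formulas incrementally over a trace). Events here are `(day, kind, customer)` tuples, and the event names and encoding are illustrative assumptions.

```python
# Toy offline checker for the example policy: every transaction of a
# customer involved in a suspicious transaction within the last 30 days
# must be reported as suspicious within 2 days.
# Events: (day, kind, customer), kind in {"trans", "suspicious", "report"}.

def violations(trace):
    """Return (day, customer) pairs for transactions that violate the policy."""
    suspicious_days = {}
    for day, kind, cust in trace:
        if kind == "suspicious":
            suspicious_days.setdefault(cust, []).append(day)
    bad = []
    for day, kind, cust in trace:
        if kind != "trans":
            continue
        # Was this customer involved in a suspicious transaction
        # within the last 30 days?
        if any(day - 30 <= d <= day for d in suspicious_days.get(cust, [])):
            reported = any(kind2 == "report" and cust2 == cust
                           and day <= d2 <= day + 2
                           for d2, kind2, cust2 in trace)
            if not reported:
                bad.append((day, cust))
    return bad

trace = [(1, "suspicious", "alice"),
         (10, "trans", "alice"),   # reported on day 11: compliant
         (11, "report", "alice"),
         (20, "trans", "alice")]   # never reported: violation
print(violations(trace))  # [(20, 'alice')]
```

Unlike this offline toy, a real monitor must also handle the case where the verdict for a recent transaction is still open because the 2-day reporting window has not yet elapsed.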
David Basin has been a full professor, holding the chair for Information Security, in the Department of Computer Science at ETH Zurich since 2003. He is also the director of ZISC, the Zurich Information Security Center.
He received his bachelor's degree in mathematics from Reed College in 1984, his Ph.D. from Cornell University in 1989, and his Habilitation from the University of Saarbrücken in 1996. His appointments include a postdoctoral research position at the University of Edinburgh (1990-1991), after which he led a subgroup within the programming logics research group at the Max-Planck-Institut für Informatik (1992-1997). From 1997 to 2002 he was a full professor at the University of Freiburg, where he held the chair for software engineering.
His research focuses on information security, in particular methods and tools for modeling, building, and validating secure and reliable systems.
|Date||3:00-4:00 pm, June 10|
|Speaker||Koushik Sen, UC Berkeley|
|Title||Specifying and Checking Correctness of Parallel Programs|
|Abstract||The spread of multicore processors and manycore graphics processing units has greatly increased the need for parallel correctness tools. Reasoning about parallel multi-threaded programs is significantly more difficult than for sequential programs due to non-determinism. We believe that the only way to tackle this complexity is to separate reasoning about parallelism correctness (i.e., that a parallel program gives the same outcome despite thread interleavings) from reasoning about functional correctness (i.e., that the program produces the correct outcome on a thread interleaving). In this talk, I will describe two fundamental techniques for separating the parallelization correctness aspect of a program from its functional correctness. The first idea consists of extending programming languages with constructs for writing specifications, called bridge assertions, that focus on relating outcomes of two parallel executions differing only in thread-interleavings. The second idea consists of allowing a programmer to use a non-deterministic sequential program as the specification of a parallel one. For functional correctness, it is then enough to check the sequential program. For parallelization correctness, it is sufficient to check the deterministic behavior of the parallel program with respect to the non-deterministic sequential program. To check parallel correctness, we have developed a new scalable automated method for testing and debugging, called active testing. Active testing combines the power of imprecise program analysis with the precision of software testing to quickly discover concurrency bugs and to reproduce discovered bugs on demand.|
|Speaker Bio||Koushik Sen is an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research interests lie in Software Engineering, Programming Languages, and Formal Methods. He is interested in developing software tools and methodologies that improve programmer productivity and software quality. He is best known for his work on directed automated random testing and concolic testing. He has received an NSF CAREER Award in 2008, a Haifa Verification Conference (HVC) Award in 2009, an IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice in 2010, and a Sloan Foundation Fellowship in 2011. He has won three ACM SIGSOFT Distinguished Paper Awards. He received the C.L. and Jane W-S. Liu Award in 2004, the C. W. Gear Outstanding Graduate Award in 2005, and the David J. Kuck Outstanding Ph.D. Thesis Award in 2007 from the UIUC Department of Computer Science. He holds a B.Tech from the Indian Institute of Technology, Kanpur, and M.S. and Ph.D. degrees in CS from the University of Illinois at Urbana-Champaign.|
|Date||1:00-2:00, Friday June 3|
|Speaker||Andreas Zeller, Saarland University|
|Title||Mining Precise Specifications|
|Abstract||Recent advances in software validation and verification make it possible to largely automate checking whether a specification is satisfied. This progress is hampered, though, by the persistent difficulty of writing specifications. Are we facing a "specification crisis"? By mining specifications from existing systems, we can alleviate this burden, reusing and extending the knowledge of 60 years of programming, and bridging the gap between formal methods and real-world software. But mining specifications has its challenges: we need good usage examples to learn expected behavior; we need to cope with the approximations of static and dynamic analysis; and we need specifications that are readable and relevant to users. In this talk, I present the state of the art in specification mining, its challenges, and its potential, up to a vision of seamless integration of specification and programming.|
|Speaker Bio||Andreas Zeller is a full professor for Software Engineering at Saarland University in Saarbrücken, Germany. His research concerns the analysis of large software systems and their development process; his students are funded by companies such as Google, Microsoft, and SAP. In June 2011, Zeller will be inducted as a Fellow of the ACM for his contributions to automated debugging and mining software archives.|
|Date||2:00-3:00, Monday, Feb. 7|
|Speaker||Michael Franz, UC Irvine|
|Title||Recent Advances In Compiler Research - Firefox's TraceMonkey and Beyond|
|Speaker Bio||Prof. Michael Franz is a Professor of Computer Science in UCI's Donald Bren School of Information and Computer Sciences, a Professor of Electrical Engineering and Computer Science (by courtesy) in UCI's Henry Samueli School of Engineering, and the director of UCI's Secure Systems and Software Laboratory. He is currently also a visiting Professor of Informatics at ETH Zurich, the Swiss Federal Institute of Technology, from which he previously received the Dr. sc. techn. (advisor: Niklaus Wirth) and the Dipl. Informatik-Ing. ETH degrees.|
|Date||11:00-12:00, Tuesday, December 7th|
|Speaker||Hongseok Yang, University of London|
|Title||Automatic Program Analysis of Overlaid Data Structures|
We call a data structure overlaid if a node in the structure
includes links for multiple data structures and these links
are intended to be used at the same time. Such overlaid data
structures are frequently used in systems code to implement
multiple types of indexing structures over the same set of nodes.
For instance, the deadline IO scheduler of Linux has a queue whose
node has links for a doubly-linked list as well as those for a red-black tree.
The doubly-linked list here is used to record the order that nodes are inserted
in the queue, and the red-black tree provides an efficient indexing structure on
the sector fields of the nodes.
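The deadline-scheduler example can be sketched as follows (an illustrative toy, not the Linux code): each node carries both doubly-linked-list links, recording insertion order, and search-tree links keyed by sector, with a plain unbalanced binary search tree standing in for the red-black tree.

```python
# Toy overlaid node: one object participates simultaneously in a FIFO
# doubly-linked list (insertion order) and a binary search tree keyed
# by sector (standing in for Linux's red-black tree).

class Request:
    def __init__(self, sector):
        self.sector = sector
        self.prev = self.next = None   # doubly-linked-list links
        self.left = self.right = None  # search-tree links

class OverlaidQueue:
    def __init__(self):
        self.head = self.tail = None   # FIFO order
        self.root = None               # sector index

    def insert(self, req):
        # Append to the FIFO list...
        if self.tail is None:
            self.head = self.tail = req
        else:
            self.tail.next, req.prev = req, self.tail
            self.tail = req
        # ...and insert into the sector-keyed tree (unbalanced here).
        if self.root is None:
            self.root = req
            return
        node = self.root
        while True:
            side = "left" if req.sector < node.sector else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, req)
                return
            node = child

q = OverlaidQueue()
for s in [70, 10, 40]:
    q.insert(Request(s))
fifo = []
n = q.head
while n:
    fifo.append(n.sector)
    n = n.next
print(fifo)  # [70, 10, 40]: insertion order via the list links
```

The analysis challenge the talk addresses is exactly that both sets of links constrain the same nodes at once, so neither structure can be verified in isolation.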
In this talk, I will describe an automatic program analysis for such overlaid data structures. The focus of the talk will be on two main issues: how to represent these data structures effectively, and how to build an efficient yet precise program analyser that can prove the memory safety of realistic examples, such as the Linux deadline IO scheduler. I will explain how we addressed the first issue by combining standard classical conjunction with the separating conjunction of separation logic, and how we used a meta-analysis and the dynamic insertion of ghost instructions to address the second. If time permits, I will give a demo of the tool.

This is joint work with Oukseh Lee and Rasmus Petersen.
|Date||2:00-3:00, October 25|
|Speaker||Patrick Eugster, Purdue|
|Title||Distributed Event-based Programming in Java|
The abstraction of an "event" has been used for years to reason about concurrent and distributed programs, and it is increasingly being used as a programming paradigm. Developing distributed event-based applications is currently challenging for programmers, though, as it involves integrating a number of technologies as well as dealing with an abstraction that cuts across more traditional programming paradigms.

EventJava is an extension of the mainstream Java language aimed at simplifying the development of a wide range of event-based applications. In this talk, we first provide an overview of selected features of the EventJava language framework and its implementation. We then present a performance evaluation from different viewpoints, and conclude with an outlook on future work.
|Date||1:00-2:00, Wednesday, October 13|
|Speaker||Erik Meijer, Microsoft|
|Title||Fundamentalist Functional Programming|
In 1984, John Hughes wrote a seminal paper titled "Why Functional Programming Matters", in which he eloquently explained the value of pure and lazy functional programming. In the quarter century since the paper was written, the problems associated with imperative languages and their side effects have become increasingly evident, due to the growing importance of the Web and the advent of many-core machines.

This talk argues that fundamentalist functional programming, that is, radically eliminating all side effects from programming languages, including strict evaluation, is what it takes to conquer the concurrency and parallelism dragon. Programmers must embrace pure, lazy functional programming, with all effects made apparent in the type system of the host language using monads.
A radical paradigm shift is the answer, but does that mean that all current programmers will be lost along the way? Fortunately not! By design, LINQ is based on monadic principles, and the success of LINQ proves that the world does not fear the monads.
|Speaker Bio||Erik Meijer is an accomplished programming-language designer who has worked on a wide range of languages, including Haskell, Mondrian, X#, C#, and Visual Basic. He runs the Cloud Languages Team in the Business Platform Division at Microsoft, where his primary focus has been to remove the impedance mismatch between databases and programming languages in the context of the Cloud. One of the fruits of these efforts is LINQ, which not only adds a native querying syntax to .NET languages, such as C# and Visual Basic, but also allows developers to query data sources other than tables, such as objects or XML. Most recently, Erik has been working on and preaching the virtues of fundamentalist functional programming in the new age of concurrency and many-core. Some people might recognize him from his brief stint as the "Head in the Box" on Microsoft VBTV. These days, you can regularly watch Erik's interviews on the "Expert to Expert" and "Going Deep" series on Channel 9.|
|Date||2:00-3:00, September 27|
|Speaker||Sorin Lerner, UC San Diego|
|Title||Strategies for Building Correct Optimizations|
|Abstract||Program analyses and optimizations are at the core of many optimizing compilers, and their correctness in this setting is critically important because it affects the correctness of any code that is compiled. However, designing correct analyses and optimizations is difficult, error prone and time consuming. This talk will present several inter-related approaches for building correct analyses and optimizations. I will start with an approach based on a domain-specific language for expressing optimizations, combined with a checker for verifying the correctness of optimizations written in this language. This first approach still requires the programmer to write down the optimizations in full detail. I will then move on to several techniques which instead attempt to synthesize correct optimizations from a higher-level description. In particular, I will present an approach that discovers correct optimization opportunities by exploring the application of equality axioms on the program being optimized. Finally, I will present a technique that synthesizes generally applicable and correct optimization rules from concrete examples of code before and after some transformations have been performed.|
|Speaker Bio||Sorin Lerner is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. He received his PhD in 2006 from the University of Washington, under the supervision of Craig Chambers. Before that, he received an undergraduate degree in Computer Engineering from McGill University, Montreal. His research interests lie in programming language and analysis techniques for making software systems easier to write, maintain and understand, including static program analysis, domain specific languages, compilation, formal methods and automated theorem proving. Lerner works actively at the interface of Programming Languages and Software Engineering, and frequently publishes at POPL/PLDI and ICSE/FSE. Sorin Lerner was the co-chair of the 2010 ACM SIGPLAN-SIGSOFT PASTE workshop, and is the recipient of an NSF Career Award (2007), and of the 2003 PLDI Best paper award.|
|Date||2:00-3:00, Wednesday, September 15|
|Speaker||Madan Musuvathi, Microsoft Research|
|Title||A Probabilistic Algorithm for Finding Concurrency Errors (or How to Crash Your Program in the First Few Runs)|
Unexpected thread interleavings can lead to concurrency errors that are hard to find, reproduce, and debug. In this talk, I will present a probabilistic algorithm for finding such errors. The algorithm works by randomly perturbing the timing of threads and event handlers at runtime. Every run of the algorithm finds every concurrency bug in the program with some (reasonably large) probability, so repeated runs can reduce the chance of missing bugs to any desired amount. The algorithm scales to large programs and, in many cases, finds bugs in the first few runs of a program. A tool implementing this algorithm has been used to improve concurrency testing at Microsoft for over a year.
I will also describe the relationship between this algorithm and the dimension theory of partial-orders, and how results from this field can be used to further improve the algorithm.
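A toy in the spirit of the algorithm (my own sketch, not Microsoft's tool): randomly perturb the interleaving of two logical threads, each performing a non-atomic read-increment-write on a shared counter, and observe that some schedules expose the lost-update bug while others do not.

```python
import random

# Toy random scheduler: two logical threads each do a non-atomic
# increment (read, then write read+1). A random choice of which thread
# steps next simulates perturbed timing; some schedules lose an update.

def run_schedule(seed):
    rng = random.Random(seed)
    counter = 0
    steps = [["read", "write"], ["read", "write"]]
    tmp = [0, 0]   # per-thread local copy of the counter
    pc = [0, 0]    # per-thread program counter
    while pc[0] < 2 or pc[1] < 2:
        live = [t for t in (0, 1) if pc[t] < 2]
        t = rng.choice(live)  # random perturbation of thread timing
        if steps[t][pc[t]] == "read":
            tmp[t] = counter
        else:
            counter = tmp[t] + 1
        pc[t] += 1
    return counter

# Across repeated runs, both the buggy outcome (1, a lost update) and
# the correct outcome (2) show up.
results = {run_schedule(seed) for seed in range(100)}
print(sorted(results))
```

Each schedule has probability 1/2 of interleaving the two reads before either write, so a handful of runs suffices to hit the bug, which is the "crash your program in the first few runs" effect in miniature.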
|Speaker Bio||Madan Musuvathi is a researcher at Microsoft Research interested in software verification, program analysis, and systems. Recently, he has focused on the scalable analysis of concurrent systems. He received his Ph.D. at Stanford University in 2004 and has been at Microsoft Research since.|
|Date||1:00-2:00, August 24|
|Speaker||Noam Rinetzky, Queen Mary University of London|
|Title||Verifying Linearizability with Hindsight|
We present a proof of safety and linearizability of a highly-concurrent optimistic set algorithm. The key step in our proof is the Hindsight Lemma, which allows a thread to infer the existence of a global state in which its operation can be linearized based on limited local atomic observations about the shared state. The Hindsight Lemma allows us to avoid one of the most complex and non-intuitive steps in reasoning about highly concurrent algorithms: considering the linearization point of an operation to be in a different thread than the one executing it.
The Hindsight Lemma assumes that the algorithm maintains certain simple invariants which are resilient to interference, and which can themselves be verified using purely thread-local proofs. As a consequence, the lemma allows us to unlock a perhaps-surprising intuition: a high degree of interference makes non-trivial highly-concurrent algorithms in some cases much easier to verify than less concurrent ones.
Joint work with Peter W. O'Hearn (Queen Mary University of London), Martin T. Vechev (IBM T.J. Watson Research Center), Eran Yahav (IBM T.J. Watson Research Center), and Greta Yorsh (IBM T.J. Watson Research Center).
Presented at the 29th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC'10).
Link to paper: http://www.eecs.qmul.ac.uk/~maon/pubs/PODC10-hindsight.pdf
|Date||2:00-3:00, Monday, August 16|
|Speaker||Greta Yorsh, IBM Research|
|Title||Specializing Memory Management for Concurrent Data Structures|
Memory reclamation plays a central role in the design of concurrent
data structures. The main challenge is to equip a particular
concurrent data structure with its own custom memory reclamation, in a
way that is both correct and efficient. This problem arises frequently
in environments that do not support automatic memory management, but
it is also relevant in the case where we want to obtain a more
efficient concurrent data structure.
Unfortunately, despite various proposals, the most prevalent
methodologies today, such as hazard pointers, are not well understood,
and applying them is still an ad-hoc, error-prone, difficult, and
time-consuming manual process.
We propose a systematic approach to specializing memory reclamation to a particular concurrent data structure. We start with a concurrent algorithm that is proven to behave correctly assuming automatic memory reclamation. We then apply a sequence of correctness-preserving transformations to both the memory reclamation scheme and the algorithm. These transformations rely on invariants of the algorithm, computed by standard analyses, and clearly illustrate why a given transformation is applied and under what conditions it can be applied safely. We demonstrate our approach by systematically deriving correct and efficient custom memory reclamation for state-of-the-art concurrent data structure algorithms, including several variations of concurrent stack, queue, and set algorithms.
(joint work in progress with Martin Vechev and Eran Yahav)
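As background for readers unfamiliar with manual reclamation schemes, the core rule of a hazard-pointer-style scheme can be sketched in a few lines. This is a single-threaded Python toy (names and structure are mine, not the talk's): a retired node may be freed only once no reader protects it.

```python
# Toy sketch of hazard-pointer-style reclamation (illustrative, not the
# talk's algorithm). A reader "protects" a node before dereferencing it;
# a retired node may be freed only if no reader currently protects it.

class Node:
    def __init__(self, value):
        self.value = value
        self.freed = False

hazards = set()   # nodes currently protected by readers
retired = []      # nodes unlinked from the structure, awaiting free

def protect(node):
    hazards.add(node)

def unprotect(node):
    hazards.discard(node)

def retire(node):
    retired.append(node)

def scan_and_free():
    """Free retired nodes that no reader protects; keep the rest."""
    global retired
    survivors = []
    for node in retired:
        if node in hazards:
            survivors.append(node)
        else:
            node.freed = True
    retired = survivors

a, b = Node(1), Node(2)
protect(a)            # a reader is about to dereference a
retire(a); retire(b)  # both removed from the data structure
scan_and_free()       # b is freed; a survives while protected
assert not a.freed and b.freed
unprotect(a)
scan_and_free()       # now a can be freed too
```

Getting this interplay right for a specific data structure, under real concurrency, is exactly the ad-hoc manual process the talk proposes to systematize.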
|Date||CHANGED: 1:00-2:00 Friday, August 6|
|Place||CHANGED: Gates 463a|
|Speaker||Cindy Rubio Gonzalez, University of Wisconsin|
|Title||Error Propagation Analysis for File Systems|
Unchecked errors are especially pernicious in operating system file management code. Transient or
permanent hardware failures are inevitable, and error-management bugs at the file system layer can
cause silent, unrecoverable data corruption. Furthermore, even when developers have the best of
intentions, inaccurate documentation can mislead programmers and cause software to fail in unexpected ways.
We propose an interprocedural static analysis that tracks errors as they propagate through file system code. Our implementation detects overwritten, out-of-scope, and unsaved unchecked errors. Analysis of four widely-used Linux file system implementations (CIFS, ext3, IBM JFS and ReiserFS), a relatively new file system implementation (ext4), and shared virtual file system (VFS) code uncovers 312 confirmed error propagation bugs. Our flow- and context-sensitive approach produces more precise results than related techniques while providing better diagnostic information, including possible execution paths that demonstrate each bug found.
Additionally, we use our error-propagation analysis framework to identify the error codes returned by system calls across 52 Linux file systems. We examine mismatches between documented and actual error codes returned by 42 Linux file-related system calls. Comparing analysis results with Linux manual pages reveals over 1,700 undocumented error-code instances affecting all file systems and system calls examined.
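One of the bug classes above, overwritten unchecked errors, is easy to illustrate with a toy intraprocedural check (the event encoding and rules here are my own simplification, not the paper's analysis): a variable holding an error code that is reassigned before being checked loses the error silently.

```python
# Toy "overwritten unchecked error" detector over a linear event trace
# (illustrative simplification of the interprocedural analysis above).

def find_overwritten_errors(trace):
    """trace: list of ('assign_err'|'check'|'assign', var) events.
    Returns the variables whose unchecked error code was overwritten."""
    unchecked = set()
    bugs = []
    for op, var in trace:
        if op == 'assign_err':       # var now holds a fresh error code
            if var in unchecked:
                bugs.append(var)     # previous error lost before any check
            unchecked.add(var)
        elif op == 'check':
            unchecked.discard(var)
        elif op == 'assign':         # overwritten with a non-error value
            if var in unchecked:
                bugs.append(var)
            unchecked.discard(var)
    return bugs

trace = [('assign_err', 'rc'), ('assign_err', 'rc'),   # rc overwritten: bug
         ('check', 'rc'),
         ('assign_err', 'rv'), ('check', 'rv')]
print(find_overwritten_errors(trace))   # ['rc']
```

The real analysis must additionally be flow- and context-sensitive across procedure boundaries, which is what makes it precise enough for the diagnostic paths mentioned above.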
|Date||2:00-3:00, Thursday, July 22|
|Speaker||Byron Cook, Microsoft Research, Cambridge|
|Title||New methods for proving temporal properties of infinite-state systems|
|Abstract||I will describe some new methods of proving temporal properties of infinite-state programs. Our approach takes advantage of the fact that linear-temporal properties can often be proved more efficiently using proof techniques usually associated with the branching-time logic CTL. The caveat is that, in certain instances, nondeterminism in the system's transition relation can cause CTL methods to report counterexamples that are spurious in LTL. To address this problem we describe an algorithm that, as it attempts to apply CTL proof methods, finds and then removes problematic nondeterminism via an analysis on the spurious counterexamples. We must also develop CTL symbolic model checking tools for infinite-state systems.|
Dr. Byron Cook is a Principal Researcher at Microsoft Research
in Cambridge, UK as well as Professor of Computer Science at Queen
Mary, University of London. He is one of the developers of the
Terminator program termination proving tool, as well as the SLAM
software model checker. Before joining Microsoft Research he was a
developer in the Windows OS kernel group.
|Date||1:30-2:30, Tuesday, June 1|
|Speaker||Sanjit Seshia, UC Berkeley|
|Title||Integrating Induction and Deduction for Verification and Synthesis|
Even with impressive advances in formal methods over the last few
decades, some problems in automatic verification and synthesis remain
challenging. Examples include the verification of quantitative
properties of software such as execution time, and certain program
synthesis problems. In this talk, I will present a new approach to
automatic verification and synthesis based on a combination of
inductive methods (learning from examples), and deductive methods
(based on logical inference and constraint solving).
Our approach integrates verification techniques such as satisfiability solving and theorem proving (SAT/SMT), numerical simulation, and fixpoint computation with inductive inference methods including game-theoretic online learning, learning Boolean functions and learning polyhedra. My talk will illustrate this combination of inductive and deductive reasoning for three problems: (i) program synthesis applied to malware deobfuscation; (ii) the verification of execution time properties of embedded software, and (briefly) (iii) the synthesis of switching logic for hybrid systems.
|Speaker Bio||Sanjit A. Seshia is an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received an M.S. and a Ph.D. in Computer Science from Carnegie Mellon University, and a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Bombay. His research interests are in dependable computing and computational logic, with a current focus on applying automated formal methods to problems in embedded systems, computer security, and electronic design automation. He has received a Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, and the School of Computer Science Distinguished Dissertation Award at Carnegie Mellon University.|
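The inductive/deductive loop underlying this line of work can be made concrete with a toy counterexample-guided synthesis loop (everything here is illustrative, not the speaker's tools): a learner fits a candidate to the examples seen so far, and a verifier either accepts it or returns a counterexample that becomes a new example.

```python
from itertools import product

# Toy counterexample-guided synthesis: find (a, b) so that a*x + b matches
# a hidden spec on a finite domain. Learner = inductive step over examples;
# verifier = deductive check (brute force here, an SMT solver in practice).

DOMAIN = range(-10, 11)

def spec(x):                # hidden specification (illustrative)
    return 3 * x + 1

def learner(examples):
    """Return any (a, b) consistent with all examples so far."""
    for a, b in product(range(-5, 6), repeat=2):
        if all(a * x + b == y for x, y in examples):
            return a, b
    return None

def verifier(a, b):
    """Return a counterexample input, or None if the candidate is correct."""
    for x in DOMAIN:
        if a * x + b != spec(x):
            return x
    return None

examples = [(0, spec(0))]
while True:
    a, b = learner(examples)
    cex = verifier(a, b)
    if cex is None:
        break
    examples.append((cex, spec(cex)))   # the verifier teaches the learner

print(a, b)   # 3 1
```

The same shape (learn from examples, verify deductively, feed counterexamples back) recurs in the three applications listed in the abstract.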
|Date||2:00-3:00, Tuesday, May 11th|
|Speaker||Mark Wegman, IBM Research|
|Title||Managing Businesses that Design|
|Abstract||Software development is fundamentally a design process. The quality of the eventual outcome depends on how well people can come together to create a pleasing design. Different organizations may be vastly better or worse than others in how they create designs. Understanding what an organization does well is similar in many ways to debugging a program: you instrument it, subject to concerns about privacy. The instrumentation can be done via the tools that we use to build software, as that's what people use in the organization. As the needs of the organization change, those tools may also need to change, and the analysis they do on the software artifacts may change as well. This is new work in an attempt to define a vision of a potential new science on the management of design. It should be noted that we are not advocating that the best management is more intrusive management -- sometimes the best management recognizes that to accomplish what is needed people need to take more risks up front.|
|Date||1:30-2:30, Wednesday, May 12|
|Speaker||John Field, IBM Research|
|Title||The Thorn Programming Language: Robust Distributed Scripting|
Scripting languages enjoy great popularity due to their support
for rapid and exploratory development. They typically have
lightweight syntax, weak data privacy, dynamic typing, and
powerful aggregate data types. The price of these features comes
later in the software life cycle. Scripts are hard to evolve and
compose, and often slow. An additional weakness of most scripting
languages is lack of support for distributed computing---though
distribution is required for scalability and interacting with
remote services. Thorn, developed jointly by IBM Research and
Purdue University, is a modern scripting language addressing these
issues. It enjoys most of the advantages of scripting languages,
but provides support for software evolution and robustification,
e.g., an expressive module system and type annotation facilities.
It has distributed computing built into the core language.
This is joint work with Bard Bloom, Brian Burg, Nate Nystrom, Johan Östlund, Gregor Richards, Rok Strniša, Jan Vitek, and Tobias Wrigstad.
|Date||3:00-4:00, Monday, April 26th|
|Speaker||Ken McMillan, Cadence|
|Title||Relevance Heuristics for Program Analysis|
|Abstract||Relevance heuristics allow us to tailor a program analysis to a particular property to be verified. This in turn makes it possible to improve the precision of the analysis where needed, while maintaining scalability. In this talk I will discuss the principles by which SAT solvers and other decision procedures decide what information is relevant to a given proof. Then we will see how these ideas can be exploited in program verification using the method of Craig interpolation. The result is an analysis that is finely tuned to prove a given property of a program. At the end of the talk, I will cover some recent research in this area, including the use of interpolants for verifying heap-manipulating programs, generating procedure summaries, and generating program tests.|
|Date||10:00-11:00, Monday, April 12th|
|Speaker||Sorav Bansal, IIT Delhi|
|Title||An Optimizing Virtualization Engine|
|Abstract||I will talk about our early experiences with developing a high-performance binary translator to perform hardware virtualization. Our binary translator is capable of performing dynamic runtime optimizations for the OS and its applications, is transparent to the user, can run an unmodified OS, and can install incrementally on an existing system. I will present our experiences with development and some early results.|
|Date||4:00-5:00, Tuesday, March 9|
|Speaker||Harry Mairson, Brandeis University|
|Title||Linear Logic and the Complexity of Control Flow Analysis|
Static program analysis is about predicting the future: it's what compilers do at compile-time to predict and optimize what happens at run-time. What is the tradeoff between the efficiency and the precision of the computed prediction? Control flow analysis (CFA) is a canonical form of program analysis that answers questions like "can call site X ever call procedure P?" or "can procedure P ever be called with argument A?" Such questions can be answered exactly by Girard's geometry of interaction (GoI), but in the interest of efficiency and time-bounded computation, control flow analysis computes a crude approximation to GoI, possibly including false positives.
Different versions of CFA are parameterized by their sensitivity to calling contexts, as represented by a contour, a sequence of k labels representing these contexts, analogous to Lévy's labelled lambda calculus. CFA with larger contours is designed to make the approximation more precise. A naive calculation shows that 0CFA (i.e., with k=0) can be solved in polynomial time, and kCFA (k>0, a constant) can be solved in exponential time. We show that these bounds are exact: the decision problem for 0CFA is PTIME-hard, and for kCFA is EXPTIME-hard. Each proof depends on fundamental insights about the linearity of programs. In both the linear and nonlinear case, contrasting simulations of the linear logic exponential are used in the analysis. The key idea is to take the approximation mechanism inherent in CFA, and exploit its crudeness to do real computation.
This is joint work with David Van Horn (Brandeis University), presented at the 2008 ACM International Conference on Functional Programming.
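To make 0CFA concrete for readers who haven't seen it, here is a bare-bones monovariant (k=0) analysis over a tiny labelled lambda calculus. The tuple encoding and fixpoint loop are my own minimal sketch, not the paper's formulation:

```python
# Minimal 0CFA: map each label (and variable name) to the set of lambdas
# that may flow there, iterating flow constraints to a fixpoint.
# Subterms: ('var', label, name), ('lam', label, param, body),
#           ('app', label, operator, operand).

def cfa0(term):
    cache = {}
    def get(key):
        return cache.setdefault(key, set())
    def add(key, val):
        if val not in get(key):
            get(key).add(val)
            return True
        return False

    subterms = []
    def collect(t):
        subterms.append(t)
        if t[0] == 'lam':
            collect(t[3])
        elif t[0] == 'app':
            collect(t[2]); collect(t[3])
    collect(term)

    changed = True
    while changed:
        changed = False
        for t in subterms:
            if t[0] == 'lam':                      # a lambda flows to itself
                changed |= add(t[1], t)
            elif t[0] == 'var':                    # values bound to the name
                for f in list(get(t[2])):
                    changed |= add(t[1], f)
            elif t[0] == 'app':                    # for each possible callee
                for f in list(get(t[2][1])):
                    _, _, param, body = f
                    for v in list(get(t[3][1])):   # argument -> parameter
                        changed |= add(param, v)
                    for v in list(get(body[1])):   # body result -> call site
                        changed |= add(t[1], v)
    return cache

# ((lambda x. x) (lambda y. y)): the operator position (label 1) can only
# hold lam x; the whole application (label 0) can only yield lam y.
ID_Y = ('lam', 3, 'y', ('var', 4, 'y'))
ID_X = ('lam', 1, 'x', ('var', 2, 'x'))
cache = cfa0(('app', 0, ID_X, ID_Y))
print(cache[1] == {ID_X}, cache[0] == {ID_Y})   # True True
```

Because the analysis is monovariant, every call to a lambda shares one parameter cache; that merging is precisely the "crude approximation to GoI" the abstract refers to.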
|Date||11:00-12:00, Wednesday, March 3|
|Speaker||Ganesan Ramalingam, Microsoft Research|
In this talk I will address two problems.
In the first part, we consider the use of speculation, by programmers, as an algorithmic paradigm to parallelize seemingly sequential code. Execution order constraints imposed by dependences can serialize computation, preventing parallelization of code and algorithms. Speculating on the value(s) carried by dependences is one way to break such critical dependences. We present language constructs that enable programmers to declaratively express speculative parallelism in programs. In general, speculation requires a runtime mechanism to undo the effects of speculative computation in the case of mispredictions. We describe a set of conditions under which such rollback can be avoided. We utilize a static analysis to check if a given program satisfies these conditions to enable safe use of these constructs without the overheads required for rollback.
In the second part, we consider the problem of making a sequential library thread-safe for concurrent clients. We consider a sequential library annotated with assertions along with a proof that these assertions hold in a sequential execution. We show how we can use the proof to derive concurrency control that ensures that any execution of the library methods, when invoked by concurrent clients, satisfies the same assertions.
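The "speculate on a carried value, verify, redo on mispredict" idea in the first part can be sketched as follows. This is a sequential Python stand-in (names and the predictors are mine): correctness never depends on the predictor, only performance does.

```python
# Toy value speculation on a loop-carried dependence (illustrative, not
# the talk's constructs). Each iteration guesses the incoming running
# total instead of waiting for it, then verifies and redoes on mismatch.

def f(x):
    return x * 2                       # per-iteration work

def run(xs, predict):
    results, mispredicts, total = [], 0, 0
    for x in xs:
        guess = predict(total, x)      # speculative read of the dependence
        out = guess + f(x)             # this work could overlap the producer
        if guess != total:             # verification step
            out = total + f(x)         # rollback: redo with the real value
            mispredicts += 1
        results.append(out)
        total = out
    return results, mispredicts

good = run([1, 2, 3], lambda total, x: total)   # perfect predictor
bad = run([1, 2, 3], lambda total, x: 0)        # always predicts 0
print(good, bad)   # same results, different mispredict counts
```

The rollback-free conditions mentioned in the abstract would, in this picture, let the redo branch be eliminated entirely when the static analysis proves mispredictions cannot corrupt state.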
|Date||2:00-3:00, Friday, February 5, 2010|
|Speaker||Eran Yahav, IBM Research|
|Title||Automatic Inference of Memory Fences|
This work addresses the problem of placing memory fences in a
concurrent program running on a relaxed memory model. Modern
architectures implement relaxed memory models in which memory
operations may be reordered and executed non-atomically.
Special instructions called memory fences are provided to the
programmer, allowing control of this behavior. To ensure
correctness of many algorithms, in particular of non-blocking
ones, a programmer is often required to explicitly insert
memory fences into her program. However, she must use as few
fences as possible, or the benefits of running on a relaxed
architecture will be lost. Placing memory fences is challenging
and extremely error prone, as it requires subtle reasoning
about the underlying memory model.
We present a framework for automatic inference of memory fences in concurrent programs, assisting the programmer with this complex task. Given a program, a specification, a description of the memory model, and a set of test cases, our framework computes a set of ordering constraints that guarantee the correctness of the program under the memory model for all provided test cases. The computed constraints are maximally permissive: removing any constraint from the solution would permit an execution violating the specification. Our framework then realizes the computed constraints as additional fences in the input program. We implemented our approach in a tool called FENDER and used it to infer correct and efficient placements of fences for several nontrivial algorithms, including practical concurrent data structures.
Joint work with Michael Kuperstein and Martin Vechev.
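The inference problem above can be illustrated on the classic two-thread "Dekker" fragment under a drastically simplified memory model (this toy is mine, far simpler than FENDER's model): a store may swap with the thread's later load unless that thread has a fence, and the specification forbids both threads reading 0.

```python
from itertools import combinations, product

# Toy fence inference: enumerate all interleavings and permitted
# reorderings of  T1: x=1; r1=y   T2: y=1; r2=x  and find the smallest
# fence assignment ruling out the (r1, r2) == (0, 0) violation.

def executions(fence1, fence2):
    """Yield (r1, r2) over all schedules and permitted reorderings."""
    t1 = [[('w', 'x'), ('r', 'y')]] + ([] if fence1 else [[('r', 'y'), ('w', 'x')]])
    t2 = [[('w', 'y'), ('r', 'x')]] + ([] if fence2 else [[('r', 'x'), ('w', 'y')]])
    for o1, o2 in product(t1, t2):
        for slots in combinations(range(4), 2):   # T1's positions in the schedule
            i1 = i2 = 0
            mem, reads = {'x': 0, 'y': 0}, {}
            for pos in range(4):
                if pos in slots:
                    tid, (op, loc) = 1, o1[i1]
                    i1 += 1
                else:
                    tid, (op, loc) = 2, o2[i2]
                    i2 += 1
                if op == 'w':
                    mem[loc] = 1
                else:
                    reads[tid] = mem[loc]
            yield reads[1], reads[2]

def infer_fences():
    """Smallest fence assignment under which no execution reads (0, 0)."""
    for f1, f2 in sorted(product([False, True], repeat=2), key=sum):
        if all((r1, r2) != (0, 0) for r1, r2 in executions(f1, f2)):
            return f1, f2

print(infer_fences())   # (True, True): Dekker needs a fence in both threads
```

The maximal-permissiveness property from the abstract shows up even here: dropping either fence reintroduces a violating execution.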
|Date||4:00-5:00pm, Monday, February 1, 2010|
|Speaker||David Bacon, IBM Research|
|Title||Liquid Metal: Eliminating the Boundary between Hardware and Software|
This talk will present Liquid Metal, an end-to-end system from language design to co-execution on hardware and software. The goal of the Liquid Metal project at IBM Research is to allow hybrid systems to be programmed in a single dynamic high-level object-oriented language that maps well to CPUs and FPGAs (and the architectures in between) -- to "JIT the Hardware". While at first glance it may seem that these different systems have conflicting requirements in terms of programming features, it is our belief that many of the features turn out to be highly beneficial in both environments when they are provided at a sufficiently high level of abstraction. By using a single language we open up the opportunity to hide the complexity of crossing domains from software into hardware, and facilitate a fluid movement of computation back and forth between different types of computational devices, choosing to execute code where it is most efficient to do so.
I will describe the key features of the language design, describe our compilation, synthesis, and run-time environment, and present initial results from our prototype system.
Joint work with Joshua Auerbach, Rodric Rabbah, and Perry Cheng
David F. Bacon is a Research Staff Member at IBM's T.J. Watson Research Center. He led the Metronome project, which pioneered hard real-time garbage collection, opening the use of high-level languages like Java for time-critical systems in financial trading, aerospace, defense, video gaming, and telecommunications. His program analysis and synchronization algorithms are included in most compilers and run-time systems for modern object-oriented languages.
Dr. Bacon received his Ph.D. in computer science from the University of California, Berkeley and his A.B. from Columbia University; in 2009 he was a Visiting Professor at Harvard University. He is a member of the IBM Academy of Technology and a Fellow of the ACM.
|Date||11:00-12:00, Wednesday, January 27, 2010|
|Speaker||Jonathan Aldrich, CMU|
|Title||Pragmatic Typestate Verification with Permissions|
Object-oriented libraries often define usage protocols that clients must follow in order for the system to work properly. These protocols may be poorly documented and difficult to follow, causing errors and significant lost productivity in software development.
We are exploring a new approach to verifying object protocols using permissions. These permissions track not only the current "typestate" of an object in its protocol, but an abstraction of what operations other aliases might perform on the object and what invariants must remain true. Developers annotate their code with state and permission information, which can be automatically and soundly checked for consistency. Our approach is fully modular, yet allows substantial reasoning about objects even when they are aliased by multiple clients. I will discuss extensions to check protocols in concurrent systems, our practical experience with the system, and current work toward a new programming language based on typestate.
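The "typestate" half of this can be illustrated with a toy protocol checker (the File protocol and encoding are my own example; the talk's system additionally tracks permissions across aliases, which this sketch ignores):

```python
# Toy typestate check for a File usage protocol:
# closed --open--> open --read*--> open --close--> closed.
# Any operation not allowed in the current state is a protocol violation.

PROTOCOL = {                       # state -> {operation: next state}
    'closed': {'open': 'open'},
    'open':   {'read': 'open', 'close': 'closed'},
}

def check(ops, start='closed'):
    """Return the first protocol violation, or None if ops are well-typed."""
    state = start
    for op in ops:
        if op not in PROTOCOL[state]:
            return f"cannot {op} in state {state}"
        state = PROTOCOL[state][op]
    return None

print(check(['open', 'read', 'close']))   # None
print(check(['open', 'close', 'read']))   # cannot read in state closed
```

The hard part the talk addresses is doing this soundly when several aliases can each advance the object's state, which is where the permission abstraction comes in.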
Jonathan Aldrich is Associate Professor of Computer Science at Carnegie Mellon University. He is the director of CMU's undergraduate minor program in Software Engineering, and teaches courses in programming languages, software engineering and program analysis. Dr. Aldrich joined the CMU faculty after completing a Ph.D. at the University of Washington and a B.S. at Caltech.
Aldrich's research contributions include verifying architectural structure and secure information flow, modular formal reasoning about code, and API protocol safety. For his work on architecture and information flow, Aldrich received a 2006 NSF CAREER award and the 2007 Dahl-Nygaard Junior Prize, given annually for a significant technical contribution to object-oriented programming.
|Date||2:00-3:00, Thursday, November 19, 2009|
|Speaker||Sumit Gulwani, Microsoft Research|
|Title||The Reachability-bound Problem|
The "reachability-bound problem" is the problem of finding a symbolic worst-case bound on the number of times a given control location inside a procedure is visited in terms of the inputs to that procedure. This has applications in bounding resources consumed by a program such as time, memory, network-traffic, power, as well as estimating quantitative properties (as opposed to boolean properties) of data in programs, such as amount of information leakage or uncertainty propagation.
Our approach to solving the reachability-bound problem brings together three fundamentally different techniques for reasoning about loops in an effective manner. This includes (a) abstract-interpretation based iterative technique for computing precise disjunctive invariants to summarize nested loops, (b) arithmetic constraint solving based technique for computing ranking functions for individual paths inside loops, and (c) proof-rules based technique for appropriate composition of ranking functions for individual paths for precise loop bound computation.
We have implemented our solution to the reachability-bound problem in a tool called SPEED, which computes symbolic computational complexity bounds for procedures in .Net code-bases. The tool scales to large programs taking an average of around one second to analyze each procedure.
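A concrete instance of the per-path composition in (b) and (c): the toy loop below (my example, not from the SPEED work) has two paths through the bounded location, one with ranking function j (at most m visits) and one with ranking function n - i (at most n visits); composing the per-path bounds gives the reachability bound n + m.

```python
# Toy reachability-bound check: location L below is visited at most
# n + m times, obtained by composing the two per-path ranking functions.

def visits(n, m):
    count, i, j = 0, 0, m
    while i < n:
        count += 1          # control location L being bounded
        if j > 0:
            j -= 1          # path 1: ranking function j
        else:
            i += 1          # path 2: ranking function n - i
    return count

# The symbolic bound n + m is sound on a grid of inputs, and tight here.
assert all(visits(n, m) <= n + m for n in range(8) for m in range(8))
print(visits(5, 3))   # 8
```

Note that neither path's bound alone suffices; it is the composition step that yields the whole-location bound, which is the crux of technique (c) above.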
|Date||3:00-4:00, Monday, October 5, 2009|
|Speaker||Martin Rinard, MIT|
|Title||Automatically Reducing Energy Consumption, Improving Performance, and Tolerating Failures With Good Quality of Service|
|Abstract||Reducing energy consumption, improving performance, and tolerating failures are important goals in modern computing systems. We present two techniques for satisfying these goals. The first technique, loop perforation, finds the most time-consuming loops, then transforms the loops to execute fewer iterations. Our results show that this technique can reduce the computational resources required to execute the application by a factor of two to three (enabling corresponding improvements in energy consumption, performance, and fault tolerance) while delivering good quality of service. The second technique, goal-directed parallelization, executes the most time-consuming loops in parallel, then (guided by memory profiling information) adds synchronization and replication as necessary to eliminate bottlenecks and enable the application to produce accurate output. Our results show that this approach makes it possible to effectively parallelize challenging applications without the use of complex static analysis. Because traditional program transformations operate in the absence of any specification of acceptable program behavior, the transformed program must produce the identical result as the original program. In contrast, the two techniques presented in this talk exploit the availability of quality of service specifications to apply much more aggressive transformations that may change the result that the program produces (as long as the result satisfies the specified quality of service requirements). The success of these two techniques demonstrates the advantages of this approach.|
|Date||2:00-3:00, Monday, September 28|
|Speaker||Westley Weimer, University of Virginia|
|Title||A Genetic Programming Approach to Automated Software Repair|
|Abstract||Automatic program repair has been a longstanding goal in software engineering, yet debugging remains a largely manual process. We introduce a fully automated method for repairing bugs in software. The approach works on off-the-shelf legacy applications and does not require formal specifications, program annotations or special coding practices. Once a program fault is discovered, an extended form of genetic programming is used to evolve program variants until one is found that both retains required functionality and also avoids the defect in question. Standard test cases are used to exercise the fault and to encode program requirements. After a successful repair has been discovered, it is minimized using structural differencing algorithms and delta debugging. We describe the proposed method and report experimental results demonstrating that it can successfully repair twenty programs totaling almost 200,000 lines of code in under 200 seconds, on average. In addition, we describe how to combine the automatic repair mechanism with anomaly intrusion detection to produce a closed-loop repair system, and empirically evaluate the resulting repair quality. Finally, we propose a test suite sampling technique for automated repair, allowing programs with hundreds of test cases to be repaired in minutes with no loss in repair quality.|
|Date||1:00-2:00, Wednesday, August 26|
|Speaker||Tom Reps, University of Wisconsin and GrammaTech, Inc.|
|Title||WYSINWYX: What You See Is Not What You eXecute|
Computers do not execute source-code programs; they execute
machine-code programs that are generated from source code.
Consequently, some of the elements relevant to understanding the
program's capabilities and potential flaws may not be visible in a
program's source code. This can be due to layout choices made by the
compiler or optimizer, or because transformations have been applied
subsequent to compilation (e.g., to make the code run faster or to
insert software protections). We call this the WYSINWYX phenomenon
(pronounced "wiz-in-wicks"): What You See [in source code] Is Not
What You eXecute.
Not only can this create a mismatch between what a programmer intends and what is actually executed by the processor, it can cause analyses that are performed on source code -- the approach followed by most program-analysis tools -- to fail to detect bugs and security vulnerabilities. To address this issue, we have developed methods to analyze machine code using a variety of dynamic, static, and symbolic techniques.
Joint work with G. Balakrishnan (NEC), J. Lim (UW), A. Lal (UW), A. Thakur (UW), D. Gopan (GrammaTech, Inc.), and T. Teitelbaum (Cornell and GrammaTech, Inc.).
|Date||2:00-3:00, Thursday, June 4, 2009|
|Speaker||Shaz Qadeer, Microsoft Research|
|Title||Algorithmic verification of systems software using SMT solvers|
Program verification is an undecidable problem; all program verifiers
must make a tradeoff between precision and scalability. Over the past
decade, a variety of scalable program analysis tools have been
developed. These tools, based primarily on techniques such as type
systems and dataflow analysis, scale to large and realistic programs.
However, to achieve scalability they sacrifice precision, resulting in
a significant number of false error reports and adversely affecting
the usability of the tool.
In this talk, I will present a different approach to program verification realized in the HAVOC verifier for low-level systems software. HAVOC works directly on the operational semantics of C programs based on a physical model of memory that allows precise modeling of pointer arithmetic and other unsafe operations prevalent in low-level software. To achieve scalability, HAVOC performs modular verification using contracts in an expressive assertion language that includes propositional logic, arithmetic, and quantified type and data-structure invariants. The assertion logic is closed under weakest precondition, thereby guaranteeing precise verification for loop-free and call-free code fragments. To reduce the effort of writing contracts, HAVOC provides a mechanism to infer them automatically. It allows the user to populate the code with candidate contracts and then searches efficiently through the candidate set for a subset of consistent contracts.
The expressive contract language in HAVOC has two important benefits. First, it allows the documentation and verification of properties and invariants specific to a particular software system. Second, it allows a user to systematically achieve the ideal of precise verification (with no false alarms) by interacting with the verifier and providing key contracts that could not be inferred automatically.
HAVOC has been implemented using the Boogie verification-condition generator and the Z3 solver for Satisfiability-Modulo-Theories. I will describe the design and implementation of HAVOC and our experience applying it to verify typestate assertions on medium-sized device drivers with zero false alarms. I will conclude with a discussion of remaining challenges and directions for future work.
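The claim that the assertion logic is "closed under weakest precondition" can be made concrete with a toy weakest-precondition calculator for side-effect-free assignments (the string-substitution encoding is mine, not HAVOC's): wp of an assignment substitutes the right-hand side for the assigned variable in the postcondition.

```python
import re

# Toy weakest preconditions: predicates are Python expressions over the
# program variables; wp('x = e', Q) is Q with e substituted for x, and a
# statement sequence is handled right-to-left.

def wp(stmts, post):
    pred = post
    for stmt in reversed(stmts):
        var, rhs = [s.strip() for s in stmt.split('=', 1)]
        pred = re.sub(rf'\b{var}\b', f'({rhs})', pred)
    return pred

def holds(pred, **state):
    return eval(pred, {}, state)

pre = wp(['y = x + 1', 'x = y * 2'], 'x > 4')
print(pre)                                # ((x + 1) * 2) > 4, i.e. x > 1
print(holds(pre, x=2), holds(pre, x=1))   # True False
```

Closure under wp means the computed precondition stays inside the assertion language, which is what makes verification of loop-free, call-free fragments precise rather than approximate.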
|Date||2:00-3:00, Monday, May 18, 2009|
|Speaker||Jeremy Condit, Microsoft Research|
|Title||Unifying Type Checking and Property Checking for Low-Level Code|
|Abstract||Type checking for low-level code is challenging because type safety often depends on complex, program-specific invariants that are difficult for traditional type checkers to express. Conversely, property checking (or assertion checking) for low-level code is challenging because it is difficult to write concise specifications that distinguish between locations in an untyped program's heap. In this talk, I will present a new technique that addresses both problems simultaneously by implementing a type checker for low-level code as part of a property checker. I will present a low-level formalization of a C program's heap and its types that can be checked with an SMT solver, and I will discuss several case studies that demonstrate the ability of this tool to express and check complex type invariants in low-level C code, including several small Windows device drivers. This is joint work with Shaz Qadeer, Shuvendu Lahiri, and Brian Hackett.|
|Date||3:30-4:30 pm, Monday, May 11, 2009|
|Speaker||Terence Kelly, HP Labs|
|Title||Eliminating Concurrency Bugs with Control Engineering|
Concurrent programming is notoriously difficult and is
becoming increasingly prevalent as multicore hardware
compels performance-conscious developers to parallelize
software. If we cannot enable the average programmer
to write correct and efficient parallel software at
reasonable cost, the computer industry's rate of value
creation may decline substantially.
Our research addresses the challenges of concurrent programming by leveraging control engineering, a body of technique that can constrain the behavior of complex systems, prevent runtime failures, and relieve human designers and operators of onerous responsibilities. In past decades, control theory made complex and potentially dangerous industrial processes safe and manageable and relieved human operators of tedious and error-prone chores. Today, Discrete Control Theory promises similar benefits for concurrent software. This talk describes an application of the control engineering paradigm to concurrent software: Gadara, which uses Discrete Control Theory to eliminate deadlocks in shared-memory multithreaded software.
|Date||3:30-4:30 pm, Monday, April 27, 2009|
|Speaker||Hans Boehm, HP Labs|
|Title||The C++0x concurrency memory model and some of its implications|
The upcoming revision of the C++ standard (historically referred to as C++0x)
integrates support for threads and locks in the language. As a result, it
can be much more precise about the semantics of shared variables than prior
definitions of the language. Although the specification is intentionally
similar to the corresponding Java rules, there are some significant, and not
always obvious, differences.
We describe and motivate the treatment of shared variables in C++0x. We explore its consequences on compilers, and speculate about its potential impact on hardware instruction sets and on transactional memory semantics.
|Date||3:00-4:00 pm, Monday, April 20, 2009|
|Speaker||Bill Pugh, University of Maryland and Google|
|Title||The Cost of Static Analysis for Defect Detection|
Static analysis tools can find programming mistakes and places where expected or desired safety properties are not provably guaranteed. In some environments, the cost of shipping or deploying software with defects such as a buffer overflow is huge. In such environments, it is relatively easy to justify the use of static analysis tools. But in other environments, the business case for using static analysis tools is harder to make.
Static analysis tools often find clear coding mistakes. But in the end, companies don't care about whether their code contains coding mistakes; they care about whether the software functions as intended and can be developed and delivered promptly. In this talk, I'll discuss experience with why some coding mistakes don't impact software functionality and ways to minimize the cost/benefit ratio for incorporating static analysis into the software development process.
|Date||3:00-4:00 pm, Monday, February 9, 2009|
|Speaker||Tom Ball, Microsoft Research|
|Title||Program Analysis 2.0|
|Abstract||Microsoft Research's efforts in program analysis began with detecting defects in C and C++ programs with tools such as ESP, PREfast, PREfix, and SLAM, all of which have been widely deployed internally. The PREfast and SLAM technologies also were incorporated into shipping Microsoft products (Visual Studio and the Driver Development Kit, respectively). In the last few years, we have been working on program analysis tools for .NET, focusing on: (1) code contracts and verifying code against contracts; (2) automatic test generation; (3) checking for concurrency defects. These tools are based on advances in abstract interpretation over linear inequalities, symbolic execution of object-oriented programs, efficient and precise automatic theorem proving, and direct model checking of concurrent systems. All of the above tools are available for both academic and commercial use, in partnership with Visual Studio.|
|Speaker Bio||Thomas Ball is Principal Researcher at Microsoft Research where he manages the Software Reliability Research group (http://research.microsoft.com/srr/). Tom received a Ph.D. from the University of Wisconsin-Madison in 1993, was with Bell Labs from 1993-1999, and has been at Microsoft Research since 1999. He is one of the originators of the SLAM project, a software model checking engine for C that forms the basis of the Static Driver Verifier tool. Tom's interests range from program analysis, model checking, testing and automated theorem proving to the problems of defining and measuring software quality.|
|Date||3:00-4:00 pm, Monday, February 23, 2009|
|Speaker||Mooly Sagiv, Tel-Aviv University|
|Title||TVLA: A system for inferring quantified invariants|
|Abstract||The TVLA system was originally designed as a system for inferring shape properties. In this talk, I will present both the traditional view and also show that the TVLA abstractions amount to representing invariants in a limited form. This allows TVLA to prove properties which are not usually viewed as shape properties, and also sheds some light on the limitations of TVLA, in particular the state-space explosion. Moreover, I will present TVLA operations as effective heuristics for reasoning about quantified invariants, including materialization, finite differencing, Kleene evaluation, and consequence finding. This is joint work with Tal Lev-Ami (Tel Aviv University), Roman Manevich (Tel Aviv University), G. Ramalingam (MSR), Tom Reps (University of Wisconsin), and Greta Yorsh (IBM Research).|
|Date||3:00-4:00 pm, Monday, February 2, 2009|
|Speaker||Ranjit Jhala, UC San Diego|
|Abstract||We present Logically Qualified Data Types, abbreviated to Liquid Types, a new static program verification technique which combines the complementary strengths of automated deduction (SMT solvers), model checking (Predicate Abstraction), and type systems (Hindley-Milner inference). We have implemented the technique in a tool that infers liquid types for Ocaml programs. To demonstrate the utility of our approach, we show how liquid types reduce, by more than an order of magnitude, the manual annotations required to statically verify (1) the safety of array accesses on a diverse set of benchmarks, and (2) invariants like sortedness, balancedness, binary-search-ordering, variable ordering, set-implementation, heap-implementation, and acyclicity of data structure libraries for list-sorting, union-find, splay trees, AVL trees, red-black trees, heaps, associative maps, extensible vectors, and binary decision diagrams.|
|Date||3:00-4:00 pm, Monday, January 26, 2009|
|Speaker||Greg Bronevetsky, Lawrence Livermore National Lab|
|Title||Static Dataflow for Message Passing Applications|
|Abstract||Message passing is a very popular style of parallel programming, used in a wide variety of applications and supported by many APIs, such as BSD sockets, MPI and PVM. Its importance has motivated significant amounts of research on optimization and debugging techniques for such applications. Although this work has produced impressive results, it has also failed to fulfill its full potential. The reason is that while prior work has focused on runtime techniques, there has been very little work on compiler analyses that understand the properties of parallel message passing applications and use this information to improve application performance and debugger quality. This talk presents a novel compiler analysis framework that extends dataflow to parallel message passing applications on arbitrary numbers of processes. It works on an extended control-flow graph that includes all the possible inter-process interactions of any number of processes. This enables dataflow analyses built on top of this framework to incorporate information about the application's parallel behavior and communication topology. The overall parallel dataflow framework can be instantiated with a variety of specific dataflow analyses as well as abstractions that can tune the cost/accuracy of detecting the application's communication topology. The proposed framework bridges the gap between prior work on parallel runtime systems and sequential dataflow analyses, enabling new transformations, runtime optimizations and bug detection tools that require knowledge of the application's communication topology. We instantiate this framework with two different symbolic analyses and show how these analyses can detect different types of communication patterns, which enables the use of dataflow analyses on a wide variety of real applications.|
|Date||3:00-4:00 pm, Monday, January 12, 2009|
|Speaker||Cormac Flanagan, UC Santa Cruz|
|Title||Velodrome: A Sound and Complete Dynamic Atomicity Checker For Multithreaded Programs|
|Abstract||Atomicity is a fundamental correctness property in multithreaded programs, both because atomic code blocks are amenable to sequential reasoning (which significantly simplifies correctness arguments), and because atomicity violations often reveal defects in a program's synchronization structure. Unfortunately, all existing atomicity checkers are incomplete, in that they may yield false alarms even on correctly-synchronized programs, which significantly limits their usefulness. We present the first dynamic analysis for atomicity that is both sound and complete. The analysis reasons about the exact dependencies between operations in the observed trace, and so reports error messages if and only if the observed trace is not conflict-serializable. Despite this significant increase in precision, we show that the performance and coverage achieved by our analysis is competitive with earlier incomplete dynamic atomicity analyses.|
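The serializability criterion in the abstract can be illustrated with a toy check: build a graph over transactions with an edge whenever an earlier operation of one transaction conflicts with a later operation of another, and flag the trace exactly when that graph has a cycle. This is a minimal sketch of conflict-serializability checking, not the Velodrome algorithm; the trace encoding is invented for illustration.

```python
# Toy conflict-serializability check: the observed trace is
# serializable iff the transactional dependency graph is acyclic.

def conflicts(a, b):
    # Two operations conflict if they touch the same location
    # and at least one of them is a write.
    return a[2] == b[2] and "w" in (a[1], b[1])

def is_conflict_serializable(trace):
    # trace: list of (txn_id, "r" or "w", location) in observed order.
    edges = set()
    for i, a in enumerate(trace):
        for b in trace[i + 1:]:
            if a[0] != b[0] and conflicts(a, b):
                edges.add((a[0], b[0]))  # txn of a must precede txn of b
    # Serializable iff the precedence graph is acyclic (DFS check).
    color = {op[0]: "white" for op in trace}
    def dfs(t):
        color[t] = "grey"
        for (u, v) in edges:
            if u == t:
                if color[v] == "grey" or (color[v] == "white" and dfs(v)):
                    return True  # back edge: cycle found
        color[t] = "black"
        return False
    return not any(color[t] == "white" and dfs(t) for t in list(color))

# T1 finishes with x before T2 touches it: serializable as T1;T2.
ok = [(1, "w", "x"), (2, "r", "x"), (2, "w", "x")]
# T2's write lands between two writes of T1: not serializable.
bad = [(1, "w", "x"), (2, "w", "x"), (1, "w", "x")]
print(is_conflict_serializable(ok))   # True
print(is_conflict_serializable(bad))  # False
```

Reporting an error exactly when the graph is cyclic is what makes such an analysis both sound and complete with respect to the observed trace.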
|Date||2:00-3:00 pm, Friday, January 9, 2009|
|Speaker||Mooly Sagiv, Tel-Aviv University|
|Title||(Semi) Thread-Modular Shape Analysis|
|Abstract||Thread-modular static analysis of concurrent systems abstracts away the correlations between the local variables (and program locations) of different threads. This idea reduces the exponential complexity due to thread interleaving and allows us to handle programs with an unbounded number of threads. Thread-modular static analyses face a major problem in simultaneously requiring a separation of the reasoning done for each thread, for efficiency purposes, and capturing relevant interactions between threads, which is often crucial to verify properties. Programs that manipulate the heap complicate thread-modular analysis. Naively treating the heap as part of the global state, accessible by all threads, has several disadvantages since it still admits exponential blow-ups in the heap and is not precise enough to capture things like ownership transfers of heap objects. An effective thread-modular analysis needs to determine which parts of the heap are owned by which threads to obtain a suitable thread-modular state abstraction. I will present new thread-modular analysis techniques and adaptations of thread-modular analysis for programs which manipulate the heap. It is shown that the precision of thread-modular analysis is improved by tracking some correlations between the local variables of different threads. I will also describe techniques for reducing the analysis time for common situations. A key observation for handling the heap is using notions of separation and more generally subheaps in order to abstract away correlations between the properties of subheaps. This is joint work with Josh Berdine (MSR), Byron Cook (MSR), Alexey Gotsman (Cambridge University), Tal Lev-Ami (Tel Aviv University), Roman Manevich (Tel Aviv University), G. Ramalingam (MSR), and Michal Segalov (Tel Aviv University).|
|Date||1:00-2:00, Friday, Dec. 5|
|Title||Deadlock Immunity: Teaching Systems How To Defend Against Deadlocks|
|Abstract||Deadlock immunity is a property by which programs, once afflicted by a given deadlock, develop resistance against future occurrences of that and similar deadlocks. We developed a technique that enables programs to automatically gain such immunity without assistance from programmers or users. We implemented it for both Java and POSIX threads and evaluated it with several real systems, including MySQL, JBoss, SQLite, Apache ActiveMQ, Limewire, and Java JDK. The results demonstrate effectiveness against real, reported deadlock bugs, while incurring modest performance overhead and scaling to 1024 threads. I will discuss how deadlock immunity can offer programmers and users an attractive tool for coping with elusive deadlocks, as well as present extensions of the immunity idea to other types of failures.|
|Date||2:00-3:00 pm, Monday, October 27, 2008|
|Speaker||Mayur Naik, Intel Berkeley Research Lab|
|Title||Effective Static Deadlock Detection|
|Abstract||We present an effective static deadlock detection algorithm for Java. Our algorithm uses a novel combination of static analyses, each of which approximates a different necessary condition for a deadlock. We have implemented the algorithm and report upon our experience applying it to a suite of multi-threaded Java programs. While neither sound nor complete, our approach is effective in practice, finding all known deadlocks as well as discovering previously unknown ones in our benchmarks with few false alarms. |
Joint work with Chang-Seo Park (UC Berkeley), Koushik Sen (UC Berkeley), and David Gay (Intel Research, Berkeley).
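As a rough illustration of one necessary condition such an analysis might approximate, the sketch below builds a lock-order graph from hypothetical lock-acquisition paths and reports a potential deadlock when the graph contains a cycle. The path encoding and the `lock_order_edges` helper are invented for illustration; this is not the algorithm from the talk.

```python
# Minimal sketch of one necessary condition for deadlock: a cycle in
# the lock-order graph (lock A held while acquiring B, and elsewhere
# B held while acquiring A). A coarse approximation, not Naik et al.'s
# actual analysis.

from collections import defaultdict

def lock_order_edges(acquisition_paths):
    # Each path is the sequence of locks a code path acquires; we
    # coarsely assume every lock stays held until the end of the path.
    edges = defaultdict(set)
    for path in acquisition_paths:
        for i, held in enumerate(path):
            for acquired in path[i + 1:]:
                edges[held].add(acquired)
    return edges

def has_lock_cycle(edges):
    # Standard DFS cycle detection over the lock-order graph.
    state = {}
    def dfs(n):
        state[n] = "grey"
        for m in edges.get(n, ()):
            if state.get(m) == "grey":
                return True   # back edge: potential deadlock
            if m not in state and dfs(m):
                return True
        state[n] = "black"
        return False
    return any(n not in state and dfs(n) for n in list(edges))

paths = [["A", "B"], ["B", "A"]]   # classic AB/BA lock inversion
print(has_lock_cycle(lock_order_edges(paths)))                    # True
print(has_lock_cycle(lock_order_edges([["A", "B"], ["A", "C"]]))) # False
```

Because a cycle is only a necessary condition, a real tool layers further analyses (aliasing, thread escape, reachability) on top to prune false alarms, as the abstract describes.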
|Speaker Bio||Mayur Naik is a researcher at Intel Research, Berkeley. His current research interests include languages and tools for helping programmers write parallel programs. He is involved in projects Ivy (http://ivy.cs.berkeley.edu/) and Chord (http://chord.stanford.edu/) which explore techniques for improving the reliability of multi-threaded programs written in C and Java, respectively. He received his Ph.D. in Computer Science from Stanford University in 2008 working on static race detection for Java.|
|Date||2:00-3:00 pm, Monday, October 13, 2008|
|Speaker||Ras Bodik, UC Berkeley|
|Title||Program Synthesis by Sketching|
|Abstract||Software synthesis automatically derives programs that are efficient, even surprising, but it requires a domain theory, elusive for many applications. Trying to make synthesis accessible, we style the synthesizer into a programmer assistant: the programmer writes a partial program that elides tricky code fragments and the synthesizer completes the program to match a specification. Our hypothesis is that the partial program, called a sketch, communicates the programmer's insight to the synthesizer more naturally than a domain theory. On the algorithmic side, sketching exploits recent advances in automated decision procedures. This talk will show how we turned a program checker into a synthesizer using counterexample-guided inductive synthesis. I will also describe the SKETCH language and its linguistic support for synthesis and show how we synthesized complex implementations of ciphers, scientific codes, and even concurrent lock-free data-structures. Joint work with Armando Solar-Lezama, Chris Jones, Gilad Arnold, Lexin Shan, Satish Chandra and many others.|
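The counterexample-guided loop mentioned above can be sketched in miniature: a candidate completion of a one-hole sketch must agree with the specification on a growing set of example inputs, and each failed verification contributes a fresh counterexample. The spec, sketch, hole range, and input domain below are all invented toy stand-ins, not the SKETCH system.

```python
# Toy counterexample-guided inductive synthesis (CEGIS): fill the
# integer hole ?? in the partial program "x + ??*x" so that it
# matches the specification on every input in a finite domain.

SPEC = lambda x: x * 2                           # desired behavior
SKETCH = lambda hole: (lambda x: x + hole * x)   # partial program with hole

def verify(candidate, domain):
    # Return a counterexample input, or None if candidate matches SPEC.
    for x in domain:
        if candidate(x) != SPEC(x):
            return x
    return None

def cegis(hole_values, domain):
    examples = [domain[0]]                 # seed with one input
    for hole in hole_values:
        prog = SKETCH(hole)
        # Inductive step: candidate must agree with SPEC on all examples.
        if all(prog(x) == SPEC(x) for x in examples):
            cex = verify(prog, domain)     # verification step
            if cex is None:
                return hole                # hole value synthesizes the spec
            examples.append(cex)           # learn from the counterexample
    return None

print(cegis(hole_values=range(-3, 4), domain=range(-10, 11)))  # 1
```

Real systems replace the brute-force candidate loop and the exhaustive verifier with SAT/SMT queries, but the candidate/counterexample alternation is the same.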
|Date||2:00-3:00 pm, Monday, June 23, 2008|
|Speaker||Michael Bond, UT Austin|
|Title||Deployed Software: An Ideal Environment for Fixing Bugs?|
|Abstract||Despite extensive in-house testing, deployed software still contains bugs. These bugs cause systems to fail, wasting billions of dollars and sometimes causing injury or death. Bugs in deployed software are hard to diagnose and fix since they are environment dependent and difficult for developers to reproduce. Furthermore, developers cannot use heavyweight approaches that would degrade performance. This talk makes a case for deployment being the ideal environment for fixing bugs. Solutions fall into two categories: helping developers diagnose and fix bugs, and automatically tolerating bugs instead of letting systems fail. The talk focuses on memory leaks in managed languages--the lone memory bug not eliminated by modern languages--and presents an approach for diagnosing leaks, called Bell, and an approach for tolerating leaks, called Melt. Bell encodes per-object program locations into a single bit and uses brute-force decoding to recover likely program locations causing leaks. Melt puts likely leaked memory on disk and keeps time and memory resources proportional to in-use (not leaked) memory even in the face of growing leaks. The talk concludes with future work and thoughts about how software and hardware trends will make bugs a bigger problem and make deployment-time bug detection and tolerance even more appealing.|
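The one-bit encoding idea can be illustrated with a toy decoder: each object keeps a single bit derived from a hash of its identity and allocation site, and decoding tests every site against the bits of the leaked objects. The hash choice and site names below are assumptions made for illustration; this is not the actual Bell encoding.

```python
# Sketch of the single-bit encoding idea from the abstract: per-object
# bits plus brute-force decoding over candidate allocation sites.

import hashlib

def encode_bit(obj_id, site):
    # The one bit stored per object, derived from object and site.
    h = hashlib.sha256(f"{obj_id}:{site}".encode()).digest()
    return h[0] & 1

def decode(leaked, all_sites):
    # A site survives decoding if every leaked object's stored bit
    # matches the bit that site would have produced; with many objects,
    # wrong sites are ruled out with high probability.
    return [s for s in all_sites
            if all(encode_bit(o, s) == b for o, b in leaked)]

sites = ["alloc_in_parser", "alloc_in_cache", "alloc_in_logger"]
true_site = "alloc_in_cache"
leaked = [(i, encode_bit(i, true_site)) for i in range(64)]
candidates = decode(leaked, sites)
print(true_site in candidates)  # True
```

The appeal is the space cost: one bit per object at run time, with the expensive decoding deferred to offline analysis of a reported leak.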
|Date||4:00-5:00 pm, Thursday, June 26, 2008|
|Speaker||Todd Millstein, UCLA|
|Title||Enforcing and Validating User-Defined Programming Disciplines|
|Abstract||One way that programmers manage the complexity of building and maintaining software systems is by adhering to "programming disciplines" of various sorts. For example, locking disciplines prevent concurrency errors, "ownership" disciplines control aliasing among pointers, and design patterns enforce architectural styles or constraints. However, these disciplines are typically only documented informally through comments, if they are documented at all, and can easily be forgotten or misused. Over the past few years, colleagues and I have developed frameworks that allows programmers to specify desired programming disciplines, enforce them statically on programs, and validate them against intended run-time invariants. In this talk, I will overview our approach and illustrate it through our JavaCOP framework for "pluggable type systems" in Java.|
|Speaker Bio||Todd Millstein is an Assistant Professor in the Computer Science Department at the University of California, Los Angeles. Todd received his Ph.D. and M.S. from the University of Washington and his A.B. from Brown University, all in Computer Science. He received an NSF CAREER award in 2006 and an IBM Faculty Award in 2008.|
|Date||3:00-4:00 pm, Monday, May 14, 2007|
|Speaker||Manuvir Das, Microsoft|
|Title||Formal Specifications on Industrial-Strength Code - From Myth to Reality|
|Abstract||The research community has long
understood the value of formal specifications in building robust
software. However, the adoption of any specifications beyond run-time
assertions in industrial software has been limited. All of this has
changed at Microsoft in the last few years. Today, formal
specifications are a mandated part of the software development process
in the largest Microsoft product groups. Millions of specifications
have been added, and tens of thousands of bugs have been exposed and
fixed in future versions of products under development. In addition,
Windows public interfaces are formally specified and the Visual Studio
compiler understands and enforces these specifications, meaning that
programmers anywhere can now use formal specifications to make their
software more robust. |
The goal of this talk is to share the technical, social and practical story of how we were able to move organizations with thousands of programmers to an environment where the use of formal specifications is routine.
|Speaker Bio||Manuvir Das leads the Program Analysis research group in the Center for Software Excellence at Microsoft Corporation, and is an affiliate faculty member at the University of Washington. His research interests are in inventing and applying techniques from Programming Languages, Compilers, and Systems to the software engineering process. Manuvir holds a bachelor's degree in Computer Science from IIT Bombay, and a PhD in Computer Science from the University of Wisconsin-Madison.|
|Date||4:00-5:00 pm, Thursday, April 26, 2007|
|Place||Gates 463a (Theory Lounge)|
|Speaker||Philip Wadler, University of Edinburgh|
|Title||Faith, Evolution, and Programming Languages|
|Abstract||Faith and evolution provide complementary--and sometimes conflicting--models of the world, and they also can model the adoption of programming languages. Adherents of competing paradigms, such as functional and object-oriented programming, often appear motivated by faith. Families of related languages, such as C, C++, Java, and C#, may arise from pressures of evolution. As designers of languages, adoption rates provide us with scientific data, but the belief that elegant designs are better is a matter of faith. This talk traces one concept, second-order quantification, from its inception in the symbolic logic of Frege through to the generic features introduced in Java 5, touching on features of faith and evolution. The remarkable correspondence between natural deduction and functional programming informed the design of type classes in Haskell. Generics in Java evolved directly from Haskell type classes, and are designed to support evolution from legacy code to generic code. Links, a successor to Haskell aimed at AJAX-style three-tier web applications, aims to reconcile some of the conflict between dynamic and static approaches to typing.|
|Speaker Bio||Philip Wadler is Professor of Theoretical Computer Science at the University of Edinburgh. He holds a Royal Society-Wolfson Research Merit Fellowship and is a Fellow of the Royal Society of Edinburgh. Previously, he worked or studied at Avaya Labs, Bell Labs, Glasgow, Chalmers, Oxford, CMU, Xerox Parc, and Stanford, and has visited as a guest professor in Paris, Sydney, and Copenhagen. Prof. Wadler appears at position 70 on Citeseer's list of most-cited authors in computer science; is a winner of the POPL Most Influential Paper Award; and sits on the ACM Sigplan Executive Committee. He contributed to the designs of Haskell, Java, and XQuery, and is a co-author of XQuery from the Experts (Addison Wesley, 2004) and Generics and Collections in Java (O'Reilly, 2006). He has delivered invited talks in locations ranging from Aizu to Zurich.|
|Date||3:00-4:00 pm, Monday, April 9, 2007|
|Speaker||Corina Pasareanu, NASA Ames|
|Title||Abstract State Matching and Symbolic Execution for Software Model Checking|
|Abstract||This talk describes abstraction
based model checking techniques that compute under-approximations of
the feasible behaviors of the system under analysis. The techniques
perform model checking with explicit, or symbolic, execution and
abstract state matching. Model checking explores the concrete program
transitions while storing abstract versions of the explored states, as
specified by an abstraction mapping. State matching determines whether
an abstract state is being revisited, in which case the model checker
backtracks. Applications include program error detection and test input
generation. |
This is joint work with Willem Visser (SEVEN Networks), Saswat Anand (Georgia Institute of Technology), and Radek Pelanek (Masaryk University).
|Date||3:00-4:00 pm, Monday, March 5, 2007|
|Speaker||Gary T. Leavens, Iowa State University|
|Title||JML: Expressive Modular Reasoning for Java|
|Abstract||The Java Modeling Language (JML)
is used to specify, check, and verify detailed designs for Java classes
and interfaces. JML is an open, international, collaborative effort
among 20 research groups and projects. This talk briefly describes
background on the JML effort, including its tool support for runtime
assertion checking, extended static checking, static verification, unit
testing, etc. |
Subtyping and dynamic dispatch in object-oriented languages pose a challenge for modular reasoning, which is the key to practical tools. Moreover, a specification language must be clear and expressive to be used by practicing programmers. Modular reasoning is done by using supertype abstraction -- reasoning based on static type information. Expressiveness comes from a rich set of features for specifying methods. This talk describes how specification inheritance in JML forces behavioral subtyping, through a discussion of semantics and examples. Behavioral subtyping, together with a set of methodological restrictions, makes modular reasoning with supertype abstraction valid.
This work was supported in part by NSF grant CCF-0429567. The work on behavioral subtyping in particular is based on joint work with Prof. David A. Naumann of Stevens Institute of Technology. See http://jmlspecs.org for more information about JML.
|Date||2:00-3:00 pm, Monday, February 26, 2007|
|Speaker||David Monniaux, National Center for Scientific Research (France)|
|Title||Verification of device drivers: asynchronous "intelligent" peripherals|
|Abstract||It is common in current embedded
systems to use powerful off-the-shelf technologies such as programmable
I/O controllers doing direct-to-memory access, handling worklists etc.,
for instance for USB busses. This poses the question of verifying
properties of the driver, since the driver cannot be analyzed in
isolation but must be asynchronously composed with a specification of
the controller. |
We'll see some results of the verification of properties of a simulation of a USB OHCI controller asynchronously composed with an industrial device driver, using the Astree static analyzer. We'll then discuss future work and challenges.
|Date||3:00-4:00 pm, Monday, February 5, 2007|
|Speaker||Michael Hicks, University of Maryland|
|Title||Modular Information Hiding and Type Safe Linking for C|
|Abstract||The overarching goal of my
research is to explore means to develop more flexible, reliable, and
secure software systems. In this talk, I will present a brief overview
of my work, and in particular focus on CMod, a novel tool that provides
a sound module system for C. |
CMod works by enforcing a set of four rules that are based on principles of modular reasoning and on current programming practice. CMod's rules flesh out the convention that .h header files are module interfaces and .c source files are module implementations. Although this convention is well known, developing CMod's rules revealed there are many subtleties in applying the basic pattern correctly. We have proven formally that CMod's rules enforce both information hiding and type-safe linking. We evaluated CMod on a number of benchmarks, and found that most programs obey CMod's rules, or can be made to with minimal effort, while rule violations reveal brittle coding practices including numerous information hiding violations and occasional type errors.
|Date||3:00-4:00 pm, Monday, January 29, 2007|
|Speaker||Radu Rugina, Cornell University|
|Title||Practical Shape Analysis|
|Abstract||Shape analyses are aimed at
extracting invariants that describe the "shapes" of heap-allocated
recursive data structures. Although existing shape analyses have been
successful at verifying complex heap manipulations, they have had
limited success at being practical for larger programs. |
In this talk I will present two practical approaches to heap analysis and their application to error detection, program verification, and compiler transformations. First, I will present a reference counting shape analysis where the compiler uses local reasoning about individual heap cells, instead of global reasoning about the entire heap. Second, I will present a heap analysis by contradiction, where the analysis checks the absence of heap errors by disproving their presence. These techniques are both sufficiently precise to accurately analyze a large class of heap manipulation algorithms, and sufficiently lightweight to scale to larger programs.
|Date||3:00-4:00 pm, Monday, January 8, 2007|
|Speaker||Michael Ernst, MIT|
|Title||Refactoring for parameterizing Java classes|
|Abstract||Type safety and expressiveness of
many existing Java libraries and their client applications would
improve, if the libraries were upgraded to define generic classes.
Efficient and accurate tools exist to assist client applications to use
generics libraries, but so far the libraries themselves must be
parameterized manually, which is a tedious, time-consuming, and
error-prone task. We present a type-constraint-based algorithm for
converting non-generic libraries to add type parameters. The algorithm
handles the full Java language and preserves backward compatibility,
thus making it safe for existing clients. Among other features, it is
capable of inferring wildcard types and introducing type parameters for
mutually-dependent classes. We have implemented the algorithm as a
fully automatic refactoring in Eclipse. |
We evaluated our work in two ways. First, our tool parameterized code that was lacking type parameters. We contacted the developers of several of these applications, and in all cases where we received a response, they confirmed that the resulting parameterizations were correct and useful. Second, to better quantify its effectiveness, our tool parameterized classes from already-generic libraries, and we compared the results to those that were created by the libraries' authors. Our tool performed the refactoring accurately: in 87% of cases the results were as good as those created manually by a human expert, in 9% of cases the tool results were better, and in 4% of cases the tool results were worse.
|Date||3:00-4:00 pm, Monday, December 4, 2006|
|Speaker||Koushik Sen, University of California, Berkeley|
|Title||Concolic Testing of Sequential and Concurrent Programs|
|Abstract||Testing with manually generated test cases is the primary technique used in industry to improve reliability of software--in fact, such testing is reported to account for 50%-80% of the typical cost of software development. I will describe Concolic Testing, a systematic and efficient method which combines random and symbolic testing. Concolic testing enables automatic and systematic testing of large programs, avoids redundant test cases and does not generate false warnings. Experiments on real-world software show that concolic testing can be used to effectively catch generic errors such as assertion violations, memory leaks, uncaught exceptions, and segmentation faults. Combined with dynamic partial order reduction techniques and predictive analysis, concolic testing is effective in catching concurrency bugs such as data races and deadlocks as well as specification related bugs. I will describe my experience with building two concolic testing tools, CUTE for C programs and jCUTE for Java programs, and applying these tools to real-world software systems. I will briefly describe some ongoing projects to improve and extend concolic testing.|
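A miniature version of the concolic loop: run the program on a concrete input, record each branch as a symbolic constraint, then negate a recorded constraint and solve for an input that takes the other side. The instrumented toy program and the equality-only constraint "solver" below are invented for illustration; tools like CUTE use a real constraint solver over full program state.

```python
# Toy concolic testing: concrete execution collects branch constraints;
# negating constraints and solving them drives exploration of new paths.

def program(x, trace):
    # Instrumented program: each branch records (was_taken, constant).
    trace.append((x == 7, 7))
    if x == 7:
        trace.append((x == 42, 42))
        if x == 42:
            raise AssertionError("bug")  # unreachable: x can't be 7 and 42
        return "then-branch"
    return "else-branch"

def solve(constraints):
    # Find an int satisfying [(must_equal, c), ...]; None if unsatisfiable.
    eq = [c for taken, c in constraints if taken]
    ne = [c for taken, c in constraints if not taken]
    if eq:
        return eq[0] if all(v == eq[0] for v in eq) and eq[0] not in ne else None
    return next(v for v in range(1000) if v not in ne)

def concolic(start):
    inputs, worklist, seen = [], [start], set()
    while worklist:
        x = worklist.pop()
        trace = []
        program(x, trace)        # concrete run with symbolic bookkeeping
        inputs.append(x)
        # Negate each recorded branch prefix to reach the other side.
        for i in range(len(trace)):
            prefix = trace[:i] + [(not trace[i][0], trace[i][1])]
            nxt = solve(prefix)
            if nxt is not None and nxt not in seen and nxt not in inputs:
                seen.add(nxt)
                worklist.append(nxt)
    return sorted(inputs)

print(concolic(0))  # [0, 7]: both feasible paths covered, no false alarm
```

Note that the "bug" branch is never reported: its path constraint (x == 7 and x == 42) is unsatisfiable, which is exactly how concolic testing avoids false warnings.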
|Date||3:00-4:00 pm, Thursday, November 30, 2006|
|Speaker||Rupak Majumdar, University of California, Los Angeles|
|Title||What's Next for Software Verification?|
|Abstract||Over the last few years, software verification based on predicate abstraction and counterexample-guided refinement has been a successful technique for performing automatic and precise static analysis of programs. Other than device driver protocols, though, software verifiers have so far mostly been applied to simple properties of systems. For these programs and properties, we show a simple syntactic algorithm to construct approximate program invariants that is surprisingly effective, suggesting perhaps that software verification tools are overkill for these applications. For deeper properties, we describe our recent attempts to verify a memory management subsystem and outline several challenges that must be met before the verification will succeed. We describe some partial progress. In particular, we describe a technique to infer predicates in the presence of data structures such as lists and sets, and a type system that extracts word-level relationships from code containing bitwise operations.|
|Date||3:00-4:00 pm, Monday, November 6, 2006|
|Speaker||Ranjit Jhala, University of California, San Diego|
|Abstract||A modular program analysis
considers components independently and provides succinct summaries for
each component, which can be used when checking the rest of the system.
Consider a system comprising two components, a library and a client.
A temporal summary, or interface, of the library specifies legal
sequences of library calls. The interface is safe if no call sequence
violates the library's internal invariants; the interface is permissive
if it contains every such sequence. Modular program analysis requires
full interfaces, which are both safe and permissive: the client does
not cause errors in the library if and only if it makes only sequences
of library calls that are allowed by the full interface of the library.
Previous interface-based methods have focused on safe interfaces, which
may be too restrictive and thus reject good clients. We present an
algorithm for automatically synthesizing software interfaces that are
both safe and permissive. The algorithm generates interfaces as graphs
whose vertices are labeled with predicates over the library's internal
state, and whose edges are labeled with library calls. The interface
state is refined incrementally until the full interface is constructed.
In other words, the algorithm automatically synthesizes a typestate
system for the library, against which any client can be checked for
compatibility. We present an implementation of the algorithm which is
based on the BLAST model checker, and we evaluate some case studies. |
Joint work with Thomas A. Henzinger and Rupak Majumdar.
|Date||2:30-3:30 pm, Tuesday, October 31, 2006|
|Speaker||Guy L. Steele Jr., Sun Microsystems Laboratories|
|Title||Parallel Programming and Code Selection in Fortress|
|Abstract||As part of the DARPA program for High Productivity Computing Systems, the Programming Language Research Group at Sun Microsystems Laboratories is developing Fortress, a language intended to support large-scale scientific computation with the same level of portability that the Java programming language provided for multithreaded commercial applications. One of the design principles of Fortress is that parallelism be encouraged everywhere; for example, it is intentionally just a little bit harder to write a sequential loop than a parallel loop. Another is to have rich mechanisms for encapsulation and abstraction; the idea is to have a fairly complicated language for library writers that enables them to write libraries that present a relatively simple set of interfaces to the application programmer. Thus Fortress is as much a framework for language developers as it is a language for coding scientific applications. We will discuss ideas for using a rich parameterized polymorphic type system to organize multithreading and data distribution on large parallel machines. The net result is similar in some ways to data distribution facilities in other languages such as HPF and Chapel, but more open-ended, because in Fortress the facilities are defined by user-replaceable and -extendable libraries rather than wired into the compiler. A sufficiently rich type system can take the place of certain kinds of flow analysis to guide certain kinds of code selection and optimization, again moving policymaking out of the compiler and into libraries coded in the Fortress source language.|
|Date||3:00-4:00 pm, Monday, October 16, 2006|
|Speaker||Zhendong Su, University of California, Davis|
|Title||Scalable and Accurate Tree-based Detection of Code Clones|
|Abstract||Detecting code clones has many
software engineering applications. Existing approaches either do not
scale to large code bases or are not robust against code modifications.
In this talk, I will present an efficient algorithm for identifying
similar subtrees. The algorithm is based on a novel characterization of
subtrees with numerical vectors in the Euclidean space R^n and an
efficient algorithm to cluster these vectors w.r.t. the Euclidean
distance metric. Subtrees with vectors in one cluster are considered
similar. We have implemented our tree similarity algorithm as a clone
detection tool called Deckard and evaluated it on large code bases
written in C and Java including the Linux kernel and JDK. Our
experiments show that Deckard is both scalable and accurate. It is also
language independent, applicable to any language with a formally
specified grammar. |
Joint work with Lingxiao Jiang, Ghassan Misherghi, and Stephane Glondu.
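The characteristic-vector idea can be sketched with Python's own ASTs: summarize each function body as a vector of node-type counts and treat a small Euclidean distance between vectors as evidence of cloning. The distance threshold and example code are invented for illustration; Deckard's actual vectors and clustering are more refined.

```python
# Toy clone detection in the spirit of the abstract: subtrees are
# characterized as numerical vectors, and nearby vectors (Euclidean
# distance) indicate structurally similar code.

import ast
import math
from collections import Counter

def vectors(source):
    # Map each top-level function to its node-type count vector.
    out = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            out[node.name] = Counter(type(n).__name__ for n in ast.walk(node))
    return out

def distance(v1, v2):
    # Euclidean distance over the union of node types (missing = 0).
    return math.sqrt(sum((v1[k] - v2[k]) ** 2 for k in set(v1) | set(v2)))

code = """
def sum_list(xs):
    total = 0
    for x in xs:
        total = total + x
    return total

def sum_items(items):
    acc = 0
    for it in items:
        acc = acc + it
    return acc

def greet(name):
    return "hello " + name
"""
vs = vectors(code)
print(distance(vs["sum_list"], vs["sum_items"]) < 2.0)  # True: near-clones
print(distance(vs["sum_list"], vs["greet"]) > 2.0)      # True: unrelated
```

Because the vectors ignore identifier names, renamed copies land at distance zero, which is what makes the approach robust against simple code modifications.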
|Date||3:00-4:00 pm, Thursday, June 8, 2006|
|Speaker||Bowen Alpern, IBM T. J. Watson Research Center|
|Title||Early Ruminations on Anticipating Demand for Software Delivered as a Stream|
|Abstract||This is not, at least not primarily, a talk about the economic viability of a particular model of software distribution. Rather, it reports on some preliminary investigations into whether predictive caching techniques might be able to mitigate the first-time performance penalty incurred when software executes as it is being delivered. The context for this work is the PDS (Progressive Deployment System) project at IBM Research which uses application virtualization to deliver an application and its non-operating-system dependencies as a stream in order to avoid dependency problems that can occur when multiple applications are installed on the same machine. This is joint work with the PDS group.|
|Date||2:00-3:00 pm, Thursday, May 18, 2006|
|Title||Automated Detection of Refactorings in Evolving Components|
|Abstract||One of the costs of reusing software components is upgrading applications to use the new version of the components. Upgrading an application can be error-prone, tedious, and disruptive of the development process. An important kind of change in OO components is a refactoring. Refactorings are program transformations that improve the internal design without changing the observable behavior (e.g., renamings, moving methods between classes, splitting/merging of classes). Our previous study showed that more than 80% of the disruptive changes in five different components were caused by refactorings. If the refactorings that happened between two versions of a component could be automatically detected, a refactoring tool could replay them on applications.|
I will present an algorithm that detects refactorings performed during component evolution. Our algorithm uses a combination of a fast syntactic analysis to detect refactoring candidates and a more expensive semantic analysis to refine the results. The experiments on components ranging from 17 KLOC to 352 KLOC show that our algorithm detects refactorings in real-world components with accuracy over 85%.
Joint work with Can Comertoglu, Darko Marinov and Ralph Johnson (UIUC).
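The two-stage structure (fast syntactic candidate generation, then semantic refinement) can be illustrated with a small sketch. This is not the authors' algorithm: the syntactic stage here uses simple character shingles to propose rename candidates, the semantic stage is omitted, and all method names and bodies are invented.

```python
def shingles(s, k=3):
    """Fast syntactic fingerprint: the set of k-character substrings."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def similarity(a, b):
    """Jaccard similarity of the two shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def detect_renames(old_methods, new_methods, min_sim=0.6):
    """Stage 1: syntactic analysis pairs a removed method with an added
    one when their bodies are similar enough.  A real detector would add
    a Stage 2 semantic analysis (e.g., checking that call sites were
    updated consistently) to refine these candidates."""
    removed = set(old_methods) - set(new_methods)
    added = set(new_methods) - set(old_methods)
    candidates = []
    for old in removed:
        for new in added:
            sim = similarity(old_methods[old], new_methods[new])
            if sim >= min_sim:
                candidates.append((old, new, round(sim, 2)))
    return sorted(candidates)

old = {"getSize": "return this.items.length;"}
new = {"getCount": "return this.items.length;"}
print(detect_renames(old, new))  # → [('getSize', 'getCount', 1.0)]
```

A tool could then replay each detected `(old, new)` rename mechanically on client applications, which is exactly the payoff the abstract describes.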
|Date||3:00-4:00 pm, Monday, May 15, 2006|
|Speaker||Stephen Freund, Williams College|
|Title||Practical Hybrid Type Checking|
|Abstract||Software systems typically contain large APIs that are only informally and imprecisely specified and hence easily misused. Practical mechanisms for documenting and verifying precise specifications would significantly improve software reliability.|
The Sage programming language is designed to provide high-coverage checking of expressive specifications. The Sage type language is a synthesis of (unrestricted) refinement types and Pure Type Systems. Since type checking for this language is not statically decidable, Sage uses hybrid type checking, which extends static type checking with dynamic contract checking, automatic theorem proving, and a database of refuted subtype judgments.
In this talk, I present the key ideas behind hybrid type checking, the Sage language, and preliminary experimental results suggesting that hybrid type checking of precise specifications is a promising approach for the development of reliable software. I will also discuss more recent work on extending Sage to include mutable objects.
This is joint work with Cormac Flanagan, Jessica Gronski, Kenn Knowles, and Aaron Tomb at University of California, Santa Cruz.
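The core move of hybrid type checking, statically discharging the obligations a prover can settle and falling back to a dynamic contract otherwise, can be sketched as follows. This is an illustrative analogy in Python, not Sage: `refine`, `hybrid_check`, and the `Pos` refinement are hypothetical names, and the "static facts" set stands in for a theorem prover's results.

```python
def refine(pred, name):
    """A refinement type {x | pred(x)}: values are checked dynamically
    when the predicate cannot be discharged statically."""
    def cast(value):
        if not pred(value):
            raise TypeError(f"value {value!r} violates refinement {name}")
        return value
    return cast

Pos = refine(lambda x: x > 0, "Pos")

def hybrid_check(static_facts, needed, cast):
    """If the static checker already proved `needed`, emit no check;
    otherwise fall back to a dynamic contract (the cast)."""
    if needed in static_facts:
        return lambda v: v          # statically verified: no runtime cost
    return cast                     # undecided: check at run time

# The prover could not decide "Pos" here, so a runtime check is inserted.
check = hybrid_check(static_facts=set(), needed="Pos", cast=Pos)
print(check(5))  # → 5 (passes the dynamic check)
```

With `static_facts={"Pos"}` the same call site would incur no runtime check at all, which is the hybrid trade-off: precision of dynamic checking only where static reasoning gives up.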
|Date||3:00-4:00 pm, Monday, April 24, 2006|
|Speaker||Kathy Yelick, UC Berkeley and Lawrence Berkeley National Lab|
|Title||Compilation Technology for Computational Science|
|Abstract||The emergence of multicore processors marks the end of an era in computing: whereas hardware developers were largely responsible for exponential performance gains in the past decade, software developers will now be equally responsible if these gains are to continue. The notoriously difficult problem of writing parallel software will now be commonplace. Much of the experience using parallelism for performance resides in the scientific computing community, and while that group has focused on numerical simulations and very large scale parallelism, surely some lessons can be learned. In this talk, I will describe the class of partitioned global address space languages, which have recently received support from the scientific computing community as a possible alternative to message passing and threads. One of these languages, Titanium, is a modest extension of Java with domain-specific extensions for scientific computing and large-scale parallelism.|
Titanium has proven to be significantly more expressive than message passing and has been used for significant scientific problems, including a parallel simulation of blood flow in the heart and an elliptic solver based on adaptive mesh refinement. Several interesting computer science challenges have arisen from this work, including the need for program analysis techniques specialized to parallel languages, machine-independent optimization strategies, and runtime support for latency hiding. I will describe some recent work in compilation of parallel languages, including thread-aware pointer analysis, sequential consistency enforcement, and model-driven communication optimizations, and give an overview of some open questions in the field.
|Date||3:00-4:00 pm, Monday, April 17, 2006|
|Speaker||Brad Chamberlain, Cray Inc.|
|Title||Chapel: Cray Cascade's High Productivity Language|
|Abstract||In 2002, DARPA launched the High Productivity Computing Systems (HPCS) program, with the goal of improving user productivity on High-End Computing systems for the year 2010. As part of Cray's research efforts in this program, we have been developing a new parallel language named Chapel, designed to:|
1. support a global view of parallel programming with the ability to tune for locality,
2. support general parallel programming including data- and task-parallel codes, as well as nested parallelism, and
3. help narrow the gulf between mainstream and parallel languages.
In this talk I will introduce the motivations and foundations for Chapel, describe several core language concepts, and show some sample computations written in Chapel.
|Date||2:15-3:15 pm, Friday, March 10, 2006|
|Speaker||Stephen Fink, IBM T. J. Watson Research Center|
|Title||Effective Typestate Verification in the Presence of Aliasing|
|Abstract||We describe a novel framework for verification of typestate properties, including several new techniques to precisely treat aliases without undue performance costs. In particular, we present a flow-sensitive, context-sensitive, integrated verifier that utilizes a parametric abstract domain combining typestate and aliasing information. To scale to real programs without compromising precision, we present a staged verification system in which faster verifiers run as early stages that reduce the workload for later, more precise stages.|
We have evaluated our framework on a number of real Java programs, checking correct API usage for various Java standard libraries. The results show that our approach scales to hundreds of thousands of lines of code, and verifies correctness for over 95% of the potential points of failure.
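Stripped of the alias analysis and staging that make the verifier above precise and scalable, the essence of typestate checking is tracking an object's abstract state through an API automaton and flagging events the automaton does not permit. The file-handle automaton below is an invented example, not one of the Java library properties from the talk.

```python
# Typestate for a file-like API: open → read* → close; reading a
# closed handle is the kind of misuse a verifier should flag.
TRANSITIONS = {
    ("closed", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}

def verify(events, start="closed"):
    """Walk the typestate automaton over a sequence of API events; any
    event with no outgoing transition is a potential point of failure."""
    state, errors = start, []
    for i, op in enumerate(events):
        nxt = TRANSITIONS.get((state, op))
        if nxt is None:
            errors.append((i, op, state))   # e.g., read after close
        else:
            state = nxt
    return errors

print(verify(["open", "read", "close", "read"]))  # → [(3, 'read', 'closed')]
```

The hard part, which this sketch ignores entirely, is knowing *which* heap object each event applies to in the presence of aliasing; that is precisely what the parametric typestate-plus-aliasing domain in the talk addresses.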
|Date||4:00-5:15 pm, Thursday, February 23, 2006|
|Speaker||Vivek Sarkar, IBM T. J. Watson Research Center|
|Title||X10: An Object-Oriented Approach to Non-Uniform Cluster Computing|
|Abstract||It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multiprocessor structure for the future is a Non-Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current OO language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes.|
We have designed a modern object-oriented programming language, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places; lightweight activities embodied in async, future, foreach, and ateach constructs; a construct for termination detection (finish); the use of lock-free synchronization (atomic blocks); and the manipulation of cluster-wide global data structures. We present an overview of the X10 programming model and language, experience with our reference implementation, and results from some initial productivity comparisons between the X10 and Java(TM) languages.
This is joint work with other members of the X10 core team --- Vijay Saraswat, Raj Barik, Philippe Charles, Christopher Donawa, Christian Grothoff, Allan Kielstra, Igor Peshansky, and Christoph von Praun.
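As a rough analogy only (X10 is its own language; this is not X10 code), the finish/async pattern, in which a `finish` block detects termination of every activity spawned inside it, can be imitated with Python threads:

```python
import threading

class Finish:
    """A Python-threads analogy of X10's finish/async: the `with` block
    does not complete until every activity spawned inside it terminates."""
    def __init__(self):
        self.threads = []
    def async_(self, fn, *args):
        """Spawn a lightweight activity, as X10's `async` does."""
        t = threading.Thread(target=fn, args=args)
        self.threads.append(t)
        t.start()
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        for t in self.threads:   # termination detection: join all activities
            t.join()
        return False

results = []
with Finish() as f:
    for i in range(4):
        f.async_(results.append, i * i)
# All four activities are guaranteed finished here.
print(sorted(results))  # → [0, 1, 4, 9]
```

X10's actual constructs go well beyond this, adding places for locality, futures, `ateach` over distributed data, and atomic blocks, none of which this sketch models.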
|Date||3:00-4:00 pm, Monday, February 6, 2006|
|Title||PADS: Processing Arbitrary Data Sources|
|Abstract||Many high-volume data sources exist that can be mined very profitably, for example: call detail records, web server logs, network packets, network configuration and log files, provisioning records, credit card records, stock market data, etc. Unfortunately, many such data sources are in formats over which data consumers have no control. A significant effort is required to understand such a data source and write a parser for the data, a process that is both tedious and error-prone. Often, the hard-won understanding of the data ends up embedded in parsing code, making both sharing the understanding and maintaining the parser difficult. Typically, such parsers are incomplete, failing to specify how to handle situations where the data does not conform to the expected format.|
In this talk, I will describe the PADS project, which aims to provide languages and tools for simplifying the analysis of ad hoc data. We have designed a declarative data-description language, PADS/C, expressive enough to describe the data sources we see in practice at AT&T, including ASCII, binary, EBCDIC (Cobol), and mixed formats. From PADS/C we generate a C library with functions for parsing, manipulating, summarizing, querying, and writing the data.
This work is joint with Bob Gruber, Mary Fernandez, David Walker, Yitzhak Mandelbaum, and Mark Daly.
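The idea of deriving a parser from a declarative data description can be sketched as follows. The description syntax here is invented for illustration and is not PADS/C, which is far richer (binary, EBCDIC, and mixed formats; generated C libraries for summarizing and querying as well as parsing).

```python
import re

# A tiny declarative record description (hypothetical syntax): each
# field pairs a name with the regular expression describing its type.
RECORD = [("ip", r"\d+\.\d+\.\d+\.\d+"), ("status", r"\d{3}"), ("bytes", r"\d+")]

def make_parser(desc, sep=" "):
    """Derive a parsing function from the description; non-conforming
    lines are reported (as None) rather than silently dropped, since
    handling bad data is exactly where hand-written parsers fall short."""
    pattern = re.compile(sep.join(f"(?P<{name}>{rx})" for name, rx in desc) + "$")
    def parse(line):
        m = pattern.match(line)
        return m.groupdict() if m else None
    return parse

parse = make_parser(RECORD)
print(parse("10.0.0.1 200 5120"))  # → {'ip': '10.0.0.1', 'status': '200', 'bytes': '5120'}
print(parse("garbage"))            # → None (non-conforming data)
```

The payoff of the declarative style is that the hard-won understanding of the format lives in `RECORD`, where it can be shared and maintained, instead of being buried in parsing code.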
|Date||4:15-5:30 pm, Tuesday, November 29, 2005|
|Speaker||Sanjit Seshia, University of California, Berkeley|
|Title||SAT-Based Decision Procedures and Malware Detection|
|Abstract||SAT-based decision procedures operate by performing a satisfiability-preserving encoding of their input to a Boolean satisfiability (SAT) problem, on which a SAT solver is invoked. In this talk I will present UCLID, a verification tool based on SAT-based decision procedures, and describe an application to detecting malware (e.g., viruses and worms). UCLID's SAT-based decision procedures are for quantifier-free first-order logics involving arithmetic. These have been used within a malware detector that shows greater resilience to malware obfuscations than commercial tools. I will describe the notion of a "semantic signature," the detection algorithm, and experimental results.|
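The basic shape of a SAT-based decision procedure, encode the input formula as Boolean clauses and invoke a SAT solver on the result, can be illustrated with a toy backend (exhaustive search standing in for the DPLL/CDCL engines that real tools such as UCLID build on; the encoding below is a plain propositional example, not UCLID's arithmetic encoding):

```python
from itertools import product

def solve_sat(clauses, nvars):
    """A toy SAT solver by exhaustive search over assignments.  Clauses
    are lists of literals: positive v or negated -v, variables 1-indexed.
    Returns a satisfying assignment, or None if the clauses are unsat."""
    for bits in product([False, True], repeat=nvars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

# Validity checking via unsatisfiability of the negation: to prove
# "a and (a -> b) implies b", we assert a, (¬a ∨ b), and ¬b, then
# ask the solver for a counterexample.
clauses = [[1], [-1, 2], [-2]]
print(solve_sat(clauses, 2))  # → None: no counterexample, so the claim is valid
```

A verification tool layers a satisfiability-preserving translation (e.g., of quantifier-free arithmetic formulas) on top of exactly this query interface, which is why gains in SAT-solver performance translate directly into verification power.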
|Date||3:30-4:30 pm, Tuesday, November 15, 2005|
|Speaker||Norman Ramsey, Harvard University|
|Title||A Low-level Approach to Reuse for Programming-Language Infrastructure|
|Abstract||New ideas in programming languages are best evaluated experimentally. But experimental evaluation is helpful only if there is an implementation that is efficient enough to encourage programmers to use the new features. Ideally, language researchers would build efficient implementations by reusing existing infrastructure, but existing infrastructures do not serve researchers well: in high-level infrastructures, many high-level features are built in and can't be changed, and in low-level infrastructures, it is hard to support important *run-time* services such as garbage collection, exceptions, and so on.|
I am proposing a different approach: for reuse with many languages, an infrastructure needs *two* low-level interfaces: a compile-time interface and a *run-time* interface. If both interfaces provide appropriate mechanisms, the mechanisms can be composed to build many high-level abstractions, leaving the semantics and cost model up to the client.
In this talk, I will illustrate these ideas with examples drawn from two parts of the C-- language infrastructure: exception dispatch and procedure calls. I will focus on the mechanisms that make it possible for you to choose the semantics and cost model you want. For exceptions, these mechanisms are drawn from both compile-time and run-time interfaces, and together they enable you to duplicate all the established techniques for implementing exceptions. For procedure calls, the mechanisms are quite different; rather than provide low-level mechanisms that combine to form different kinds of procedure calls, I have found it necessary to extend the compile-time interface to enable direct control of the semantics and cost of procedure calls. I will also sketch some important unsolved problems regarding mechanisms to support advanced control features such as threads and first-class continuations.
|Date||4:15-5:30 pm, Tuesday, November 8, 2005|
|Speaker||Terence Parr, University of San Francisco|
|Title||ANTLR and Computer Language Implementation|
|Abstract||While parsing has been well understood for decades and a number of decent parser generators exist, anyone who has built an interpreter, translator, or compiler of consequence will tell you that the overall problem of supporting language development has not been adequately solved. Yes, parsing has been solved in theory, but many of the strongest parsing strategies and systems are cumbersome in practice. Moreover, parsing is only one component of language implementation, and the other pieces have either been studied purely from a compiler point of view or have resulted in powerful but inaccessible solutions that programmers in the trenches are unable or unwilling to use.|
My goal in this talk is twofold: (1) to convince you that there is still work to be done and interesting problems to solve in the realm of language tools such as IDEs tailored to language development, grammar reuse strategies, automatic grammar construction from sample inputs, and transformation systems that are accessible to the average programmer; (2) to demonstrate a few items from my ANTLR research program such as the LL(*) parsing algorithm, tree grammars, grammar rewrite rules, and ANTLRWorks grammar development environment.
|Date||4:15-5:30 pm, Tuesday, November 1, 2005|
|Speaker||George Necula, University of California, Berkeley|
|Title||Data Structure Specifications via Local Equality Axioms|
|Abstract||We describe a program verification methodology for specifying global shape properties of data structures by means of axioms involving arbitrary predicates on scalar fields and pointer equalities in the neighborhood of a memory cell. We show that such local invariants are both natural and sufficient for describing a large class of data structures. We describe a complete decision procedure for such a class of axioms. The decision procedure is not only simpler and faster than in other similar systems, but has the advantage that it can be extended easily with reasoning for any decidable theory of scalar fields.|
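A concrete instance of such a local axiom: in a doubly-linked list, every cell n satisfies "n.next is null, or n.next.prev = n" (and symmetrically for prev), a pointer-equality condition on the immediate neighborhood of each cell. The sketch below merely *checks* the axiom at run time on concrete heaps; the talk's contribution is a decision procedure for reasoning about such axioms statically.

```python
class Node:
    def __init__(self, val):
        self.val, self.prev, self.next = val, None, None

def link(a, b):
    a.next, b.prev = b, a

def check_local_axioms(nodes):
    """Check the doubly-linked-list shape property as a local axiom:
    each cell is inspected only in its one-step neighborhood, which is
    what makes this style of invariant natural to state and decide."""
    for n in nodes:
        if n.next is not None and n.next.prev is not n:
            return False
        if n.prev is not None and n.prev.next is not n:
            return False
    return True

a, b, c = Node(1), Node(2), Node(3)
link(a, b); link(b, c)
print(check_local_axioms([a, b, c]))  # → True
c.prev = a                            # corrupt the structure
print(check_local_axioms([a, b, c]))  # → False: b.next.prev is no longer b
```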
|Date||4:15-5:30 pm, Tuesday, October 18, 2005|
|Speaker||Ulfar Erlingsson, Microsoft Research|
|Title||Principles and Applications of Software Control-Flow Integrity|
|Abstract||Current software attacks often build on exploits that subvert machine-code execution. The enforcement of a basic safety property, Control-Flow Integrity (CFI), can prevent such attacks from arbitrarily controlling program behavior. CFI enforcement is simple, and its guarantees can be established formally, even with respect to powerful adversaries. Moreover, CFI enforcement is practical: it is compatible with existing software and can be done efficiently using software rewriting in commodity systems. Finally, CFI provides a useful foundation for enforcing further security policies.|
This talk describes CFI and an x86 implementation of CFI enforcement, assesses its security benefits against real-world attacks, and shows how the CFI guarantees can enable efficient software implementations of a protected shadow call stack and of access control for memory regions.
This is joint work with Martin Abadi, Mihai Budiu, and Jay Ligatti. More information about the work can be found at http://research.microsoft.com/research/sv/gleipnir/
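The enforcement idea can be caricatured at a high level: each indirect control transfer is checked against the set of targets the static control-flow graph allows for that site. Real CFI instruments machine code with inlined label checks; the site names, label sets, and functions below are invented for illustration.

```python
# Targets the (hypothetical) static control-flow graph permits for each
# indirect call site; anything else is a CFI violation.
ALLOWED = {"dispatch_site": {"handler_a", "handler_b"}}

def handler_a(): return "a"
def handler_b(): return "b"
def attacker_gadget(): return "pwned"   # a target an exploit might inject

def checked_indirect_call(site, target):
    """Refuse control transfers the static CFG did not predict."""
    if target.__name__ not in ALLOWED[site]:
        raise RuntimeError("CFI violation: unexpected indirect-call target")
    return target()

print(checked_indirect_call("dispatch_site", handler_a))  # → a
```

Even if an attacker corrupts a function pointer to aim at `attacker_gadget`, the check fires before control transfers, which is how CFI bounds what memory corruption can achieve.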
|Date||2:30-3:45 pm, Monday, October 10, 2005|
|Speaker||Dan Grossman, University of Washington|
|Title||Strong Atomicity for Today's Programming Languages|
|Abstract||The data races and deadlocks that riddle threaded applications are an ever-greater impediment to reliable and responsive desktop applications. The lock-based approach to shared-memory concurrency (i.e., the approach taken in current programming languages such as Java) has software-engineering shortcomings compared to atomicity. An atomic construct is a concurrency primitive that executes code as though no other thread has interleaved execution. To ensure correctness, fair scheduling, and reasonable performance, we advocate a logging-and-rollback approach to implementing atomic. Moreover, we believe we can implement atomic well enough on today's commodity hardware to utilize atomicity and further investigate its usefulness.|
This talk will describe ongoing work designing and implementing languages with atomicity. After describing the advantages of atomicity, we will describe our experience with two prototypes: (1) AtomCaml is a working prototype that extends the mostly-functional language OCaml with atomicity. OCaml does not support true parallelism (it essentially assumes a uniprocessor), which lets us perform key optimizations for this common case. In particular, non-atomic code can run unchanged. (2) AtomJava is a Java extension currently under development. It implements atomicity in terms of locks (which are not visible to programmers) without potential deadlock. The AtomJava compiler produces Java source code that can run on any Java implementation.
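The logging-and-rollback approach advocated above can be sketched for a single shared map: each write inside an atomic block logs the old value, so the block's effects can be undone if it must be rolled back and re-executed. This sketch models only the undo log; it ignores scheduling, conflict detection, and the lock-based machinery of AtomJava entirely, and all names are invented.

```python
class AtomicDict:
    """Sketch of logging-and-rollback for an atomic block over a dict:
    writes record the previous value so rollback can restore it."""
    def __init__(self):
        self.data, self.log = {}, []
    def write(self, key, value):
        self.log.append((key, self.data.get(key)))  # remember old value
        self.data[key] = value
    def rollback(self):
        """Undo every logged write, newest first."""
        while self.log:
            key, old = self.log.pop()
            if old is None:
                self.data.pop(key, None)   # key did not exist before
            else:
                self.data[key] = old
    def commit(self):
        self.log.clear()                   # effects become permanent

m = AtomicDict()
m.write("x", 1); m.commit()      # a committed atomic block
m.write("x", 2); m.write("y", 3) # a block that must be rolled back
m.rollback()
print(m.data)                    # → {'x': 1}: the aborted writes are gone
```

A runtime pairing this undo log with detection of conflicting interleavings could re-execute an aborted block, giving the as-if-serial semantics that the atomic construct promises.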