Arithmetic Integration of Decision Procedures
Ting Zhang


Abstract


Since the beginning of the computer era, there have been strong demands for high-quality programs from the perspectives of reliability, security and performance. As complex computer systems become ever more pervasive, the importance of rigorous program verification can hardly be overstressed. Formal verification has been successfully used in specifying and verifying hardware. At the heart of virtually every verification system are decision procedures that reason about system behaviors, proving their safety or finding situations that could lead to bugs. Besides their indispensable role in rigorous formal verification, decision procedures are also of great importance in model checking and program analysis, where automatic proof engines can improve the overall efficiency and increase the analysis accuracy.

However, many important problems remain. Decision procedures exist for many specialized logical domains, including integers and reals, as well as for many data structures frequently appearing in programs, such as lists, queues and sets. Programs, however, even very simple ones, often involve multiple data domains, resulting in verification conditions spanning more than one logical theory. Therefore, from the perspective of high-level program analysis, it is of great importance to develop decision procedures for complex domains.

This thesis offers novel solutions to an important class of decision problems: mixed constraints on data structures combined with integer constraints on the size of those data structures. Such constraints can express memory safety properties, such as the absence of memory overflow and out-of-bounds array access, which are crucial for program correctness.

The approach can be summarized as follows: we reduce constraints on data structures to constraints on integers, and in the presence of quantifiers, reduce quantifiers on data objects to quantifiers on integers.
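As a purely illustrative instance of this reduction (using lists and writing |x| for the length of x and ℓ_x for its integer abstraction; this notation is ours, not the thesis's), a mixed constraint such as x = cons(a, y) together with the size requirement |x| > 3 is satisfiable exactly when its integer abstraction is satisfiable over the natural numbers:

    x = \mathit{cons}(a, y) \wedge |x| > 3
    \quad\rightsquigarrow\quad
    \ell_x = \ell_y + 1 \wedge \ell_x > 3 \wedge \ell_y \ge 0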

Chapter 1. Introduction


Decision procedures are algorithms that can determine whether a formula in a given logical theory is valid or satisfiable. An important distinction between these algorithms and general-purpose theorem provers is the level of automation. Decision procedures are fully automatic; they always terminate with either positive or negative answers. In formal program verification, hundreds of thousands of verification conditions are generated. Program verification will never be practical if human efforts are needed to prove these formulas, most of which are repetitive, tediously long and mathematically uninteresting. Decision procedures can automatically discharge verification conditions that fall in the scope of certain decidable theories, and hence relieve users of verification systems from tedious interaction with general-purpose theorem provers. Besides their indispensable role in rigorous formal verification, decision procedures are also of great importance in model checking and program analysis where automatic proof engines can improve the overall efficiency and increase the analysis accuracy.

Decision procedures exist for many specialized logical domains, including integers and reals, as well as for many data structures frequently appearing in programs, such as lists, queues, sets and multi-sets. These specialized decision procedures can only handle a particular class of formulas in a particular theory. Programs, however, even very simple ones, often involve multiple data domains, resulting in verification conditions spanning multiple logical theories. To be able to verify high-level programs with rich data types, we have to find decision procedures that can reason about complex domains.

Designing a decision procedure not only requires a good understanding of the algebraic properties of the specific domains, but also requires considerable effort and often ingenuity to exploit these algebraic properties. Decision procedures for simple domains are well studied in this respect; by exploiting the specific structure of the domains, efficient algorithms have been obtained whose complexity in most cases matches the optimal theoretical bounds. Designing a decision procedure for a combined theory from scratch, however, is not practical, because there are potentially many different combinations. Combined structures are usually constructed in a modular way, by imposing additional constructs on component substructures. Therefore, it is natural to investigate whether decision procedures for such combined structures can be obtained modularly, namely, by utilizing decision procedures for the component theories as black boxes.

This line of investigation goes back to Nelson and Oppen, who discovered a combination procedure for stably infinite quantifier-free theories with disjoint signatures. The procedure combines satisfiability procedures for the component theories using equality propagation. The applicability of the Nelson-Oppen method, however, is limited mainly by two conditions: (i) it works only with quantifier-free theories, and (ii) it requires that the signatures of the component theories be disjoint. It turns out that these two restrictions are very hard to remove in spite of much research in this direction. There are several recent advances for combining quantified theories or non-disjoint theories. The practical applicability of these general combination methods, however, is limited in that they require additional nontrivial conditions on the component theories or structures. In fact, accumulated evidence suggests that the existence of modular combinations like the Nelson-Oppen method should be viewed as an exception. In combining theories with quantifiers or non-disjoint signatures, we should not expect the same level of modularity as in the Nelson-Oppen scheme, but rather rely on close examination of the characteristics of the combining constructs as well as the properties of the individual component theories.
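To make the black-box interface concrete, the following is a minimal sketch (in Python, with hypothetical names such as Solver, partitions and combined_sat, none of which come from the original work) of the nondeterministic version of the Nelson-Oppen combination: after purifying the mixed formula, the combined constraint is satisfiable if and only if some arrangement of the shared variables, i.e., some partition into classes of variables assumed equal, is accepted by both component solvers.

    # A minimal sketch of the nondeterministic Nelson-Oppen combination.
    # All names here (Solver, partitions, combined_sat) are illustrative,
    # not taken from the thesis or from any particular tool.

    from typing import Callable, List, Set

    Arrangement = List[Set[str]]             # blocks of shared variables assumed equal
    Solver = Callable[[Arrangement], bool]   # True iff the solver's pure literals
                                             # are satisfiable under the arrangement


    def partitions(shared: List[str]):
        """Enumerate all partitions of the shared variables."""
        if not shared:
            yield []
            return
        first, rest = shared[0], shared[1:]
        for part in partitions(rest):
            # put `first` into one of the existing blocks ...
            for i in range(len(part)):
                yield part[:i] + [part[i] | {first}] + part[i + 1:]
            # ... or into a new block of its own
            yield part + [{first}]


    def combined_sat(solver1: Solver, solver2: Solver, shared: List[str]) -> bool:
        """For stably infinite theories with disjoint signatures, the purified
        mixed constraint is satisfiable iff some arrangement of the shared
        variables is accepted by both component solvers."""
        return any(solver1(p) and solver2(p) for p in partitions(shared))

Practical implementations propagate entailed equalities between the solvers rather than enumerating all partitions, but the sketch shows why the method hinges on the two conditions above: each solver only ever sees its own pure literals plus equalities and disequalities over the shared variables.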

Chapter 2. Decision Procedures for Term Algebras with Presburger Arithmetic.



This chapter presents the integration of Presburger arithmetic with term algebras. It first introduces the technical machinery and then presents decision procedures for the quantifier-free and the quantified theories.

Term algebras can represent an important class of recursively defined data structures known as recursive data structures. This class of structures satisfies the following two properties of term algebras: (i) the data domain is the set of data objects generated exclusively by applying constructors, and (ii) each data object is uniquely generated. Examples of such structures include lists, stacks, counters, trees and records; queues do not belong to this class as they are not uniquely generated: they can grow at both ends. The combined constraints can express memory safety properties (e.g., absence of memory overflow) and other augmented properties of data structures (e.g., being a balanced tree).

We developed the basic reduction technique, namely the extraction of accurate integer constraints from term constraints. Since these integer constraints precisely characterize the term constraints, we can derive decision procedures for the combined constraints by utilizing decision procedures for term algebras together with decision procedures for integer arithmetic.
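The following is a minimal sketch of this extraction step for a deliberately simplified signature of lists built from nil and cons (the classes IsNil and IsCons and the function length_constraints are hypothetical illustrations, not the thesis's actual construction, which must also handle disequalities, selectors and the finite-constant case):

    # A minimal sketch of the length-abstraction step for a simplified list
    # signature (nil, cons); the constraint format is illustrative only.

    from dataclasses import dataclass
    from typing import List, Union


    @dataclass(frozen=True)
    class IsNil:            # constraint: var = nil
        var: str


    @dataclass(frozen=True)
    class IsCons:           # constraint: var = cons(head, tail)
        var: str
        head: str
        tail: str


    Constraint = Union[IsNil, IsCons]


    def length_constraints(constraints: List[Constraint]) -> List[str]:
        """Map each term constraint to the linear (Presburger) constraint
        it induces on the lengths of the list variables."""
        out = []
        for c in constraints:
            if isinstance(c, IsNil):
                out.append(f"len_{c.var} = 0")
            else:  # IsCons
                out.append(f"len_{c.var} = len_{c.tail} + 1")
        return out


    # Example: x = cons(a, y), y = nil  induces  len_x = len_y + 1, len_y = 0.
    print(length_constraints([IsCons("x", "a", "y"), IsNil("y")]))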

We showed that for structures with an infinite constant domain such an accurate integer constraint, which is satisfiable if and only if the corresponding term constraint is satisfiable, can be effectively computed and is expressible by a quantifier-free Presburger formula linear in the size of the term constraint. For structures with a finite constant domain we introduced an additional counting constraint to account for the fact that with finitely many constants the number of distinct terms of a particular length is bounded. We showed that this counting constraint, too, is expressible by a quantifier-free Presburger formula. The latter decision procedure directly extends Oppen's decision procedure for infinite data domains to finite domains.
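For intuition only, consider a toy signature with two constants a, b and one unary constructor s (this example is ours, not the general counting construction of the thesis): every term of size n + 1 is either s^n(a) or s^n(b), so no three pairwise distinct terms can share the same size, and it is exactly this kind of bound that the additional counting constraint records as a quantifier-free Presburger condition.

    |t| = n + 1 \;\Longrightarrow\; t \in \{\, s^{n}(a),\; s^{n}(b) \,\},
    \qquad\text{hence}\qquad
    x \ne y \wedge y \ne z \wedge x \ne z \wedge |x| = |y| = |z|
    \ \text{is unsatisfiable.}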

For the first-order theory, we first developed a new quantifier elimination procedure for the theory of *pure* term algebras which can eliminate a block of quantifiers of the same kind in one step. We then extended it to a decision procedure for the theory of term algebras with integer constraints by, again, extracting integer constraints from term constraints, combined with a reduction of quantifiers on term variables to quantifiers on integer variables. The complexity of the new decision procedure is k-fold exponential for formulas with k quantifier alternations. This is optimal in the sense that the theory of pure term algebras itself has non-elementary complexity.
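To convey the flavor of this reduction with a toy example of our own (not the actual elimination step of the thesis): over lists with infinitely many atoms, a quantifier over a term variable that is constrained only structurally and through its length can be traded for a pure integer condition, for instance

    \exists x \,\bigl( x \neq \mathit{nil} \wedge |x| = n \bigr)
    \;\Longleftrightarrow\;
    n \ge 1 .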

Chapter 3. Decision Procedures for Queues with Presburger Arithmetic.


This chapter presents the integration of Presburger arithmetic with queues. It first adapts the technical machinery introduced in Chapter 2 and then presents decision procedures for the quantifier-free and the quantified theories.

Queues are widely used in many fields of computer science such as communication networks, job scheduling and simulation. They also provide an important synchronization mechanism in modeling distributed protocols and hence form the basis of many concurrent algorithms.

As a queue can grow at both ends, it does not fall into the category of recursive data structures, which can be modeled as term algebras. For this reason, we further improved the reduction technique and developed new normalization procedures to handle the distinctive properties of queues.

We present decision procedures for quantifier-free theories of queues with Presburger arithmetic. We consider two kinds of quantifier-free theories, based on whether they include the prefix relation or not. We also present a quantifier elimination procedure for the first-order theory of queues with integers. The elimination procedure removes a block of existential quantifiers in one step.
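Reading queues as finite words with concatenation written as \circ, and writing \ell_x for the length of x (notation of our own, for illustration only), the basic length abstraction of Chapter 2 still applies; for instance,

    x = u \circ v \wedge \mathit{prefix}(u, y)
    \quad\rightsquigarrow\quad
    \ell_x = \ell_u + \ell_v \wedge \ell_u \le \ell_y ,

though, as noted above, queues require additional normalization steps to obtain an exact characterization.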

Chapter 4. Decidability of the First-order Theory of Knuth-Bendix Order.


Using quantifier elimination and the reduction technique developed for solving the decision problem of term algebras augmented with integer arithmetic, we proved the decidability of the first-order theory of the Knuth-Bendix order, thereby solving a long-standing open problem in term rewriting (officially listed as RTA open problem 99 since 2000).

This decidability result is obtained by quantifier elimination on a complex structure containing term algebras and integer arithmetic. In this structure, we have a weight function mapping terms to integers as well as various boundary functions mapping integers to terms. In addition, the Knuth-Bendix order is expanded in two directions. First, the order is decomposed into three disjoint suborders depending on which of the three conditions in the definition applies. Second, all orders (including the suborders) are extended to gap orders, which assert the least number of distinct objects between two terms. Moreover, as the Knuth-Bendix order is recursively defined via a lexicographic extension of itself, gap orders are further extended to tuples of terms. Thus we actually established the decidability of a much richer theory. In constructing the quantifier elimination procedure, we overcame several technical challenges, including the simplification of complex literals, the elimination of integer quantifiers, the elimination of equality, the elimination of negative literals, and the proof of termination (using multi-set orderings).
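For reference, the textbook definition under analysis can be stated as follows (in a simplified form that omits the special treatment of a unary symbol of weight zero): given a precedence \succ on function symbols and a weight function w, for s = f(s_1, \dots, s_m) and t = g(t_1, \dots, t_n), s \succ_{\mathrm{kbo}} t holds when every variable occurs in s at least as often as in t and one of three mutually exclusive conditions applies, which is precisely the three-way decomposition mentioned above:

    s \succ_{\mathrm{kbo}} t \;\iff\; |s|_x \ge |t|_x \ \text{for every variable}\ x,\ \text{and}
    \begin{cases}
    w(s) > w(t), & \text{or}\\
    w(s) = w(t) \ \text{and}\ f \succ g, & \text{or}\\
    w(s) = w(t),\ f = g,\ \text{and}\ (s_1,\dots,s_m) \succ_{\mathrm{kbo}}^{\mathrm{lex}} (t_1,\dots,t_n).
    \end{cases}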

The Knuth-Bendix order, along with two other types of orders, the polynomial order and the recursive path order, has numerous applications in term rewriting and theorem proving. In ordered term rewriting, a strategy built on ordering constraints can dynamically orient an equation at the time of instantiation, even if the equation is not uniformly orientable. This provides a powerful tool for proving the termination of rewriting systems. In ordered resolution and paramodulation, ordering constraints are used to select maximal literals on which to perform resolution. Ordering constraints also serve as enabling conditions for inference rules, and such conditions can be inherited from previous inferences at each deduction step. This helps prune redundancy in the search space without compromising refutational completeness. Of these three orderings, however, only the Knuth-Bendix order has a decidable first-order theory. Therefore, this decidability result may greatly benefit future algorithm design in term rewriting and theorem proving.

It is interesting that the combination of term algebras with integer arithmetic can help solve an open problem in another, quite different field. We believe this demonstrates the effectiveness of our approach to the combination problem: studying concrete combination types where richer algebraic properties can be exploited.

Chapter 5. Conclusion


This thesis offers novel solutions to an important class of decision problems: mixed constraints on data structures with quantitative properties. We developed the reduction technique, namely the extraction of accurate integer constraints from data constraints and, in the case of quantified theories, the reduction of quantifiers on data objects to quantifiers on integers. Since these integer constraints precisely characterize the data constraints, we can derive decision procedures for the combined constraints by utilizing decision procedures for data structures together with decision procedures for Presburger arithmetic. We presented decision procedures for term algebras with integers and decision procedures for queues with integers. Using our reduction technique and quantifier elimination, we proved the decidability of the first-order theory of the Knuth-Bendix order, thereby solving a long-standing open problem in term rewriting (officially listed as RTA open problem 99 since 2000).

We envisage that decision procedures will play a bigger role in formal methods, model checking and program analysis. They will provide more valuable tools for specifying and analyzing security applications as well as embedded and reactive systems. We plan to expand the thesis work in the following directions.

Security Verification. The vast majority of security problems in software systems are caused by memory access violations such as stack or heap overflow and out-of-bounds array access. This creates unprecedented demand for reasoning about memory safety properties, that is, properties stating that memory accesses, across various data manipulations, always stay within designated boundaries. In fact, memory safety properties are a subclass of the more general quantitative properties of resource reallocation, which can be expressed in the language of data structures with integer constraints. We believe the work in this thesis can be used as the basis for specifying and verifying such quantitative properties.

High-level Static Analysis. Many advanced data structures are widely used in industrial-scale applications such as the Java Runtime Library and the C++ Standard Template Library. They include linked lists, heaps, priority queues, hash tables, skip lists, splay trees, etc. Program reliability and efficiency rely on high-level properties of these data structures. The traditional low-level logical representation of these structures easily leads to undecidability. The challenge here is to strike the right balance between expressive power and complexity: a specification language should be designed so that it can model the core properties of a data structure while retaining decidability or even low complexity. We believe that new decision procedures for advanced data structures will make important contributions to high-level static analysis.

Verification of Embedded and Reactive Systems. It is essential to develop techniques for designing and analyzing embedded and reactive systems, as they are ubiquitous in our daily lives, particularly in many safety-critical devices that we use. One of the challenges we would like to focus on is carrying out symbolic exploration of the state space efficiently. Such symbolic computation unavoidably involves quantified formulas, while many first-order theories are either undecidable or intractable due to their high complexity. As many have observed, however, we rarely deal with formulas with a large quantifier alternation depth, and hence it is worthwhile to investigate the class of formulas that can have arbitrarily long sequences of quantifiers of the same kind while the total number of quantifier alternations is bounded by a constant. In the search for quantifier elimination procedures for the theory of term algebras with integers and for the theory of queues with integers, we already aimed at, and successfully obtained, block-wise quantifier eliminations, which are practically more efficient than single-variable quantifier eliminations. We propose to continue the development of more efficient quantifier eliminations, in particular elimination procedures for the combined theory of integer and real arithmetic, which finds applications in the verification of hybrid systems and real-time systems.

More Powerful Decision Theories. On the theoretical front, we plan to search for powerful new tools for settling decision problems. Currently we are investigating the decidability of the theory of queues with integers and with subsequence relations, including the subqueue, prefix and suffix relations. This theory has very strong expressive power; it can interpret the theory of word concatenation with the length function, the theory of Presburger arithmetic with the divisibility predicate, the theory of arrays, etc. Determining the decidability of this theory will have far-reaching consequences for solving other decision problems. In particular, it could give us a better understanding and classification of solutions to word equations. Besides its theoretical importance, the decidability of this theory may give us more powerful algorithms for pattern matching, which has numerous applications in computer science. It can precisely characterize the semantics of common string operations in the C language, and hence would be a powerful tool for reasoning about memory safety properties. It may also lead to a decision procedure for the theory of unbounded bit-vectors, which potentially has many applications in hardware verification.
