# Arithmetic Integration of Decision Procedures

Ting Zhang

## Abstract >> Click to span or collapse

Since the beginning of the computer era, there have been strong
demands for high quality programs from the perspectives of
reliability, security and performance. As complex computer systems
become ever more pervasive, the importance of rigorous program
verification can hardly be over-stressed. Formal verification has
been successfully used in specifying and verifying hardware. At
the heart of virtually every verification system are Decision
Procedures that reason about the system behaviors, proving their
safety or finding situations that could lead to bugs. Besides the
indispensable role in rigorous formal verification, decision
procedures are also of great importance in model checking and
program analysis where automatic proof engines can improve the
overall efficiency and increase the analysis accuracy.

However, many important problems remain. Decision procedures
exists for many specialized logical domains including integers,
reals as well as for many data structures frequently appearing in
programs such that lists, queues and sets. Programs, even of very
simple kind, however, often involve multiple data domains,
resulting in verification conditions spanning more than one
logical theories. Therefore, from the perspective of high-level
program analysis, it is of great importance to develop decision
procedures for complex domains.

This thesis offers novel solutions to an important class of decision
problems, the mixed constraints on data structures with integer
constraints on the size of those data structures. Such constraints
can express memory safety properties such as absence of memory
overflow and out-of-bound array access, which are crucial for
program correctness.

The approach can be summarized as follows: we reduce constraints on data structures to constraints on integers, and in the presence of quantifiers, reduce quantifiers on data objects to quantifiers on integers.

## Chapter 1. Introduction >> Click to span or collapse

Decision procedures are algorithms that can determine whether a
formula in a given logical theory is valid or satisfiable. An
important distinction between these algorithms and general-purpose
theorem provers is the level of automation. Decision procedures are
fully automatic; they always terminate with either positive or
negative answers. In formal program verification, hundreds of
thousands of verification conditions are generated. Program
verification will never be practical if human efforts are needed to
prove these formulas, most of which are repetitive, tediously long
and mathematically uninteresting. Decision procedures can
automatically discharge verification conditions that fall in the
scope of certain decidable theories, and hence relieve users of
verification systems from tedious interaction with general-purpose
theorem provers. Besides their indispensable role in rigorous formal
verification, decision procedures are also of great importance in
model checking and program analysis where automatic proof engines
can improve the overall efficiency and increase the analysis
accuracy.

Decision procedures exist for many specialized logical domains
including integers, reals as well as for many data structures
frequently appearing in programs such as lists, queues, sets and
multi-sets. These specialized decision procedures can only handle a
particular class of formulas in a particular theory. Programs, even
of very simple kind, however, often involve multiple data domains,
resulting in verification conditions spanning multiple logical
theories. To be able to verify high-level programs with rich data
types, we have to find decision procedures that can reason on
complex domains.

Designing a decision procedure not only requires a good
understanding of the algebraic properties of the specific domains,
but also requires a lot of effort and often ingenuity to exploit
these algebraic properties. Decision procedures for simple domains
are well-studied in this way; by exploiting specific structures of
the domains, efficient algorithms are obtained whose complexity in
most cases matches the optimal theoretical bounds. Designing a
decision procedure for a combined theory from scratch, however, is
not practical because there are potentially many different
combinations. Combined structures, however, usually are constructed
in a modular way, by imposing additional constructs on component
substructures. Therefore, it is natural to investigate whether
decision procedures for such combined structures can be obtained
modularly, namely, by utilizing decision procedures for the
component theories as black boxes.

This line of investigation goes back to Nelson and Oppen who discovered a combination procedure for stably infinite quantifier-free theories with disjoint signatures. The procedure combines satisfiability procedures for component theories using equality propagation. The applicability of the Nelson-Oppen method, however, is limited mainly by two conditions: (i) it works only with quantifier-free theories and, (ii) it requires that the signatures of component theories are disjoint. It turns out that these two restrictions are very hard to remove in spite of much research in this direction. There are several recent advances for combining quantified theories or non-disjoint theories. The practical applicability of these general combination methods, however, is limited in that these methods require additional nontrivial conditions on component theories or structures. In fact accumulated evidence shows that the existence of modular combinations like the Nelson-Oppen method should be viewed as an exception. In combining theories with quantifiers or non-disjoint signatures, we should not expect the same level of modularity as in the Nelson-Oppen schema, but rather rely on close examination of the characteristics of the combining constructs as well as the properties of the individual component theories.

## Chapter 2. Decision Procedures for Term Algebras with Presburger Arithmetic. >> Click to span or collapse

In this chapter we present the integration of Presburger arithmetic with term algebras. It first introduces the technical machinery and then presents decision procedures for quantifier-free theories and quantified theories.

Term algebras can represent an important class of recursively defined data structures known as *recursive data structures*. This class of structures satisfies the following two properties of term algebras: (i) the data domain is the set of data objects generated exclusively by applying constructors, and (ii) each data object is uniquely generated. Examples of such structures include lists, stacks, counters, trees and records; queues do not belong to this class as they are not uniquely generated: they can grow at both ends. The combined constraints can express memory safety properties (e.g., absence of memory overflow) and other augmented properties of data structures (e.g., being a balanced tree).

We developed the basic reduction technique,
namely, extraction of accurate integer constraints from term
constraints. From the construction of accurate integer constraints
that precisely characterize term constraints, we can derive
decision procedures for combined constraints by utilizing decision
procedures for term algebras and decision procedures for integer
arithmetic.

We showed that for structures with infinite constant domain such
an accurate integer constraint, which is satisfiable if and only
if the corresponding term constraint is satisfiable, can be
effectively computed and is expressible by a quantifier-free
Presburger formula linear in the size of the term constraint. For
structures with finite constant domain we introduce an additional
*counting constraint* to account for the fact that with
finitely many constants the number of distinct terms of a
particular length is bounded. We showed that also this counting
constraint is expressible by a quantifier-free Presburger formula.
The latter decision procedure directly extends Oppen's decision
procedure for infinite data domains to finite domains.

For the first-order theory, We first developed a new quantifier
elimination procedure for the theory of *pure* term algebras which
can eliminate blocks of quantifiers of the same kind in one step.
Then we extended it to a decision procedure for the theory of term
algebras with integer constraints by, again, extracting integer
constraints from term constraints combined with a reduction of
quantifiers on term variables to quantifiers on integer variables.
The complexity of the new decision procedure is k-fold
exponential for formulas with k quantifier alternations. This is
optimal in the sense that the theory of pure term algebras itself
has non-elementary complexity.

## Chapter 3. Decision Procedures for Queues with Presburger Arithmetic. >> Click to span or collapse

In this chapter we present the integration of Presburger arithmetic with queues. It first adapts the technical machinery introduced in Chapter 2 and then presents decision procedures for quantifier-free theories and quantified theories.

Queues are widely used in many fields of computer science such as communication networks, job scheduling and simulation. They also provide an important synchronization mechanism in modeling distributed protocols and hence form the basis of many concurrent algorithms.

As a queue can grow at
both ends, it does not fall into the category of recursive data
structures, which can be modeled as term algebras. For this
reason, We have further improved the reduction technique and
developed new normalization procedures to handle the distinguished
properties of queues.

We present decision procedures for quantifier-free theories of queues with Presburger arithmetic. We consider two kinds of quantifier-free theories, based on whether they include the prefix relation or not. We also present a quantifier elimination procedure for the first-order theory of queues with integers. The elimination procedure removes a block of existential quantifiers in one step.

## Chapter 4. Decidability of the First-order Theory of Knuth-Bendix Order. >> Click to span or collapse

Using quantifier elimination and the reduction technique developed
for solving the decision problem of term algebras augmented with
integer arithmetic, We proved the decidability of the first-order
theory of *Knuth-Bendix Order*, thereby solving a
long-standing open problem in term rewriting (officially listed as
RTA open problem 99 since 2000).

This decidability result is obtained by quantifier elimination on
a complex structure containing term algebras and integer
arithmetic. In this structure, we have a weight function mapping
terms to integers as well as various boundary functions mapping
integers to terms. In addition, the Knuth-Bendix order is expanded
in two directions. First, the order is decomposed into three
disjoint suborders depending on which of three conditions is used
in the definition. Secondly, all orders (including the suborders)
are extended to gap orders, which assert the least number of
distinct objects between two terms. Moreover, as Knuth-Bendix
order is recursively defined on a lexicographic extension of
itself, gap orders are extended to tuples of terms. Thus we
actually established the decidability of a very richer theory. In
constructing the quantifier elimination procedure, we overcame
several technical challenges, including simplification of complex
literals, elimination of integer quantifiers, elimination of
equality, elimination of negative literals, and proof of
termination (using multi-set orderings).

Knuth-Bendix order has numerous applications in term rewriting and
theorem proving, along with two other types of orders,
*polynomial order* and *recursive path order*. In
ordered term rewriting, a strategy built on ordering constraints
can dynamically orient an equation, at the time of instantiation,
even if the equation is not uniformly orientable. This provides a
powerful tool to prove the termination of rewriting systems. In
ordered resolution and paramodulation, ordering constraints are
used to select maximal literals to perform resolution. It also
serves as enabling conditions for inference rules and such
conditions can be inherited from previous inferences at each
deduction step. This helps to prune redundancy of the search space
without compromising refutational completeness. Of these three
orderings, however, only the Knuth-Bendix order has a decidable
first-order theory. Therefore, this decidability result may
greatly benefit future algorithm design in term rewriting and
theorem proving.

It is interesting that the combination of term algebras with integer
arithmetic can help solve an open problem in another quite different
field. We believe that this demonstrates the effectiveness of our
approach to the combination problem; studying concrete combination
types where richer algebraic properties can be exploited.

## Chapter 5. Conclusion >> Click to span or collapse

This thesis offers novel solutions to an important class of decision
problems, the mixed constraints on data structures with quantitative
properties. We developed the reduction technique, namely, extraction
of accurate integer constraints from data constraints, and in case
of quantified theories, reduction of quantifiers on data objects to
quantifiers on integers. From the construction of accurate integer
constraints that precisely characterize data constraints, we can
derive decision procedures for the combined constraints by utilizing
decision procedures for data structures and decision procedures for
Presburger arithmetic. We presented decision procedures for term
algebras with integers and decision procedures for queues with
integers. Using our reduction technique and quantifier elimination,
we proved the decidability of the first-order theory of
*Knuth-Bendix Order*, thereby solving a long-standing open
problem in term rewriting (officially listed as RTA open problem 99
since 2000).

We envisage that decision procedures will play a bigger role in
formal methods, model checking and program analysis. They will
render more valuable tools for specifying and analyzing security
applications, and embedded and reactive systems. We plan to expand
the thesis work in the following directions.

*Security Verification.* A vast majority of security
problems of software systems is caused by memory access violations
such as stack or heap overflow and out-of-bound array access. This
brings unprecedented demands in reasoning about *memory safety*
properties, that is, memory accesses, in terms of various data
manipulations, always stay within designated boundaries. In fact,
memory safety properties are a subclass of the more general
*quantitative* properties of *resource reallocation* which
can be expressed in the language of data structures with integer
constraints. We believe work in this thesis can be used as the basis
for specifying and verifying such quantitative properties.

*High-level Static Analysis.* Many advanced data
structures are widely used in industry-sized applications such as
Java Runtime Library and C++ Standard Template Library. They include
linked lists, heaps, priority queues, hash tables, skip lists, splay
trees, etc. Program reliability and efficiency rely on
*high-level* properties of these data structures. The
traditional *low-level* logic representation of these
structures easily leads to undecidability. Here the challenge is to
strike the right balance between expressive power and complexity. A
specification language should be well-designed so that it can model
the *core* properties of a data structure while retaining
decidability or even low complexity. We believe more new decision
procedures for advanced data structures will make important
contributions to high-level static analysis.

*Verification of Embedded and Reactive Systems.* It is
of essential importance to develop techniques for designing and
analyzing *embedded* and *reactive* systems, as they are
ubiquitous in our daily lives, particularly in many safety-critical
devices that we use. One of the challenges that we would like to
focus on is to carry out symbolic exploration of the state-space
efficiently. Such symbolic computation unavoidably involves
quantified formulas, while many first-order theories either are
undecidable or intractable due to high complexity. As many have
observed, however, we hardly deal with formulas with a large
quantifier alternation depth, and hence it is worthwhile to
investigate the class of formulas that can have arbitrarily long
sequences of quantifiers of the same kind while the total number of
quantifier alternations is bounded by a constant number. In the
search of quantifier elimination procedures for the theory of term
algebras with integers and for the theory of queues with integers,
we already aimed at and successfully obtained *block-wise*
quantifier eliminations which are practically more efficient than
*single-variable* quantifier eliminations. We propose to
continue the development of more efficient quantifier eliminations,
in particular elimination procedures for the combined theory of
integer and real arithmetic, which finds applications in the
verification of hybrid systems and real-time systems.

*More Powerful Decision Theories.* On the theoretical
front, we plan to search for new powerful tools to prove decision
problems. Currently we are investigating the decidability of the
theory of queues with integers and with subsequence relations
including *subqueue*, *prefix* and *suffix*
relations. This theory has a very strong expressive power; it can
interpret the theory of word concatenation with length function, the
theory of Presburger arithmetic with divisibility predicate, the
theory of arrays, etc. Determining the decidability of this theory
will have far-reaching consequences for solving other decision
problems. In particular, it could give us a better understanding and
classification of solutions to word equations. Besides the
theoretical importance, the decidability of this theory may give us
more powerful algorithms in pattern matching, which has numerous
applications in computer science. It can precisely characterize the
semantics of common string operations in the C language, and hence
would be a powerful tool to reason about memory safety properties.
It may also lead to a decision procedure for the theory of unbounded
bit-vectors which potentially has many applications in hardware
verification.