<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>6th International Workshop on the Implementation of Logics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christoph Benzmüller</string-name>
          <aff>Universität des Saarlandes, Germany</aff>
          <email>chris@ags.uni-sb.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernd Fischer</string-name>
          <aff>University of Southampton, England</aff>
          <email>b.fischer@ecs.soton.ac.uk</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Geoff Sutcliffe</string-name>
          <aff>University of Miami</aff>
          <email>geoff@cs.miami.edu</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Logic for Programming, Artificial Intelligence and Reasoning</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <fpage>111</fpage>
      <lpage>126</lpage>
      <kwd-group>
        <kwd>Held at The 13th International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The 6th International Workshop
on the Implementation of Logics</p>
      <p>The IWIL workshop series brings together developers and users of systems that
implement reasoning in logic, to share information about successful implementation
techniques for automated reasoning systems and similar programs. Systems of all types
(automated, interactive, etc), and for all logics (classical, non-classical, all orders, etc)
are of interest to the workshop. Contributions that help the community to understand
how to build useful and powerful reasoning systems in practice are of particular interest.
Two invited papers, and six research papers selected from the submissions (by the
program committee listed below), will be presented. A panel discussion will enable
all attendees to participate in open discussion. Previous IWIL workshops include the
5th IWIL (Montevideo, Uruguay), 4th IWIL (Almaty, Kazakhstan), 3rd IWIL (Tbilisi,
Georgia), 2nd IWIL (Havana, Cuba), and 1st IWIL (Reunion Island).</p>
    </sec>
    <sec id="sec-2">
      <title>-</title>
      <sec id="sec-2-1">
        <title>Journal Publication</title>
        <p>The Journal of Algorithms in Applied Logic, Artificial Intelligence and Computer Science
has agreed to a special issue based around the topic of the IWIL workshop. The
special issue targets IWIL participants, but will also accept submissions from the
broader community.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Program Committee</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Wolfgang Ahrendt (Chalmers University of Technology, Sweden)</title>
    </sec>
    <sec id="sec-4">
      <title>Anbulagan (National ICT Australia, Australia)</title>
    </sec>
    <sec id="sec-5">
      <title>Serge Autexier (Universität des Saarlandes, Germany)</title>
    </sec>
    <sec id="sec-6">
      <title>Chad Brown (Universität des Saarlandes, Germany)</title>
    </sec>
    <sec id="sec-7">
      <title>Hans de Nivelle (Max-Planck-Institut für Informatik, Germany)</title>
    </sec>
    <sec id="sec-8">
      <title>Alexander Fuchs (University of Iowa, USA)</title>
    </sec>
    <sec id="sec-9">
      <title>Thomas Hillenbrand (Max-Planck-Institut für Informatik, Germany)</title>
    </sec>
    <sec id="sec-10">
      <title>Boris Konev (University of Liverpool, England)</title>
    </sec>
    <sec id="sec-11">
      <title>Konstantin Korovin (University of Manchester, England)</title>
    </sec>
    <sec id="sec-12">
      <title>Albert Oliveras (Technical University of Catalonia, Spain)</title>
    </sec>
    <sec id="sec-13">
      <title>Brigitte Pientka (McGill University, Canada)</title>
    </sec>
    <sec id="sec-14">
      <title>Stephan Schulz (Technische Universität München, Germany)</title>
    </sec>
    <sec id="sec-15">
      <title>Volker Sorge (University of Birmingham, England)</title>
    </sec>
    <sec id="sec-16">
      <title>Alwen Tiu (Australian National University, Australia)</title>
    </sec>
    <sec id="sec-17">
      <title>Ullrich Hustadt (University of Liverpool, England)</title>
      <sec id="sec-17-1">
        <title>Presentation Schedule</title>
      </sec>
    </sec>
    <sec id="sec-18">
      <title>Session 1: 9:00-10:00</title>
    </sec>
    <sec id="sec-19">
      <title>Session 2: 10:30-11:00 11:00-11:30 11:30-12:00</title>
    </sec>
    <sec id="sec-20">
      <title>Session 3: 14:00-15:00</title>
    </sec>
    <sec id="sec-21">
      <title>Session 4:</title>
      <p>15:30-16:00
16:00-16:30
16:30-17:00
17:00-18:00</p>
    </sec>
    <sec id="sec-22">
      <title>Invited Talk: Algorithms and Data Structures for First-Order Equational Deduction</title>
      <p>Stephan Schulz</p>
    </sec>
    <sec id="sec-24">
      <title>Term Indexing for the LEO-II Prover</title>
      <p>Frank Theiß and Christoph Benzmüller</p>
    </sec>
    <sec id="sec-25">
      <title>Integrating External Deduction Tools with ACL2</title>
      <p>Matt Kaufmann, J Strother Moore, Sandip Ray and Erik Reeber</p>
    </sec>
    <sec id="sec-26">
      <title>Efficiently Checking Propositional Resolution Proofs in Isabelle/HOL</title>
      <p>Tjark Weber</p>
    </sec>
    <sec id="sec-27">
      <title>Invited Talk: Implementing an Instantiation-based Theorem Prover for First-order Logic</title>
      <p>Konstantin Korovin</p>
    </sec>
    <sec id="sec-28">
      <title>Tableau Decision Procedure for Propositional Intuitionistic Logic</title>
      <p>Guido Fiorino, Alessandro Avellone and Ugo Moscato</p>
    </sec>
    <sec id="sec-29">
      <title>LIFT-UP: Lifted First-Order Planning Under Uncertainty</title>
      <p>Steffen Hölldobler and Olga Skvortsova</p>
    </sec>
    <sec id="sec-30">
      <title>Multiple Preprocessing for Systematic SAT Solvers</title>
      <p>Anbulagan and John Slaney</p>
    </sec>
    <sec id="sec-31">
      <title>Panel Interrogation: Experts on the implementation of logics will answer questions from the workshop attendees</title>
      <p>Anbulagan, Martin Giese, Konstantin Korovin, Stephan Schulz</p>
      <sec id="sec-31-1">
        <title>Acknowledgements</title>
        <p>We wish to thank the following for their contribution to the success of this
conference: Air Force Office of Scientific Research, Asian Office of Aerospace Research and
Development.</p>
      </sec>
    </sec>
    <sec id="sec-32">
      <title>Algorithms and Data Structures for First-Order Equational Deduction (extended abstract)</title>
      <p>Stephan Schulz
Technische Universität München
schulz@eprover.org</p>
      <sec id="sec-32-1">
        <title>Introduction</title>
        <p>First-order logic with equality is one of the most widely used logics. While there is a
large number of different approaches to theorem proving in this logic, the field has been
dominated by saturation-based systems using some variant of the superposition
calculus [BG90, BG94, BG98, NR01], i.e. systems that employ paramodulation, restricted
by ordering constraints and possibly literal selection, as the main inference mechanism,
and rewriting and subsumption as the main redundancy elimination techniques. Many
systems complement equational reasoning with explicit resolution for non-equational
literals. Examples of provers based on the combination of paramodulation, rewriting
and (possibly) resolution include SPASS [WBH+02], Vampire [RV01], Otter [MW97],
its successor Prover9, and E [Sch02, Sch04b].</p>
        <p>The power of a saturating prover depends on four different, but interrelated aspects:
• The calculus (What inferences are necessary and possible?)
• The inference engine (How are they implemented?)
• The search organization (How is the proof state organized and which invariants
are maintained?)
• The heuristic control of the search (Which subset of inferences is performed and
in what order?)</p>
        <p>In this talk I will discuss the basic concepts of the given clause saturation algorithm
and its implementation. In particular, I will describe the behaviour of the DISCOUNT
loop version of this algorithm on practical examples, and discuss how this affected the
choice of algorithms and data structures for E. I will try to point out some low-hanging
fruit, where a lot of performance can be gained for relatively modest investment in code
complexity, as well as some more advanced techniques that have a significant pay-off.</p>
      </sec>
      <sec id="sec-32-1a">
        <title>Rewrite-Based Theorem Proving</title>
        <p>Superposition is a refutational calculus. It attempts to make the potential
unsatisfiability of a formula (consisting of axioms and negated conjecture) explicit using saturation.
The search state is represented by a set of first-order clauses (disjunctions of literals
over terms). The proof search employs two different mechanisms. First, new clauses are
added to the proof state by generating inference rules, using existing clauses as premises.
Secondly, simplifying or contracting inference rules are used to remove clauses or to
replace them by simpler ones. The proof is successful if this process eventually produces
the empty clause, i.e. an explicit contradiction.</p>
        <p>While the generating inferences are crucial to establish the theoretical completeness
of the calculus, extensive use of contracting inferences has turned out to be indispensable
for actually finding proofs.</p>
        <p>The practical difficulty of implementing a high-performance theorem prover results
mostly from the fact that the proof state grows extremely fast with the depth of the
proof search. A successful implementation has to be able to handle large data sets,
efficiently find potential inference partners, and in particular, be able to quickly identify
clauses that can be simplified or removed.</p>
        <p>While the number of inference and simplification rules can be much larger, in nearly
all cases only three rules are critical from a performance point of view:
• Superposition (including resolution as a special case) or a similar restricted form
of paramodulation is by far the most prolific generating inference rule. Typically,
between 95% and 99% of clauses in a non-trivial proof search are generated by
a paramodulation inference. Paramodulation can essentially be described as a
combination of instantiation (guided by unification) and lazy conditional rewriting.
For superposition, this inference is further restricted by ordering constraints (a
smaller term cannot be replaced by a larger term) and literal selection (only certain
literals need to be considered as applicable or as targets of the application).
• Unconditional rewriting allows the simplification of a clause by replacing a term
with an equivalent, but simpler term. In contrast to superposition, no instantiation
of the rewritten clause takes place, and a term is always replaced by a smaller
one. As a consequence, this is a simplifying inference, and the original of the
rewritten clause can be discarded. Practical experience has shown that in most
cases rewriting drastically improves the search behaviour of a prover.
• Subsumption allows the elimination of clauses if a more general clause is already
known. As such, it helps to reduce the search space explosion typical for saturating
provers. However, finding subsumption relations between clauses can be very
expensive.</p>
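        <p>To make the rewriting mechanism concrete, here is a toy rewriter in Python (all
names are illustrative; a real prover such as E works on shared, indexed terms and
restricts rewriting by ordering constraints, which this sketch omits):</p>

```python
# Toy first-order terms: a variable is a string starting with an uppercase
# letter; a compound term is a (symbol, (arg, ...)) tuple.
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, term, subst):
    """Syntactic matching: extend subst so that subst(pattern) == term."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        subst[pattern] = term
        return subst
    if is_var(term) or pattern[0] != term[0] or len(pattern[1]) != len(term[1]):
        return None
    for p, t in zip(pattern[1], term[1]):
        if match(p, t, subst) is None:
            return None
    return subst

def apply_subst(subst, t):
    if is_var(t):
        return subst.get(t, t)
    return (t[0], tuple(apply_subst(subst, a) for a in t[1]))

def rewrite_once(term, rules):
    """Apply the first matching rule at the topmost applicable position."""
    for lhs, rhs in rules:
        s = match(lhs, term, {})
        if s is not None:
            return apply_subst(s, rhs)
    if not is_var(term):
        for i, arg in enumerate(term[1]):
            new = rewrite_once(arg, rules)
            if new is not None:
                args = list(term[1])
                args[i] = new
                return (term[0], tuple(args))
    return None

def normal_form(term, rules):
    while True:
        new = rewrite_once(term, rules)
        if new is None:
            return term
        term = new

# Group-style rules f(X, e) -> X and f(X, i(X)) -> e:
rules = [(("f", ("X", ("e", ()))), "X"),
         (("f", ("X", ("i", ("X",)))), ("e", ()))]
t = ("f", (("f", (("a", ()), ("e", ()))), ("i", (("a", ()),))))  # f(f(a,e), i(a))
print(normal_form(t, rules))  # ('e', ()): f(f(a,e), i(a)) rewrites to e
```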
        <p>For completeness, it is necessary to eventually consider all combinations of
nonredundant clauses as premises for generating inferences. To avoid the overhead of
keeping track of each possible combination, the given-clause algorithm splits the proof
state into two distinct subsets, the set P of processed (or active) clauses, and the set
U of unprocessed (or passive) clauses, and maintains the invariant that all necessary
inferences between clauses in P have been performed. On the most abstract level, the
algorithm picks a clause from U , performs all inferences with this clause and clauses
from P (adding the resulting newly deduced clauses to U ), and puts it into P . This
process is repeated until either the empty clause is deduced, the set U runs empty (in
which case there is no proof), or some time or resource limit has been reached (in which
case the prover terminates without a useful result). The major heuristic decision in this
algorithm is in which order clauses from U are picked for processing.</p>
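        <p>The loop just described can be sketched as follows (a schematic outline with the
evaluation, inference and simplification components left abstract; all function names
are illustrative, not E's API):</p>

```python
import heapq

def given_clause_loop(axioms, evaluate, infer, is_empty_clause, max_iters=10000):
    """Schematic given-clause saturation loop.

    P (processed) holds clauses whose mutual inferences are done; U
    (unprocessed) is a priority queue ordered by heuristic evaluation."""
    P = []
    U = [(evaluate(c), i, c) for i, c in enumerate(axioms)]
    heapq.heapify(U)
    counter = len(axioms)
    for _ in range(max_iters):
        if not U:
            return "satisfiable"   # U ran empty: no proof exists
        _, _, given = heapq.heappop(U)
        if is_empty_clause(given):
            return "proof found"   # explicit contradiction derived
        # All inferences between the given clause and P; results go to U.
        for new in infer(given, P):
            counter += 1
            heapq.heappush(U, (evaluate(new), counter, new))
        P.append(given)
    return "resource limit"        # prover gives up without a useful result

# Tiny demonstration with propositional clauses (frozensets of integer
# literals, -n negating n) and binary resolution as the only inference:
def resolvents(c, P):
    return [(c - {lit}) | (d - {-lit}) for d in P for lit in c if -lit in d]

clauses = [frozenset({1, 2}), frozenset({-1}), frozenset({-2})]
print(given_clause_loop(clauses, evaluate=len, infer=resolvents,
                        is_empty_clause=lambda c: not c))  # proof found
```

<p>The choice of the evaluation function is exactly the "major heuristic decision"
mentioned above: it determines the order in which clauses leave U.</p>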
        <p>The two main variants of the given clause algorithm differ in how they integrate
simplification into this basic loop. The Otter loop maintains the whole proof state in
a fully simplified (or interreduced) state. It uses all unit equations for rewriting and
all clauses for subsumption attempts. The DISCOUNT loop, on which E is built, only
maintains this invariant for the set P of processed clauses. It simplifies newly generated
clauses once (to weed out obviously redundant clauses and to aid heuristic evaluation),
but it does not use them for rewriting or subsumption until they are actually selected
for processing. It thus trades reduced simplification for a higher rate of iterations of the
main loop. Opinions differ on which of the two designs is more efficient (see e.g. [RV03]).
</p>
      </sec>
      <sec id="sec-32-1b">
        <title>Representing the Proof State</title>
        <p>Analysis of the behavior of provers over a large set of examples has shown that the size of
U typically grows about quadratically with the size of P . Therefore for non-trivial proof
problems, the vast majority of clauses is in U , and unprocessed clauses are responsible
for most of the resources used by the prover. The overall proof state can easily reach
millions of clauses, with a corresponding number of literals and terms as their constituents.</p>
        <p>Surprisingly, with naive implementations much of the CPU time can be taken up
with seemingly trivial operations. In the first version of DISCOUNT [ADF95, DKS97]
we found to our surprise that supposedly complex operations like first-order unification
were negligible, while linear time operations like the naive insertion of clauses into U
(organized as a linear list sorted by heuristic evaluation) took up around half of the
total search time.</p>
        <p>The first and most basic data type for a first-order prover is the term. Simple,
direct implementations realize terms as ordered trees, with each node labeled by a
function or variable symbol, and with the subterms of a term as successor nodes in
the tree. Alternatives to this basic design are either increasingly optimized for size, as
flat-terms, string terms, and eventually implicit terms (reconstructed on demand) as in
the Waldmeister prover [GHLS03], or they try to store more and more precomputed
information at the term nodes to speed up repetitive operations. This is particularly
successful if sets of terms are not represented as sets of trees, but as a shared directed
acyclic graph, with repeated occurrences of any given term or subterm only represented
once. This approach has been followed in E. We found sharing factors varying from 5–15
in the unit equational case, up to several thousand in the general non-Horn case.</p>
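        <p>Perfect sharing of this kind is commonly achieved by hash-consing, sketched below
(illustrative names, not E's actual term bank API):</p>

```python
class TermBank:
    """Hash-consing term bank: structurally equal terms share one node."""
    def __init__(self):
        self._nodes = {}

    def term(self, symbol, *args):
        key = (symbol, args)            # args are already shared nodes
        node = self._nodes.get(key)
        if node is None:
            node = key                  # the interned key doubles as the node
            self._nodes[key] = node
        return node

bank = TermBank()
a = bank.term("a")
t1 = bank.term("f", bank.term("g", a), a)
t2 = bank.term("f", bank.term("g", bank.term("a")), bank.term("a"))
print(t1 is t2)            # True: f(g(a), a) exists exactly once in the bank
print(len(bank._nodes))    # 3 nodes: a, g(a), f(g(a), a)
```

<p>With physical sharing, equality testing becomes a pointer comparison, and per-term
annotations (such as the rewrite links discussed below) are attached once per shared node.</p>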
      </sec>
      <sec id="sec-32-2">
        <title>Rewriting</title>
        <p>Shared terms not only allow a reduction in memory consumption, they also allow
the sharing of (some) term-related inferences. In particular, they allow the sharing of
rewrite operations and, even more importantly, normal form information among terms.
For any rewritten term, we can add a link pointing to the result of the rewrite operation.
If we encounter the same term again in the future, we can just follow this link instead
of performing a real search for matching and applicable rewrite rules. Less glorious, but
no less effective, is the sharing of information about non-rewritability. In either version
of the given clause algorithm, the set of potential rewrite rules changes over time, and
the strength of the rewrite relation grows monotonically. If a term is in normal form
with respect to all rules at a given time, no older rule has to be ever considered again for
rewriting this term. This criterion can be integrated into indexing techniques to further
speed up normal form computation.
</p>
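        <p>The rewrite links and the age-based criterion might be sketched like this
(hypothetical names; E attaches this information to shared term nodes):</p>

```python
class RewriteCache:
    """Cache rewriting results on shared terms.

    nf_link[t]: a known rewrite result for t (links can be chained).
    nf_date[t]: age of the newest rule when t was last fully normalised;
    rules no newer than that never need to be reconsidered for t."""
    def __init__(self, normalise):
        self.normalise = normalise   # (term, min_rule_age) -> normal form
        self.nf_link = {}
        self.nf_date = {}

    def normal_form(self, t, current_age):
        while t in self.nf_link:     # follow cached rewrite links first
            t = self.nf_link[t]
        start = self.nf_date.get(t, 0)
        if current_age > start:      # only newer rules can still apply
            nf = self.normalise(t, start)
            if nf != t:
                self.nf_link[t] = nf
                return self.normal_form(nf, current_age)
            self.nf_date[t] = current_age
        return t

# Toy demonstration: terms are strings, rules are (age, lhs, rhs) triples.
rules = [(1, "c", "b"), (2, "b", "a")]
def toy_normalise(t, start):
    changed = True
    while changed:
        changed = False
        for age, lhs, rhs in rules:
            if age > start and t == lhs:
                t, changed = rhs, True
    return t

cache = RewriteCache(toy_normalise)
print(cache.normal_form("c", current_age=2))  # a
print(cache.nf_link)   # {'c': 'a'}: future requests just follow the link
```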
      </sec>
      <sec id="sec-32-3">
        <title>Subsumption</title>
        <p>A single rewriting step is usually cheap to perform. The high cost of rewriting comes
from the large number of term positions and rules to consider. For subsumption, on
the other hand, even a single clause-clause check can be very expensive, as the problem
is known to be NP-complete [KN86]. Unless reasonable care is taken, the exponential
worst case can indeed be encountered, and with clauses that are large enough that this
hurts performance significantly1.</p>
        <p>To overcome this problem, a number of strategies can be implemented. First,
presorting of literals with a suitable ordering stable under substitutions can greatly reduce
the number of permutations that need to be considered. Secondly, a number of required
conditions for subsumption can be tested rather cheaply. If any of these tests fail, the
full subsumption test becomes superfluous.</p>
        <p>Feature vector indexing [Sch04a] arranges several of these tests in a way that they
can be performed not only for single clauses, but for sets of clauses at a time.
</p>
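        <p>The principle behind such cheap pre-tests can be illustrated with simple clause
feature vectors (a simplified sketch of the idea from [Sch04a], not E's implementation):</p>

```python
from collections import Counter

def features(clause):
    """clause: list of (sign, symbols) literals, where symbols is the multiset
    of function symbols in the literal. Instantiation can only add symbol
    occurrences, and subsumption maps the subsuming clause onto a subset of
    the subsumed one, so no feature of C may exceed that of D."""
    pos, neg = Counter(), Counter()
    npos = nneg = 0
    for sign, symbols in clause:
        if sign:
            npos += 1
            pos.update(symbols)
        else:
            nneg += 1
            neg.update(symbols)
    return npos, nneg, pos, neg

def may_subsume(c, d):
    """Cheap necessary condition: if this fails, the expensive full
    subsumption test cannot succeed and can be skipped."""
    cp, cn, cpos, cneg = features(c)
    dp, dn, dpos, dneg = features(d)
    return (dp >= cp and dn >= cn
            and all(dpos[s] >= cpos[s] for s in cpos)
            and all(dneg[s] >= cneg[s] for s in cneg))

# p(f(X)) may subsume p(f(a)) | q(a), but can never subsume p(a):
c = [(True, ["p", "f"])]
print(may_subsume(c, [(True, ["p", "f", "a"]), (True, ["q", "a"])]))  # True
print(may_subsume(c, [(True, ["p", "a"])]))                           # False
```

<p>Storing such vectors in a trie-like index lets a candidate be tested against whole
sets of clauses at once, which is the key idea of feature vector indexing.</p>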
      </sec>
      <sec id="sec-32-4">
        <title>Conclusion</title>
        <p>Engineering a modern first-order theorem prover is part science, part craft, and part
art. Unfortunately, there is no single exposition of the necessary knowledge available at
the moment - something that the community should aim to rectify.</p>
      </sec>
      <sec id="sec-32-5">
        <title>References</title>
        <p>[ADF95] J. Avenhaus, J. Denzinger, and M. Fuchs. DISCOUNT: A System for
Distributed Equational Deduction. In J. Hsiang, editor, Proc. of the 6th RTA,
Kaiserslautern, volume 914 of LNCS, pages 397–402. Springer, 1995.</p>
        <p>[BG90] L. Bachmair and H. Ganzinger. On Restrictions of Ordered Paramodulation
with Simplification. In M.E. Stickel, editor, Proc. of the 10th CADE,
Kaiserslautern, volume 449 of LNAI, pages 427–441. Springer, 1990.</p>
        <p>[BG94] L. Bachmair and H. Ganzinger. Rewrite-Based Equational Theorem
Proving with Selection and Simplification. Journal of Logic and Computation,
3(4):217–247, 1994.</p>
        <p>[BG98] L. Bachmair and H. Ganzinger. Equational Reasoning in Saturation-Based
Theorem Proving. In W. Bibel and P.H. Schmitt, editors, Automated
Deduction — A Basis for Applications, volume 9 (1) of Applied Logic Series,
chapter 11, pages 353–397. Kluwer Academic Publishers, 1998.</p>
        <p>[DKS97] J. Denzinger, M. Kronenburg, and S. Schulz. DISCOUNT: A
Distributed and Learning Equational Prover. Journal of Automated Reasoning,
2(18):189–198, 1997. Special Issue on the CADE 13 ATP System Competition.</p>
        <p>[KN86] D. Kapur and P. Narendran. NP-Completeness of the Set Unification and
Matching Problems. In J.H. Siekmann, editor, Proc. of the 8th CADE,
Oxford, volume 230 of LNCS, pages 489–495. Springer, 1986.</p>
        <p>[MW97] W.W. McCune and L. Wos. Otter: The CADE-13 Competition
Incarnations. Journal of Automated Reasoning, 18(2):211–220, 1997. Special Issue
on the CADE 13 ATP System Competition.</p>
        <p>[NR01] R. Nieuwenhuis and A. Rubio. Paramodulation-Based Theorem Proving. In
A. Robinson and A. Voronkov, editors, Handbook of Automated Reasoning,
volume I, chapter 7, pages 371–443. Elsevier Science and MIT Press, 2001.</p>
        <p>[RV01] A. Riazanov and A. Voronkov. Vampire 1.1 (System Description). In
R. Goré, A. Leitsch, and T. Nipkow, editors, Proc. of the 1st IJCAR, Siena,
volume 2083 of LNAI, pages 376–380. Springer, 2001.</p>
        <p>[RV03] A. Riazanov and A. Voronkov. Limited resource strategy in resolution
theorem proving. Journal of Symbolic Computation, 36(1–2):101–115, 2003.</p>
        <p>[Sch04a] S. Schulz. Simple and Efficient Clause Subsumption with Feature Vector
Indexing. In G. Sutcliffe, S. Schulz, and T. Tammet, editors, Proc. of
the IJCAR-2004 Workshop on Empirically Successful First-Order Theorem
Proving, Cork, Ireland, ENTCS. Elsevier Science, 2004. (to appear).</p>
        <p>1 This is an understated version of “The prover stops cold”.</p>
      </sec>
    </sec>
    <sec id="sec-33">
      <title>Term Indexing for the LEO-II Prover</title>
      <sec id="sec-33-1">
        <title>-</title>
        <p>Frank Theiß1</p>
        <p>Christoph Benzmüller2
1FR Informatik, Universität des Saarlandes, Saarbrücken, Germany</p>
        <p>lime@ags.uni-sb.de
2Computer Laboratory, The University of Cambridge, UK
chris@ags.uni-sb.de</p>
        <p>Abstract</p>
        <p>We present a new term indexing approach intended to support efficient
automated theorem proving in classical higher order logic. Key features of our indexing
method are a shared representation of terms, the use of partial syntax trees to
speed up logical computations, and the indexing of subterm occurrences. For the
implementation of explicit substitutions, additional support is offered by the indexing of
bound variable occurrences. A preliminary evaluation of our approach shows some
encouraging first results.</p>
      </sec>
      <sec id="sec-33-2">
        <title>Introduction</title>
        <p>Term indexing has become standard in first order theorem proving and is applied in
all major systems in this domain [RV02, Sch02, WBH+02]. An overview on first order
term indexing is given in [RSV01] and [NHRV01] presents an evaluation of different
techniques. Comparably few term indexing techniques have been developed and studied
for higher order logic. An example is Pientka’s work on higher order substitution tree
indexing [Pie03].</p>
        <p>In this paper we present a new approach to higher order term indexing developed for
the higher order resolution prover LEO-II1, the successor of LEO [BK98]. Our approach
is motivated by work presented in [TSP06], which studies the application of indexing
techniques for interfacing between theorem proving and computer algebra.</p>
        <p>Pientka’s approach is based on substitution tree indexing and relies on unification of
linear higher order patterns. While higher order pattern unification is a comparatively
high level operation, the approach we present here is based on coordinate and path
indexing [Sti89] and thus relies on lower level operations, for example, operations on
hashtables. Apart from indexing and retrieval of terms, we particularly want to speed up
basic operations such as replacement of (sub-)terms and occurs checks.</p>
        <p>1The LEO-II project at Cambridge University has just started (in October 2006). The project is
funded by EPSRC under grant EP/D070511/1.</p>
      </sec>
      <sec id="sec-33-3">
        <title>Terms in de Bruijn Notation</title>
        <p>LEO (and its successor LEO-II under development) is based on Church's simple type
theory, that is, a logic built on top of the simply typed λ-calculus [Chu40]. In contrast
to LEO, our new term data structure for LEO-II uses de Bruijn [dB72] indices for the
internal representation of bound variables. In this paper, de Bruijn indices have the form
xi, where x is a nameless dummy and i the actual index. Constants and free variables
in LEO-II, called symbols in the remainder of this paper, have named representations.
Due to Currying, applications have only one argument term in LEO-II.2</p>
        <p>Terms in LEO-II are thus defined as follows:
• Symbols are either constant symbols (taken from an alphabet Σ) or (free,
existential) variable symbols (taken from an alphabet V). Every symbol is a term.
• Bound variables, represented by de Bruijn indices xi for some index i ∈ {1, 2, . . .},
are terms.
• If s and t are terms, then the application s@t is a term.
• If t is a term, then the abstraction λ.t is a term.</p>
        <p>For bound variables xi, the de Bruijn index i denotes the distance between the
variable and its binder in terms of scopes. Scopes are limited by occurrences of λ-binders,
thus the index i is determined by the number of occurrences of λ-binders between the
variable and its binder.</p>
        <p>For instance, the term
λa.λb.(b = ((λc.(cb))a))
translates to
λλ.(x0 = ((λ.(x0x1))x1))
in de Bruijn notation. While de Bruijn indices ease the treatment of α-conversion in
the implementation, they are less intuitive. As can be seen in the above example,
different occurrences of the same bound variable may have different de Bruijn indices.
This is the case here for b, which translates to both x0 and x1. Vice versa, different
occurrences of the same de Bruijn index may refer to different λ-binders. This is the case
for x0, which relates to both the bound variable b (first occurrence of x0) and the bound
variable c (second occurrence of x0). Similarly, x1 is related to both the bound variable b
and the bound variable a.</p>
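        <p>The translation from named terms to de Bruijn form can be made concrete with a
small converter (a toy sketch, independent of LEO-II's actual data structures):</p>

```python
# Named terms: ("lam", name, body), ("app", s, t), or a plain string (a free
# symbol or a bound-variable name). De Bruijn terms: ("lam", body),
# ("app", s, t), ("x", i) for a de Bruijn index, or a free symbol name.
def to_de_bruijn(term, binders=()):
    if isinstance(term, str):
        if term in binders:
            return ("x", binders.index(term))  # lambdas crossed since binding
        return term                            # free symbol keeps its name
    if term[0] == "lam":
        _, name, body = term
        return ("lam", to_de_bruijn(body, (name,) + binders))
    _, s, t = term
    return ("app", to_de_bruijn(s, binders), to_de_bruijn(t, binders))

# λa.λb.(b = ((λc.(c b)) a)), with = written as the Curried symbol "eq":
named = ("lam", "a", ("lam", "b",
         ("app", ("app", "eq", "b"),
                 ("app", ("lam", "c", ("app", "c", "b")), "a"))))
print(to_de_bruijn(named))
# λλ.(x0 = ((λ.(x0 x1)) x1)): b becomes x0 in one place and x1 in another
```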
        <p>2Alternative representations, for example, spine notation [CP97], offer at first sight shorter paths to
term parts that are relevant for a number of operations. The difference is primarily the order in which
the parts of a term can be accessed. In the case of the spine notation, for example, the head symbol of
a term can be directly accessed. In our approach we try to offer these shortcuts by representing indexed
terms in a graph structure. This allows additional ways of accessing (sub-)terms to be added by introducing
additional graph edges. For instance, the head symbol of each term and its position are indexed in our
data structure.</p>
        <p>The presentation of LEO-II’s type system (simple types) is omitted here. With
respect to LEO-II’s term indexing, typing only provides an additional criterion for the
distinction between terms, for example, different occurrences of the same de Bruijn
index may have different types. Apart from this, typing has no further impact on the
indexing mechanism. Overall correctness is ensured outside the index structure, that is,
by indexing only terms in βη normal form.
</p>
      </sec>
      <sec id="sec-33-4">
        <title>The Index</title>
        <p>The indexing mechanism we describe here supports fast access to indexed terms and
their subterms. The index will be used in LEO-II to store intermediate results (consisting
of clauses, literals, and terms in literals) of LEO-II’s resolution based proof procedure.
The idea is that these intermediate results are always kept in βη normal form (that is, η
short and β normal; for example, the term (λ.(λ.(x1 x0)))((λ.(g x0))c) has the βη normal
form (gc)). Hence, βη normalisation is an invariant of our approach and we assume that
we never insert non-normal terms into an index.</p>
        <p>Key features of our indexing mechanism are a shared representation of terms (see
Section 3.1), the use of partial syntax trees to speed up logical computations (see
Section 3.2) and the indexing of subterm occurrences (see Section 3.3). For the
implementation of explicit substitutions [FKP96, ACCL90], additional support is offered by
indexing of bound variable occurrences (see Section 3.4).</p>
        <p>Partial syntax trees are used to index occurrences of symbols and subterms within a
term. They help to avoid occurs checks and to prune superfluous branches early in the
implementation of operations like replacement, substitution or β-reduction. Checking
whether or not a symbol occurs within a term or in a given branch of its syntax tree
requires only constant time.</p>
        <p>The implementation of the index is furthermore based on the use of cascaded
hashtables, for example, to index application terms according to their function or argument
term. This allows requests for terms in a style similar to SQL [Ame92]. For example,
indexing of applications is realised similar to an SQL table Applications featuring the
columns appl for application terms, func for their respective function terms and arg for
their argument terms. Retrieval of terms is similar to SQL queries like “select appl
from Applications where func=t”, which returns terms whose function term is t.
</p>
      </sec>
      <sec id="sec-33-5">
        <title>Shared Representation of Terms</title>
        <p>Terms in LEO-II have a perfectly shared representation, that is, all occurrences of
syntactically equal terms (in de Bruijn notation) are represented by a single instance. An
exception are bound variables, where instances of the same variable may have different
de Bruijn indices. The treatment of bound variables is described further in Section 3.4.</p>
        <p>Terms are represented as term nodes. Term nodes are numbered by n ∈ {1, 2, . . .} in
the order they are created. In the following, term nodes are referred to either by their
number or by their graph representation, which is defined as follows:
• For each symbol s ∈ Σ occurring in some term, a term node symbol(s) is created.
• For each bound variable xi occurring in some term, a term node bound(i) is created.
• If an application s@t occurs in some term, where s is represented by term node i
and t by term node j, a term node application(i, j) is created.
• If an abstraction λt occurs in some term, where t is represented by term node i, a
term node abstraction(i) is created.</p>
        <p>This graph representation of terms is implemented using hashtables:
• Hashtable abstr with scope : IN → IN is used to look up abstractions with a
given scope i.
• Hashtable appl with func : IN → IN → IN is used to look up an application with
a given function i and argument j.
• Hashtable appl with arg : IN → IN → IN is used to look up an application with
a given argument j and function i. This is similar to appl with func, but the
hashtable keys are used here in reversed order.</p>
        <p>This hashtable system can be employed to retrieve term nodes in a similar way as
in a relational database. It can be used to retrieve single terms as well as sets of terms,
for example, all application term nodes whose function term is represented by node i.
</p>
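        <p>A minimal realisation of these numbered term nodes and lookup tables might look
as follows (an illustrative sketch, not LEO-II's code):</p>

```python
class TermIndex:
    """Numbered term nodes with SQL-style lookup tables.

    Nodes are numbered in creation order. appl_with_func and appl_with_arg
    answer queries like "all applications whose function term is node i"."""
    def __init__(self):
        self.nodes = {}             # graph representation -> node number
        self.appl_with_func = {}    # func node -> {arg node: appl node}
        self.appl_with_arg = {}     # arg node -> {func node: appl node}
        self.abstr_with_scope = {}  # scope node -> abstraction node

    def _intern(self, key):
        if key not in self.nodes:
            self.nodes[key] = len(self.nodes) + 1
        return self.nodes[key]

    def symbol(self, s):
        return self._intern(("symbol", s))

    def bound(self, i):
        return self._intern(("bound", i))

    def application(self, i, j):
        n = self._intern(("application", i, j))
        self.appl_with_func.setdefault(i, {})[j] = n
        self.appl_with_arg.setdefault(j, {})[i] = n
        return n

    def abstraction(self, i):
        n = self._intern(("abstraction", i))
        self.abstr_with_scope[i] = n
        return n

# Index f@a and f@b, then "select appl from Applications where func=f":
idx = TermIndex()
f, a, b = idx.symbol("f"), idx.symbol("a"), idx.symbol("b")
idx.application(f, a)
idx.application(f, b)
print(sorted(idx.appl_with_func[f].values()))  # [4, 5]
```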
      </sec>
      <sec id="sec-33-6">
        <title>Partial Syntax Trees</title>
        <p>Term indexing in LEO-II is based on partial syntax trees (PST), a concept that is
newly introduced in this paper. Partial syntax trees are used to indicate positions of a
symbol or a particular subterm within a term. PSTs are called partial because they only
represent relevant parts of a term. Examples are PSTs recording symbol occurrences
in a term, where relevant part means that the term part in question actually contains
an occurrence of that symbol. Such PSTs allow for early detection of branches in a
term’s syntax tree with no occurrences of a specific symbol, since these branches are not
represented in the PST for this symbol.</p>
        <p>In LEO-II’s term system (recall that it is based on the simply typed λ-calculus
with Currying), a term position is defined as follows:
• Symbol nodes and bound variable nodes have no children in a term’s syntax tree,
whereas abstraction nodes have exactly one child and application nodes exactly two.
The position of a child relative to its parent node is described by abstr (the relation
between an abstraction node and its scope), func, or arg (the relations between an
application node and its function term and its argument term, respectively).
• A position is a (possibly empty) sequence of relative positions. Starting
from the top position in a term, each entry of the sequence describes one traversal
step in the term’s syntax tree.
• The empty sequence of relative positions represents the root position or empty
position, which is the topmost position in a term.</p>
        <p>Consider, for example, the term (λ.x0)@(f @a). Its subterms occur at the following
positions:
• (λ.x0)@(f @a) at the empty position
• λ.x0 at [func]
• x0 at [func; abstr]
• f @a at [arg]
• f at [arg; func]
• a at [arg; arg]</p>
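The traversal behind this notion of positions can be sketched in Python (the tuple encoding of terms is ours, chosen for illustration only):

```python
# Enumerate the (position, subterm) pairs of a term. Terms are nested tuples:
# ('sym', s), ('bnd', i), ('app', f, a), ('abs', body). A position is a tuple
# of relative positions drawn from 'func', 'arg' and 'abstr'; the empty tuple
# is the root position.

def positions(term, here=()):
    yield here, term
    tag = term[0]
    if tag == 'app':
        yield from positions(term[1], here + ('func',))
        yield from positions(term[2], here + ('arg',))
    elif tag == 'abs':
        yield from positions(term[1], here + ('abstr',))

# the example term (λ.x0)@(f @a):
example = ('app', ('abs', ('bnd', 0)),
                  ('app', ('sym', 'f'), ('sym', 'a')))
```

Running `dict(positions(example))` yields exactly the six positions listed above.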
        <p>Based on this notion of positions, we introduce the notion of partial syntax trees.
As an example3, consider the term a = 0 · a, which translates in Curried form to
(= @a)@((·@0)@a). (The figure showing this term's syntax tree is not reproduced here.)</p>
        <p>A partial syntax tree (PST) is a tree of nodes corresponding to positions in a term.
Each term position which occurs in a PST is represented as a node which
• has up to three child trees4 (these children are partial syntax trees which
correspond to the terms at one of the relative positions abstr, func or arg), and
• may be annotated by some data.</p>
        <p>A partial syntax tree t is denoted by pst (tabstr , tfunc , targ ), where tabstr is the PST of
the scope of t if t is an abstraction, and where tfunc and targ are the PSTs of the function
term and the argument term of t if t is an application. If no position in a branch of
the syntax tree is annotated by some data, this branch is empty and is denoted by an
underscore ( ).</p>
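The PST datatype just defined can be sketched as follows; the dataclass encoding and all names are ours, not LEO-II's, and `None` plays the role of the empty branch written as an underscore in the text:

```python
# A PST node has up to three child PSTs (abstr, func, arg) and an optional
# annotation. None stands for the empty branch.

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class PST:
    abstr: Optional['PST'] = None
    func: Optional['PST'] = None
    arg: Optional['PST'] = None
    data: Any = None

# The PST of all occurrences of a in (= @a)@((·@0)@a), cf. pa1..pa5 below:
pa3 = PST(data='a')
pa5 = PST(data='a')
pa2 = PST(arg=pa3)
pa4 = PST(arg=pa5)
pa1 = PST(func=pa2, arg=pa4)
```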
        <p>The PST corresponding to the whole term in the above example mirrors the term's
complete syntax tree; its nodes at the symbol positions are annotated with =, a, ·, 0,
and a, respectively (the figure is not reproduced here).</p>
        <p>3We present a simple first-order example here, since the treatment of bound variables is special
and is described later in Section 3.4.</p>
        <p>4The data structure of PSTs cannot only be partial; it can also exceed the syntax tree of
terms as defined in Section 2, namely when all three children of a node are nonempty. This may be the case when
using PSTs to represent coordinates, that is, term positions which occur in some term. This is used when
building the index as described in Section 3.3, and is similar to Stickel's path indexing and coordinate
indexing methods [Sti89, McC92].
</p>
        <p>When the term is added to the index, however, it is not the PST of the entire term
that is recorded, but one PST for each of the occurring symbols (and subterms).</p>
        <p>If a symbol occurs at a term position, the corresponding PST entry is annotated by
that symbol. If a branch of a term’s syntax tree has no occurrences of the symbol in
question, the PST contains no entry for this branch. The PST for occurrences of symbol
a in the above example is thus given by:
pa1 = pst (_, pa2, pa4)
pa2 = pst (_, _, pa3)
pa3 = pst (_, _, _) with annotation a
pa4 = pst (_, _, pa5)
pa5 = pst (_, _, _) with annotation a</p>
        <p>Similarly, the PSTs for the remaining symbols =, ·, and 0 are recorded (figures not
reproduced here).</p>
        <p>If the PST of all occurrences of a symbol (or subterm) t′ in a given term t is available,
this provides a basis for speeding up replacements of t′. A costly occurs check is also
avoided, since the existence or non-existence of a PST for a symbol can be used as a
criterion. The nodes of the PST for t′ determine the nodes in t that have to be modified
when performing the replacement operation; all nodes in t that are not represented
in the PST for t′ remain unchanged (i.e., the recursion over the term structure for
replacement operations is pruned early).</p>
        <p>When replacing a by (f @b) in the above example, the operation proceeds as follows:
• The operation starts at the root position with term (= @a)@((·@0)@a) and with the
corresponding PST for a, pa1 = pst (_, pa2, pa4). As both the function child pa2
and the argument child pa4 of the PST are nonempty, the replacement operation
recurses over both the function term (= @a) and the argument term ((·@0)@a).
• To replace a in (= @a) with corresponding PST pa2 = pst (_, _, pa3), only the
argument term has to be processed. The child PST corresponding to the function
term in pa2 is empty, indicating that there are no occurrences of a in that subterm.
The argument term ((·@0)@a) with PST pa4 = pst (_, _, pa5) is treated analogously.
• Finally, a is replaced in its two occurrences as the term a itself, with corresponding
PSTs pa3 and pa5. Both pa3 and pa5 have no child nodes and are annotated with a,
so the result is in both cases the replacement term (f @b).</p>
        <p>The result of this operation is thus (= @(f @b))@((·@0)@(f @b)).</p>
        <p>During the operation, only those branches of the syntax tree with an actual
occurrence of a are processed and branches with no occurrences of a, here the terms =
and (·@0), are avoided. In this example, only five out of nine term nodes have to be
processed due to the guidance provided by the PST.</p>
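The PST-guided replacement just described can be sketched as follows. This is our own illustrative reconstruction over a hypothetical tuple encoding of terms, not LEO-II's OCaml implementation:

```python
# Replacement guided by a PST: only branches in which the PST records an
# occurrence are descended into; everything else is returned unchanged.
# Terms are tuples ('sym', s), ('bnd', i), ('app', f, a), ('abs', body).

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class PST:
    abstr: Optional['PST'] = None
    func: Optional['PST'] = None
    arg: Optional['PST'] = None
    data: Any = None

def replace(term, pst, new):
    if pst is None:               # no occurrence below this position: prune
        return term
    if pst.data is not None:      # annotated node: this is an occurrence
        return new
    tag = term[0]
    if tag == 'app':
        return ('app', replace(term[1], pst.func, new),
                       replace(term[2], pst.arg, new))
    if tag == 'abs':
        return ('abs', replace(term[1], pst.abstr, new))
    return term
```

In the running example, the recursion never enters the subterms = and (·@0), because their branches are absent from the PST for a.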
        <p>As an indicator of the speedup obtainable for replacements in this way, we investigate
the ratio of PST size to term size, both counted in tree nodes, that is, the number of
abstractions, applications, symbols, and bound variables. As illustrated above, this
ratio is a measure of the speedup we expect for replacement operations.</p>
        <p>In the above example, the term size is 9 (we have 9 nodes), which gives the following
ratios for the occurring symbols:
Symbol   PST size   PST size/term size
a        5          0.56
=        3          0.33
·        4          0.44
0        4          0.44</p>
        <p>We examined an excerpt of Jutting's Automath encoding of Landau's book
Grundlagen der Analysis [vBJ77, Lan30] with over 900 definitions and theorems (see Section 4
for details) and found an average PST size/term size ratio of 0.21 for symbol
occurrences. When nonprimitive terms (that is, applications and abstractions) are indexed as
well, this ratio dropped to 0.12.</p>
        <p>The index records whether and at which positions a subterm5 occurs in a term. As in
a relational database, both the subterms occurring in a given term and the terms in which
a given subterm occurs are indexed. Thus, the index can be used to find terms in the
database with occurrences of particular subterms and also to speed up logical operations
such as substitution by avoiding occurs checks.</p>
        <p>The index is built recursively, starting from symbols, which are the leaf nodes in
a term's syntax tree. The only term which occurs in a symbol is the symbol itself, at
root position. Nonprimitive terms, that is, abstractions and applications, are built up as
follows:
• The subterms occurring in an abstraction are all subterms which occur in the scope
of the abstraction. For a symbol whose occurrences in a term A are recorded in
the PST t′, its occurrences in the abstraction λ.A are given by t = pst (t′, _, _).
• The subterms occurring in an application are all subterms which occur in its
function term or in its argument term. For a symbol whose occurrences in the
function and argument term are recorded in the PSTs t′func and t′arg, the PST t
recording its occurrences in the application is given by t = pst (_, t′func, t′arg). If the
term occurs only in the argument or only in the function of an application,
t′func or t′arg, respectively, is empty.</p>
        <p>Furthermore, each primitive and nonprimitive term is recorded to occur as a subterm
of itself at root position.</p>
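The bottom-up construction of the symbol-occurrence PSTs can be sketched as follows (our own reconstruction over the tuple term encoding used in the earlier sketches; it covers symbol occurrences only, not nonprimitive subterms):

```python
# Build, for every symbol of a term, the PST of its occurrences: a symbol
# occurs in itself at root position; an abstraction wraps the scope's PSTs
# under abstr; an application merges the function's PSTs under func and the
# argument's under arg.

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class PST:
    abstr: Optional['PST'] = None
    func: Optional['PST'] = None
    arg: Optional['PST'] = None
    data: Any = None

def symbol_psts(term):
    """Map each symbol to the PST of its occurrences in `term`."""
    tag = term[0]
    if tag == 'sym':
        return {term[1]: PST(data=term[1])}
    if tag == 'bnd':
        return {}
    if tag == 'abs':
        return {s: PST(abstr=p) for s, p in symbol_psts(term[1]).items()}
    # application: merge the occurrence PSTs of function and argument
    fpsts, apsts = symbol_psts(term[1]), symbol_psts(term[2])
    return {s: PST(func=fpsts.get(s), arg=apsts.get(s))
            for s in set(fpsts) | set(apsts)}
```

Applied to (= @a)@((·@0)@a), this reproduces the PST pa1 for a and the three-node PST for = described above.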
        <p>The result is a PST for each subterm of the term to be indexed, describing the
occurrences of this subterm. These PSTs are added to the hashtable occurrences.
Additionally, terms are indexed according to their subterms in a second hashtable occurs in.
A third hashtable, occurs at, indexes terms according to subterms at a given term
position. Thus the core of the index consists of:</p>
        <p>5In the current implementation, both occurring symbols and nonprimitive subterms are indexed.
However, we plan to further evaluate the tradeoff between the speedup gained this way and the cost
of maintaining the index. Depending on this evaluation, we may want to restrict the nonprimitive
subterms to be indexed, for example, by using their size as a criterion. Similar ideas have been examined
by McCune [McC92], for example, the effect of limitations on the length of paths used in path indexing.
• Hashtable occurrences : IN → IN → PST indexes occurrences of subterms (the
second key) in a given term (the first key). The indexed value is a PST of the
positions where the subterm occurs. If a subterm does not occur, then there is no
entry in the hashtable.
• Hashtable occurs in : IN → IN ∗ is used to index a list of all terms in which a
given subterm (the key) occurs.
• Hashtable occurs at : pos → IN → IN ∗ indexes all terms in which
a given subterm (the second key) occurs at a given position (the first key).</p>
        <p>For example, occurrences of symbol a in the example term (= @a)@((·@0)@a) are
indexed by the following hashtable updates (we assume that a is represented by term
node i and (= @a)@((·@0)@a) by term node j):
• add the PST for a (pa1 above) with first key j and second key i to occurrences
• add j to the list with key i in occurs in
• add j to the set hashed in occurs at with first key [func; arg] and second key i;
if no such set exists in the hashtable, add the singleton {j}
• add j to the set hashed in occurs at with first key [arg; arg] and second key i; if
no such set exists in the hashtable, add the singleton {j}</p>
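These updates can be sketched as follows. For brevity this toy version (names follow the text, the update logic is ours) stores a set of positions where the real index stores a PST:

```python
# The three core index tables, with term nodes as plain ints and positions
# as tuples of relative positions.

from collections import defaultdict

occurrences = {}              # (term, subterm) -> set of positions (PST in LEO-II)
occurs_in = defaultdict(set)  # subterm -> terms containing it
occurs_at = defaultdict(set)  # (position, subterm) -> terms

def record(term, subterm, position):
    occurrences.setdefault((term, subterm), set()).add(position)
    occurs_in[subterm].add(term)
    occurs_at[(position, subterm)].add(term)

# occurrences of symbol a (node i) in (= @a)@((·@0)@a) (node j):
i, j = 2, 8
record(j, i, ('func', 'arg'))
record(j, i, ('arg', 'arg'))
```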
        <p>The basic operations of adding a term to the index take constant time (except for
rehashing). The indexing of a term of length n takes time O(n).
</p>
        <p>3.4 Bound Variables</p>
        <p>Bound variables play a special role in the term system. To see this, recall our
example from the beginning, that is, the term λa.λb.((= b)((λc.(cb))a)) or, with de
Bruijn indices, λλ.((= x0)((λ.(x0x1))x1)). This example shows that two occurrences of
the same bound variable may have syntactically different de Bruijn indices and that the
de Bruijn indices of occurrences of different variables may be syntactically equal. It is
desirable to provide quick access to all variables bound by a given binder, to speed up
β-reduction and related operations such as raising or lowering of bound variable indices.
We will now illustrate our solution to this issue. Remember that indexed terms are
always kept in βη normal form, hence, normalisation is mandatory after instantiation
of existential variables or expansion of defined constants (if the modified terms shall be
indexed again).</p>
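The grouping of variable occurrences by their binder can be sketched as follows (our own illustration over the tuple term encoding; each binder is labelled by its own position):

```python
# Walk a de Bruijn term and group bound-variable occurrences by the λ-binder
# that binds them. Terms are tuples ('sym', s), ('bnd', i), ('app', f, a),
# ('abs', body); index 0 refers to the innermost enclosing λ.

def binder_occurrences(term, here=(), binders=(), out=None):
    out = {} if out is None else out
    tag = term[0]
    if tag == 'abs':
        out.setdefault(here, [])          # the λ's own position labels it
        binder_occurrences(term[1], here + ('abstr',), binders + (here,), out)
    elif tag == 'app':
        binder_occurrences(term[1], here + ('func',), binders, out)
        binder_occurrences(term[2], here + ('arg',), binders, out)
    elif tag == 'bnd':
        i = term[1]
        if i < len(binders):              # bound by the (i+1)-th enclosing λ
            out[binders[-1 - i]].append(here)
    return out
```

On the example term λλ.((= x0)((λ.(x0 x1)) x1)), the second binder (b) collects exactly the two occurrences that the PST p1..p9 below records, even though they carry the different indices x0 and x1.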
        <p>The syntax tree of our example term in de Bruijn notation is shown as a figure in
the original (not reproduced here).</p>
        <p>In this term, the bound variable b has two instances, denoted by x0 and x1, while
x0 (resp. x1) can denote both c and b (resp. b and a). Since bound variables are
indexed as described in Section 3.3, the information on where to find the variables
bound by one particular λ-binder is somewhat scattered. This kind of information,
however, is important in practice, for example, to support efficient β-reduction. We
therefore once more employ PSTs to describe the occurrences of the variables bound by
one and the same λ-binder (the figures for the variables bound by λ1, λ2, and λ3 are
not reproduced here).</p>
        <p>For example, the PST indicating occurrences of variables bound by λ2 in the above
example is given by:
p1 = pst (p2, _, _)
p2 = pst (p3, _, _)
p3 = pst (_, p4, p6)
p4 = pst (_, _, p5)
p5 = pst (_, _, _) with annotation 0
p6 = pst (_, p7, _)
p7 = pst (p8, _, _)
p8 = pst (_, _, p9)
p9 = pst (_, _, _) with annotation 1</p>
        <p>Input: Two fluent terms Z1, Z2.</p>
        <p>Output: A complete set of substitutions θ such that Z1 #θAC1 Z2.</p>
        <p>1. Deterministically match as many fluents of Z1 as possible to fluents of Z2. Substitute Z1 with
the substitution found. If some fluent of Z1 does not match any fluent of Z2, decide Z1 !θAC1 Z2.
2. Match as many fluents of Z1 as possible to fluents of Z2 using ObjCon-based deterministic matching.</p>
        <p>Substitute Z1 with the substitution found. If some fluent of Z1 does not match any fluent of Z2,
decide Z1 !θAC1 Z2.
3. Build the substitution graph (V, E) for Z1 and Z2 with nodes v = (µ, i) ∈ V , where µ is a
matching candidate for Z1 and Z2, i.e., matches some fluent at position i in Z1 to some fluent in
Z2, and i ≥ 1 is referred to as the layer of v. Two nodes (µ1, i1) and (µ2, i2) are connected with an
edge iff µ1µ2 = µ2µ1 and i1 ≠ i2. Delete all nodes (µ, i) with Xµ = o, for some X ∈ Vars (Z1)
and o ∈ Obj (Z2), and ObjCon(X, Z1, d) ! ObjCon(o, Z2, d) for some d. Find all cliques of size
|Z1| in (V, E).</p>
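The substitution-graph step can be sketched in Python. This is a deliberately simplified toy reconstruction, not the paper's algorithm: a matching candidate is a dict of variable bindings, the commutation test µ1µ2 = µ2µ1 is replaced by an agreement check on shared variables, the ObjCon-based node deletion is omitted, and layer-wise combinations are enumerated instead of running a dedicated clique algorithm. All names are ours.

```python
# Fluents are pairs (predicate, argument tuple); variables of Z1 are
# uppercase strings, objects of Z2 lowercase (an assumption of this sketch).

from itertools import product

def candidates(z1, z2):
    """Per layer i, all bindings matching fluent z1[i] to some fluent of z2."""
    out = []
    for p1, args1 in z1:
        layer = []
        for p2, args2 in z2:
            if p1 == p2 and len(args1) == len(args2):
                layer.append(dict(zip(args1, args2)))
        out.append(layer)
    return out

def compatible(m1, m2):
    """Stand-in for the commutation test: agreement on shared variables."""
    return all(m1[x] == m2[x] for x in m1.keys() & m2.keys())

def all_matchers(z1, z2):
    """Consistent combinations, i.e. cliques with one node per layer."""
    result = []
    for combo in product(*candidates(z1, z2)):
        if all(compatible(a, b)
               for i, a in enumerate(combo) for b in combo[i + 1:]):
            merged = {}
            for m in combo:
                merged.update(m)
            result.append(merged)
    return result
```

A clique of size |Z1| contains exactly one node per layer, which is why pairwise-compatible one-per-layer combinations correspond to complete matchers.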
        <p>Algorithm 1: ObjCon-alltheta.</p>
        <p>We omit the algorithm for computing all cliques in a substitution graph; it can
be found in [KRS06].
</p>
        <p>6.2 Experimental Evaluation</p>
        <p>
Figure 4 depicts the timing comparison between the LitCon-based subsumption
reasoner, referred to as AllTheta, and its ObjCon-based opponent, referred to as
FluCaP. The results were obtained using RedHat Linux running on a 2.4GHz Pentium
IV machine with 2GB of RAM.</p>
        <p>We demonstrate the advantages of exploiting the object-based context information
on problems that stem from the colored Blocksworld and Pipesworld planning
scenarios. The Pipesworld domain models the flow of oil-derivative liquids through pipeline
segments connecting areas, and is inspired by applications in the oil industry. Liquids
are modeled as batches of a certain unit size. A segment must always contain a certain
number of batches (i.e., it must always be full). Batches can be pushed into pipelines
from either side, leading to the batch at the opposite end “falling” into the incident
area. Batches have associated product types, and batches of certain types may never
be adjacent to each other in a pipeline. Moreover, areas may have constraints on
how many batches of a certain product type they can hold.</p>
        <p>For each problem, 1000 subsumption tests were performed, with a time limit of
100 minutes. The results show that FluCaP scales better than AllTheta; this is best
observed on the fourteen-, twenty-, and thirty-block problems. As the empirical results
demonstrate, the optimal value of the depth parameter for Blocksworld and Pipesworld
is four.</p>
        <p>The main reason for the computational gain of FluCaP is that it is less sensitive
to the growth of the depth parameter. Under the condition that the number of objects
in a state is strictly less than the number of fluents and other parameters are fixed,
the amount of object-based context information is strictly less than the amount of the
literal-based context information. Moreover, on the Pipesworld problems, FluCaP
requires two orders of magnitude less time than AllTheta.
</p>
        <p>Figure 4 (plots not reproduced): subsumption time in ms against context depth (1 to 5)
for FluCaP and AllTheta, on Blocksworld problems with 10, 20, and 30 blocks and on
Pipesworld problems with 4, 6, 8, and 10 areas.</p>
        <p>
We follow the symbolic dynamic programming (SDP) approach within the Situation
Calculus (SC) of [BRP01] in using first-order state abstraction for FOMDPs. In the course
of first-order value iteration, the state space may contain redundant abstract states that
dramatically affect the algorithm's efficiency. To achieve computational savings,
normalization must be performed to remove this redundancy. However, in the original
work [BRP01] this was done by hand; to the best of our knowledge, the preliminary
implementation of the SDP approach within SC uses human-provided rewrite rules for
logical simplification. In contrast, [HS04] have developed an automated normalization
procedure for FOVI that brings a computational gain of several orders of magnitude.
Another crucial difference is that our algorithm uses heuristic search to limit the number
of states for which a policy is computed.</p>
        <p>The ReBel algorithm by [KvOdR04] relates to LIFT-UP in that it also uses a
representation language that is simpler than Situation Calculus. This feature makes the
state space normalization computationally feasible.</p>
        <p>All the above algorithms can be classified as deductive approaches to solving FOMDPs.
They can be characterized by the following features: (1) they are model-based, (2) they
aim at exact solutions, and (3) logical reasoning methods are used to compute
abstractions. We should note that FOVI aims at an exact solution for a FOMDP, whereas
LIFT-UP, due to its heuristic search that avoids evaluating all states, seeks an
approximate solution. It is therefore more appropriate to classify LIFT-UP as an
approximate deductive approach to FOMDPs.</p>
        <p>In another vein, there is some research on developing inductive approaches to solving
FOMDPs, e.g., by [FYG03]. The authors propose the approximate policy iteration
(API) algorithm, where they replace the use of cost-function approximations as policy
representations in API with direct, compact state-action mappings, and use a standard
relational learner to learn these mappings. A recent approach by [GT04] proposes an
inductive policy construction algorithm that strikes a middle-ground between deductive
and inductive techniques.
</p>
        <p>8 Conclusions</p>
        <p>
We have proposed a new approach that combines heuristic search and first-order state
abstraction for solving first-order MDPs more efficiently. In contrast to existing systems,
which start with propositionalizing the decision problem at the outset of any solution
attempt, we perform lifted reasoning on the first-order structure of an MDP directly.
However, there is plenty remaining to be done. For example, we are interested in the
question of to what extent the optimization techniques applied in modern propositional
planners can be combined with first-order state abstraction.</p>
        <p>Acknowledgements
We thank anonymous reviewers for their comments on the previous versions of this
paper. We also thank Zhengzhu Feng for fruitful discussions and for providing us with
the executable of the symbolic LAO∗ planner. Olga Skvortsova was supported by a
grant within the Graduate Programme GRK 334 "Specification of discrete processes and
systems of processes by operational models and logics" under the auspices of the Deutsche
Forschungsgemeinschaft (DFG).
</p>
        <p>[BBS95] A. G. Barto, S. J. Bradtke, and S. P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1-2):81–138, 1995.</p>
        <p>[BDH99] C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1–94, 1999.</p>
        <p>[Bel57] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.</p>
        <p>[BRP01] C. Boutilier, R. Reiter, and B. Price. Symbolic dynamic programming for first-order MDPs. In Bernhard Nebel, editor, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI'2001), pages 690–700. Morgan Kaufmann, 2001.</p>
        <p>[DKKN95] T. Dean, L. Kaelbling, J. Kirman, and A. Nicholson. Planning under time constraints in stochastic domains. Artificial Intelligence, 76:35–74, 1995.</p>
        <p>[FH02] Z. Feng and E. Hansen. Symbolic heuristic search for factored Markov Decision Processes. In R. Dechter, M. Kearns, and R. Sutton, editors, Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI'2002), pages 455–460, Edmonton, Canada, 2002. AAAI Press.</p>
        <p>[FYG03] A. Fern, S. Yoon, and R. Givan. Approximate policy iteration with a policy language bias. In S. Thrun, L. Saul, and B. Schölkopf, editors, Proceedings of the Seventeenth Annual Conference on Neural Information Processing Systems (NIPS'2003), Vancouver, Canada, 2003. MIT Press.</p>
        <p>[GT04] C. Gretton and S. Thiebaux. Exploiting first-order regression in inductive policy selection. In M. Chickering and J. Halpern, editors, Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI'2004), Banff, Canada, July 2004. Morgan Kaufmann.</p>
        <p>[HS90] S. Hölldobler and J. Schneeberger. A new deductive approach to planning. New Generation Computing, 8:225–244, 1990.</p>
        <p>[HS04] S. Hölldobler and O. Skvortsova. A logic-based approach to dynamic programming. In Proceedings of the Workshop on "Learning and Planning in Markov Processes – Advances and Challenges" at the Nineteenth National Conference on Artificial Intelligence (AAAI'04), pages 31–36, San Jose, CA, July 2004. AAAI Press.</p>
        <p>[HZ01] E. Hansen and S. Zilberstein. LAO*: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence, 129:35–62, 2001.</p>
        <p>[KL94] J.-U. Kietz and M. Lübbe. An efficient subsumption algorithm for inductive logic programming. In Proceedings of the Eleventh International Conference on Machine Learning, pages 130–138, 1994.</p>
        <p>[KN86] D. Kapur and P. Narendran. NP-completeness of the set unification and matching problems. In J. H. Siekmann, editor, Proceedings of the Conference on Automated Deduction, pages 489–495. Springer, Berlin, 1986. Lecture Notes in Computer Science 230.</p>
        <p>[KRS06] E. Karabaev, G. Rammé, and O. Skvortsova. Efficient symbolic reasoning for first-order MDPs. In Proceedings of the Workshop on "Planning, Learning and Monitoring with Uncertainty and Dynamic Worlds" at the Seventeenth European Conference on Artificial Intelligence (ECAI'2006), Riva del Garda, Italy, 2006. To appear.</p>
        <p>[KvOdR04] K. Kersting, M. van Otterlo, and L. de Raedt. Bellman goes relational. In C. E. Brodley, editor, Proceedings of the Twenty-First International Conference on Machine Learning (ICML'2004), pages 465–472, Banff, Canada, July 2004. ACM.</p>
        <p>[Put94] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley &amp; Sons, Inc., New York, NY, 1994.</p>
        <p>[Rei91] R. Reiter. The frame problem in the situation calculus: A simple solution (sometimes) and a completeness result for goal regression. In Vladimir Lifschitz, editor, Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pages 359–380. Academic Press, San Diego, CA, 1991.</p>
        <p>[Rob65] J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the Association for Computing Machinery, 12(1):23–41, 1965.</p>
        <p>[SHW96] T. Scheffer, R. Herbrich, and F. Wysotzki. Efficient θ-subsumption based on graph algorithms. In Proceedings of the 6th International Workshop on Inductive Logic Programming, pages 212–228, Berlin, August 1996. Volume 1314 of LNAI.</p>
        <p>[Thi98] M. Thielscher. Introduction to the fluent calculus. Electronic Transactions on Artificial Intelligence, 2(3-4):179–192, 1998.</p>
        <p>[YLWA05] H. L. S. Younes, M. L. Littman, D. Weissman, and J. Asmuth. The first probabilistic track of the International Planning Competition. Journal of Artificial Intelligence Research, 24:851–887, 2005.</p>
        <p>Multiple Preprocessing for Systematic SAT Solvers</p>
        <p>Anbulagan1, John Slaney1,2
1Logic and Computation Program, National ICT Australia Ltd.
2Computer Sciences Laboratory, Australian National University
{anbulagan, john.slaney}@nicta.com.au</p>
        <p>High-performance SAT solvers based on systematic search generally use either
conflict driven clause learning (CDCL) or lookahead techniques to gain efficiency.
Both styles of reasoning can gain from a preprocessing phase in which some form
of deduction is used to simplify the problem. In this paper we undertake an
empirical examination of the effects of several recently proposed preprocessors on both
CDCL and lookahead-based SAT solvers. One finding is that the use of multiple
preprocessors one after the other can be much more effective than using any one of
them alone, but that the order in which they are applied is significant. We intend
our results to be particularly useful to those implementing new preprocessors and
solvers.
In the last decade, propositional satisfiability (SAT) has become one of the most
intensively studied research problems within artificial intelligence (AI). This tendency can be
seen through the development of a number of powerful SAT solvers, based on either
systematic search or stochastic local search (SLS), for solving various hard
combinatorial search problems such as automatic deduction, hardware and software verification,
planning, scheduling, and FPGA routing.</p>
        <p>The power of contemporary systematic SAT solvers derives not only from the
underlying Davis-Putnam-Logemann-Loveland (DPLL) algorithm but also from
enhancements aimed at increasing the amount of unit propagation, improving the choices of
variables for splitting or making backtracking more intelligent. Two of the most
important such enhancements are conflict driven clause learning (CDCL), made practicable on
a large scale by the watched literal technique, and one-step lookahead. These two tend
to exclude each other: the most successful solvers generally incorporate one or the other
but not both. The benefits they bring are rather different too, as is clear from the results
of recent SAT competitions. For problems in the “industrial” category, CDCL, as
implemented in MINISAT [ES03, SE05], siege [Rya04] and zChaff [MMZ+01, ZMMM01] is
currently the method of choice. On random problems, however, lookahead-based solvers
such as Dew Satz [AS05], Kcnfs [DD01] and March dl [HvM06] perform better.</p>
        <p>Lookahead, of course, is expensive at every choice node, while clause learning is
expensive only at backtrack points. Since half of the nodes (plus one) in any binary
tree are leaves, this difference is significant for lookahead-based solvers which process
nodes relatively slowly and gain, if at all, by reducing the search tree size. Looking
ahead is an investment in at-node processing which can pay off only if it results in more
informed choices with an impact on the total number of nodes visited. Where learnt
clauses prune the tree at least as effectively as complex choice heuristics, CDCL must
win. This seems to be the case in many classes of highly structured problems such as the
“industrial” ones in the SAT competitions. We have no clearer definition than anyone
else of “structure”, but are interested to find ways in which lookahead-based solvers
might detect and exploit it as well as the clause learners do.</p>
        <p>A noteworthy feature of many recent systems is a preprocessing phase, often using
inference by some variant of resolution, to transform problems prior to the search.
One suggestion we wish to make and explore is that such transformations may help
lookahead-based solvers to discover useful structure. That is, much of the reasoning done
by nogood inference might be done cheaply “up-front”, provided that the subsequent
variable choice heuristics are good enough to exploit it. In what follows, we are therefore
concerned mainly with the effects of preprocessing, including those of using multiple
preprocessors in series, on the performance of a lookahead-based solver Dew Satz on
problems where it does poorly in comparison with a clause learning solver (MINISAT). We
also include some results and remarks on benefits to be gained in the opposite direction,
where MINISAT is helped by preprocessing to attack problems to which Dew Satz is more
suited.</p>
        <p>In this paper we propose a multiple preprocessing technique to boost the performance
of systematic SAT solvers. The motivation for applying multiple preprocessors prior to
the systematic search is clear: each preprocessor uses a different strategy to simplify
clause sets derived from real-world problems, which exhibit a great deal of structure,
such as symmetries, variable dependencies, clustering, and the like. Our initial
observations showed that each strategy works well for some problems, often turning them
from hard to easy. When a problem exhibits several different kinds of structure, a single
preprocessor may have difficulty simplifying them all; in this case, we need to run
multiple preprocessors one after the other.</p>
        <p>We report performance statistics for the two solvers, Dew Satz and MINISAT, with
and without combinations of five (and for one problem set, six) of the best contemporary
SAT preprocessors when solving parity, planning, bounded model checking and FPGA
routing benchmark problems from SATLIB and the recent SAT competitions. One
finding is that the use of multiple preprocessors one after the other can be much more
effective than using any one of them alone, but that the order in which they are applied
is significant. We intend our results to be particularly useful to those implementing new
preprocessors and solvers.</p>
        <p>The rest of the paper is organized as follows: Section 2 addresses related work;
Sections 3 and 4 briefly describe the preprocessors and solvers examined in our study;
the main part of the paper consists of experimental results; and we conclude with a few
remarks and suggestions.</p>
        <p>Resolution-based SAT preprocessors for CNF formula simplification have a dramatic
impact on the performance of even the most efficient SAT solvers on many benchmark
problems [LMS01]. The simplest preprocessor consists of just computing length-bounded
resolvents and deleting duplicate and subsumed clauses, as well as tautologies and any
duplicate literals in a clause.</p>
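The clause cleanup part of such a simple preprocessor can be sketched as follows (an illustrative toy in DIMACS-style integer literals, not an implementation from any of the cited systems):

```python
# Remove duplicate literals, drop tautologies and duplicate clauses, and
# delete subsumed clauses. Literals are nonzero ints; -l is the negation of l.

def cleanup(clauses):
    seen, out = set(), []
    for c in clauses:
        c = frozenset(c)                  # removes duplicate literals
        if any(-l in c for l in c):       # tautology x ∨ ¬x: always true
            continue
        if c in seen:                     # duplicate clause
            continue
        seen.add(c)
        out.append(c)
    # delete clauses subsumed by a smaller clause (subset of literals)
    out.sort(key=len)
    kept = []
    for c in out:
        if not any(k <= c for k in kept):
            kept.append(c)
    return kept
```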
        <p>Two previous works are most directly related. The first is that of Anbulagan
et al. [APSS06], which examined the integration of five resolution-based preprocessors,
alone or in combination, with stochastic local search (SLS) solvers. Their
experimental results show that SLS solvers benefit from the presence of resolution-based
preprocessing and multiple preprocessing techniques. The second is that of Lynce
and Marques-Silva [LMS01]. They empirically evaluated the impact of some
preprocessors developed before 2001, including 3-Resolution, on the performance of systematic
SAT solvers, but without considering multiple preprocessing. In recent years, many
other, more sophisticated preprocessors have been applied to modern propositional
reasoners. Among them are 2-SIMPLIFY [Bra01], the preprocessor in Lsat [OGMS02]
for recovering and exploiting Boolean gates, HyPre [Bac02, BW04], Shatter [ASM03]
for dealing with symmetry structure, NiVER [SP05], and SatELite [EB05]. We consider
some of these preprocessors plus 3-Resolution in our study.
3</p>
        <p>SAT Preprocessors
We describe briefly the six SAT preprocessors used in the experiments. The first
five are all based on resolution and its variants such as hyper-resolution.
Resolution [Qui55, DP60, Rob65] itself is widely used as a rule of inference in first order
automated deduction, where the clauses tend to be few in number and contain few
literals, and where the reasoning is primarily driven by unification. As a procedure for
propositional reasoning, however, resolution is rarely used on its own because in
practice it has not been found to lead to efficient algorithms. The sixth preprocessor is a
special-purpose tool for symmetry detection, which is important for one problem class
in the experiments.
3.1</p>
        <p>3-Resolution
k-Resolution is just saturation under resolution with the restriction that the parent
clauses are of length at most k. The special cases of 2-Resolution and 3-Resolution
are of most interest. 3-Resolution has been used in a number of SAT solvers, notably
Satz [LA97] and the SLS solver R+AdaptNovelty+ [APSS05] which won the satisfiable
random problem category in the SAT2005 competition. Since it is the preprocessor
already used by Satz, we expect it to work well with Dew Satz.
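</p>
        <p>The saturation loop just described can be sketched as follows (an illustrative Python sketch, not Satz's implementation); only pairs of clauses each of length at most k are resolved, tautological resolvents are discarded, and the loop repeats until no new clause appears:

```python
# k-Resolution saturation: resolve only pairs of clauses of length <= k.
# Clauses are frozensets of integer literals. Illustrative only.

def k_resolution(clauses, k=3):
    cnf = {frozenset(c) for c in clauses}
    changed = True
    while changed:
        changed = False
        short = [c for c in cnf if len(c) <= k]
        for i, a in enumerate(short):
            for b in short[i + 1:]:
                for lit in a:
                    if -lit in b:
                        r = (a - {lit}) | (b - {-lit})
                        # Skip tautological resolvents.
                        if any(-x in r for x in r):
                            continue
                        if r not in cnf:
                            cnf.add(r)
                            changed = True
    return cnf
```

Since resolvents of two clauses of length at most k have length at most 2k-2 and the literal universe is finite, the loop terminates.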
3.2</p>
        <p>2-SIMPLIFY
2-SIMPLIFY [Bra01] constructs an implication graph from all binary clauses in the
problem. Where there is an implication chain from a literal ¬X to X, X can be deduced
as a unit and propagated. The method also collapses strongly connected
components, propagates shared implications (literals implied in the graph by every
literal of some clause), and removes some redundant binary clauses. Experimental results
[Bra01, Bra04] show that systematic search benefits markedly from 2-SIMPLIFY on a
wide range of problems.
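</p>
        <p>The unit-deduction step described above can be sketched as follows (illustrative Python, not the 2-SIMPLIFY code): each binary clause (a or b) contributes the implication edges not-a to b and not-b to a, and any literal reachable from its own negation is forced true:

```python
# Derive forced units from the binary-clause implication graph.
from collections import defaultdict

def forced_units(binary_clauses):
    graph = defaultdict(set)
    lits = set()
    for a, b in binary_clauses:
        graph[-a].add(b)   # not-a implies b
        graph[-b].add(a)   # not-b implies a
        lits.update((a, b, -a, -b))

    def reaches(src, dst):
        # Simple DFS reachability in the implication graph.
        stack, seen = [src], {src}
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            for w in graph[v] - seen:
                seen.add(w)
                stack.append(w)
        return False

    # If not-x reaches x, assuming not-x is contradictory, so x is a unit.
    return {x for x in lits if reaches(-x, x)}
```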
3.3</p>
        <p>HyPre
HyPre [BW04] also reasons with binary clauses, but incorporates full hyper-resolution,
making it more powerful than 2-SIMPLIFY. In addition, unit reduction and equality
reduction are incrementally applied to infer more binary clauses. It can be costly in
terms of time, but since it is based explicitly on hyper-resolution it avoids the space
explosion of computing a full transitive closure. HyPre has been used in the SAT solver,
2CLS+EQ [Bac02], and we consider it a very promising addition to many other solvers.
It is generally useful for exploiting implicational structure in large problems.
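</p>
        <p>A single hyper-binary-resolution step of the kind HyPre builds on can be sketched as follows (an illustrative simplification, not HyPre's incremental algorithm): an n-ary clause resolves in one step with binary clauses (not-l or u) for all but one of its literals, producing a new binary clause:

```python
# One hyper-binary-resolution step: clause (l1 ... ln) plus binaries
# (-l_i or u) for all i except j yields the binary (l_j or u).

def hyper_binary_resolvents(clause, binaries):
    bins = {frozenset(b) for b in binaries}
    out = set()
    cl = list(clause)
    universe = {u for b in bins for u in b}
    for j, keep in enumerate(cl):
        rest = cl[:j] + cl[j + 1:]
        for u in universe:
            # All other literals of the clause must imply u via a binary.
            if all(frozenset((-l, u)) in bins for l in rest):
                out.add(frozenset((keep, u)))
    return out
```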
3.4</p>
        <p>NiVER
Variable Elimination Resolution (VER) is an ancient inference method consisting of
performing all resolutions on a chosen variable and then deleting all clauses in which
that variable occurs, leaving just the resolvents. It is easy to see that this is a complete
decision procedure for SAT problems, and almost as easy to see that it is not practicable
because of exponential space complexity. Recently, Subbarayan and Pradhan [SP05]
proposed NiVER (Non increasing VER) which restricts the variable elimination to the
case in which there is no increase in the number of literals after elimination. This shows
promise as a SAT preprocessor, improving the performance of a number of solvers [SP05].
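</p>
        <p>The non-increasing elimination test can be sketched as follows (illustrative Python, not the NiVER implementation): a variable is eliminated by resolution only when its non-tautological resolvents contain no more literals in total than the clauses they replace:

```python
# One NiVER-style elimination attempt on a single variable.
# Clauses are iterables of integer literals. Illustrative only.

def niver_step(clauses, var):
    cnf = [frozenset(c) for c in clauses]
    pos = [c for c in cnf if var in c]
    neg = [c for c in cnf if -var in c]
    resolvents = []
    for p in pos:
        for n in neg:
            r = (p - {var}) | (n - {-var})
            if not any(-x in r for x in r):   # drop tautologies
                resolvents.append(r)
    old = sum(len(c) for c in pos + neg)
    new = sum(len(r) for r in resolvents)
    if new > old:                             # literal count would grow
        return clauses                        # refuse the elimination
    rest = [c for c in cnf if var not in c and -var not in c]
    return rest + resolvents
```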
3.5</p>
        <p>SatELite
Eén and Biere [EB05] proposed the SatELite preprocessor, which extends NiVER with
a rule of Variable Elimination by Substitution. Several additions including subsumption
detection and improved data structures further improved performance in both space and
time. SatELite was combined with MINISAT to form SatELiteGTI, the system which
dominated the SAT2005 competition on the crafted and industrial problem categories.
Since we use MINISAT for our experiments, it is obvious that SatELite should be one of
the preprocessors we consider.
3.6</p>
        <p>Shatter
It is clear that eliminating symmetries is essential to solving realistic instances of many
problems. None of the resolution-based preprocessors does this, so for problems that
involve a high degree of symmetry we added Shatter [AMS03] which detects symmetries
and adds symmetry-breaking clauses. These always increase the size of the clause set
and for satisfiable problems they remove some of the solutions, but they typically make
the problem easier by pruning away isomorphic copies of parts of the search space.
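</p>
        <p>For the special case of a symmetry that merely swaps two variables, the added symmetry-breaking clause can be sketched as follows (a deliberately minimal illustration; Shatter itself handles arbitrary permutations with chained lex-leader encodings):

```python
# For a symmetry swapping variables a and b (a earlier in the variable
# order), the lex-leader constraint reduces to x_a <= x_b, i.e. the
# single clause (not-a or b). Deliberately minimal sketch.

def break_swap_symmetries(clauses, swaps):
    extra = [[-a, b] for a, b in swaps]   # one clause per swapped pair
    return list(clauses) + extra
```

Note that, as stated above, the clause set only grows: the pruning happens later, inside the solver's search.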
4.1</p>
        <p>MINISAT
As noted in Section 1, we concentrate on just two solvers: MINISAT, which relies on
clause learning, and Dew Satz, which uses lookahead.</p>
        <p>Sörensson and Eén [ES03, SE05] released the MINISAT solver in 2005. Its design is based
on Chaff, particularly in that it learns nogoods or “conflict clauses” and accesses them
during the search by means of two watched literals in each clause. MINISAT is quite
small (a few hundred lines of code) and easy to use either alone or as a module of a
larger system. Its speed in comparison with similar solvers such as zChaff comes from a
series of innovations of which the most important are an activity-decay schedule which
proceeds by frequent small reductions rather than occasional large ones, and an inference
rule for reducing the size of conflict clauses by introducing a restricted subsumption test.
The cited paper contains a brief but informative description of these ideas.
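</p>
        <p>The flavour of that subsumption-based shrinking can be sketched as follows (our own illustrative sketch, not MINISAT's implementation; the data shapes are assumptions): a literal is dropped from a learnt clause when the reason clause that propagated its complement is subsumed by the remaining literals:

```python
# Shrink a learnt (conflict) clause by a restricted subsumption test.
# reasons maps an assigned literal to the clause that propagated it,
# e.g. reasons[-l] is the clause that forced -l. Illustrative only.

def minimize_conflict(learnt, reasons):
    learnt = set(learnt)
    for lit in list(learnt):
        reason = reasons.get(-lit)
        # If the reason's other literals already occur in the clause,
        # resolving on lit removes it without adding anything new.
        if reason and set(reason) - {-lit} <= learnt - {lit}:
            learnt.discard(lit)
    return learnt
```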
4.2</p>
        <p>Dew Satz
The solver Dew Satz [AS05] is a recent version of the Satz solver [LA97]. Like its parent
Satz, it gains efficiency by a restricted one-step lookahead scheme which rates some of
the neighbouring variables every time a choice must be made for branching purposes. Its
lookahead is more sophisticated than the original one of Satz, adding a DEW (dynamic
equality weighting) heuristic to deal with equalities. This enables the variable selection
process to avoid duplicating the work of weighting variables detected to be equivalent
to those already examined. Thus, while the solver has no special inference mechanism
for propositional equalities, it does deal tolerably well with problems containing them.
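</p>
        <p>The one-step lookahead idea can be sketched as follows (illustrative Python; the scoring combination is a common lookahead mix, not necessarily Satz's exact rule): each candidate variable is tentatively assigned both ways, unit propagation is run, and the variable whose two propagation counts are jointly largest is chosen for branching:

```python
# One-step lookahead branching: score a variable by how much unit
# propagation each of its two assignments triggers. Conflict handling
# is omitted for brevity. Illustrative only.

def propagate(clauses, lit):
    assigned = {lit}
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(l in assigned for l in c):
                continue                      # clause already satisfied
            unassigned = [l for l in c if -l not in assigned]
            if len(unassigned) == 1 and unassigned[0] not in assigned:
                assigned.add(unassigned[0])   # unit clause: propagate
                changed = True
    return len(assigned) - 1                  # implied literals only

def pick_branch_var(clauses, candidates):
    def score(v):
        pos, neg = propagate(clauses, v), propagate(clauses, -v)
        return pos * neg + pos + neg          # a common lookahead mix
    return max(candidates, key=score)
```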
5</p>
        <p>We present results on four benchmark problem sets chosen to present challenges for one
or other or both of the SAT solvers. The experiments were conducted on a cluster of 16
AMD Athlon 64 processors running at 2 GHz with 2 GB of RAM. Ptime in the tables
represents preprocessing time, while Stime represents the solver's runtime excluding
Ptime. The time bound on Stime is 15,000 seconds per problem instance. It is worth
noting that in our study the results of SatELiteGTI, the solver which dominated the
SAT2005 competition on the crafted and industrial problem categories, are represented
by the results of SatELite+MINISAT.
5.1</p>
        <p>The 32-bit Parity Problem
The 32-bit parity problem was listed by Selman et al. [SKM97] as one of ten challenges
for research on satisfiability testing. The ten instances of the problem are satisfiable.
The first response to this challenge was by Warners and van Maaren [WvM98] who
solved the par32-*-c problem (5 instances) using a special-purpose preprocessor to
deal with equivalency conditions. Two years later, Li [Li00] solved the ten par32*
instances by enhancing Satz's search process with equivalency reasoning.</p>
        <p>[Table 1: results of Dew Satz and MINISAT on the par32 instances, with and without preprocessing.]</p>
        <p>Ostrowski et al. [OGMS02] solved the problems with Lsat, which performs a preprocessing step to
recover and exploit the logical gates of a given CNF formula and then applies DPLL
with a Jeroslow-Wang branching rule. The challenge has now been met convincingly by
Heule et al. [HvM04] with their March eq solver, which combines equivalency reasoning
in a preprocessor with a lookahead-based DPLL and which solves all of the par32*
instances in seconds. Dew Satz is one of the few solvers to have solved any instances of
the 32-bit parity problem without special-purpose equivalency reasoning [AS05].</p>
        <p>Table 1 shows the results of running the lookahead-based solver Dew Satz and the
CDCL-based solver MINISAT on the ten par32 instances, with and without
preprocessing. As preprocessors we used 3-Resolution, HyPre, NiVER and SatELite alone and
followed by 3-Resolution for the last three. We eliminated 2-SIMPLIFY from this test
as it aborted while processing the first five par32* instances presented in
Table 1. We also experimented with all combinations of two preprocessors for the
problems par32-1 and par32-4. Where lines are omitted from the table (e.g. there is no
line for HyPre on par32-1 or for SatELite+3-Resolution on par32-2), this is because
neither solver produced a solution for those simplified instances.</p>
        <p>It is evident from the table that these problems are seriously hard for both solvers.
Even with preprocessing, MINISAT times out on all of them except for par32-2 and
par32-5-c. Curiously, on the par32-2 instance, preprocessing with 3-Resolution slightly
degrades its performance. This is not a uniform effect: Table 4 below shows
examples in which MINISAT benefits markedly from 3-Resolution. Without preprocessing,
Dew Satz times out on nine of ten par32 instances, but in every case except par32-5
and par32-5-c 3-Resolution suffices to help it find a solution, and running multiple
preprocessors improves its performance.</p>
        <p>In general, Table 1 shows that multiple preprocessing contributes significantly to
enhancing the performance of Dew Satz, and that 3-Resolution dominates the
contribution, whether used as a single preprocessor or in combination.
5.2</p>
        <p>A Planning Benchmark Problem
The ferry planning benchmark problems, taken from the SAT2005 competition, are all easy
for MINISAT, which solves all of them in about one second without needing preprocessors.
Dew Satz, however, is challenged by them. The problems are satisfiable. We show the
Dew Satz and MINISAT results on the problems in Table 2. Clearly the original problems
contain some structure that CDCL is able to exploit but which one-step lookahead
fails to uncover. It is therefore interesting to see which kinds of reasoning carried out in a
preprocessing phase are able to make that same structure available to Dew Satz. Most
strikingly, reasoning with binary clauses in the manner of the 2-SIMPLIFY preprocessor
reduces runtimes by upwards of four orders of magnitude in some cases. HyPre, NiVER
and SatELite, especially HyPre, are also effective on these planning problems. In most
cases the number of backtracks drops from millions to fewer than 100, or even to zero
for the ferry8 v01a, ferry9 v01a, and ferry10 ks99a instances, which means that the input
formula is solved at the root node of the search tree.
</p>
        <p>[Table 2: Dew Satz and MINISAT results on the ferry planning problems.]</p>
        <p>[Table 3: Dew Satz and MINISAT results on the bounded model checking problems.]</p>
        <p>[Table 4: Dew Satz and MINISAT results on the IBM formal verification problems.]
5.3</p>
        <p>Bounded Model Checking Problems
Another domain providing benchmark problem sets which appear to be easy for MINISAT
but sometimes hard for Dew Satz is bounded model checking. In Table 3 we report
results on five of eleven BMC-IBM problems, two BMC-galileo problems and two of four
BMC-alpha problems. All other benchmark problems in the BMC-IBM class are easy for
both solvers and so are omitted from the table. The other two BMC-alpha instances
are harder than the two reported, even for MINISAT, both before and after preprocessing. The
problems presented in Table 3 are satisfiable.</p>
        <p>Each of these bounded model checking problems is brought within the range of
Dew Satz by some form of preprocessing. In general, HyPre and 3-Resolution are the
best for this purpose, especially when used together, though on problem BMC-IBM-13
they are ineffective without the additional use of NiVER. The column showing the
number of times Dew Satz backtracks is worthy of note. In many cases, preprocessing
reduces the problem to one that can be solved without backtracking. Solving
“without backtracking” has to be interpreted with care here, of course, since a nontrivial
amount of lookahead may be required in a “backtrack-free” search. The results for
BMC-galileo-9 furnish a good example of this: HyPre takes 407 seconds to refine the
problem, following which Dew Satz spends 90 seconds on lookahead reasoning while
constructing the first (heuristic) branch of its search tree, but then that branch leads
directly to a solution. Adding 3-Resolution to the preprocessing step does not change
the number of variables, and only slightly reduces the number of clauses, but it roughly
halves the time subsequently spent on lookahead.</p>
        <p>The instance BMC-alpha-4408 is hard for Dew Satz even after preprocessing, while
MINISAT with multiple preprocessing solves the problem instance an order of
magnitude faster. We can also observe that HyPre brings more benefit than SatELite.</p>
        <p>Table 4 shows results for both solvers on a related problem set consisting of formal
verification problems taken from the SAT2005 competition. The IBM-FV-01 problems
are satisfiable except for the problem IBM-FV-01-k10; the IBM-FV-26 problems are
unsatisfiable. Most of these satisfiable problems are easy for MINISAT, but the
unsatisfiable cases show that the SatELite preprocessor (with which MINISAT was paired in the
competition) is by far the least effective of the four we consider for MINISAT on these
problems. The preprocessor HyPre proved the unsatisfiability of IBM-FV-01-k10 in 1.27
seconds. 2-SIMPLIFY was not used to simplify the IBM-FV-26 problems, because it is
limited to input formulas with at most 100,000 variables. Again there are cases in
which Dew Satz is improved from a 15,000 second timeout to a one-branch proof of
unsatisfiability. Note that the numbers of clauses in these cases are actually increased by
the preprocessor 3-Resolution, confirming that the point of such reasoning is to expose
structure rather than to reduce problem size.
5.4</p>
        <p>A Highly Symmetrical Problem
The FPGA routing problems are highly symmetrical; they model the routing of wires in
the channels of field-programmable integrated circuits [AMS03]. The problem instances
used in the experiment, artificially designed by Fadi Aloul, are taken from the
SAT2002 competition.</p>
        <p>Without preprocessing to break symmetries, many of the FPGA routing problems
are hard—harder for CDCL solvers than for lookahead-based ones. Not only do they
have many symmetries, but the clause graphs are also disconnected. Lookahead
techniques with a neighbourhood variable ordering heuristic seem able to choose inferences
within one graph component before moving to another, whereas MINISAT jumps
frequently between components. Table 5 shows the performance of both solvers on the FPGA
routing problem set. Of the 21 selected satisfiable (bart) problems, MINISAT solves 8 in
some 2 hours. It manages better with the unsatisfiable (homer) instances, solving 14 of
15 in a total time of around 6 hours. Dew Satz solves all of the bart problems in 17.5
seconds and the homer ones in 45 minutes.</p>
        <p>The detailed results for two of the satisfiable problems and two unsatisfiable ones
(Table 6) are interesting. The resolution-based preprocessors do not
modify the size of the input formula, except for SatELite. The Shatter
preprocessor, which removes certain symmetries, is tried on its own and in combination
with the five resolution-based preprocessors. It should be noted that the addition of
symmetry-breaking clauses increases the sizes of the problems, but of course it greatly
reduces the search spaces in most cases.</p>
        <p>The performance of Dew Satz after preprocessing is often worse in terms of time
than it was before, though there is always a decrease in the size of its search tree. This is
because the increase in problem size increases the amount of lookahead work. MINISAT,
by contrast, sometimes speeds up by several orders of magnitude
after preprocessing.
</p>
        <p>[Table 5: Dew Satz and MINISAT summary results on the bart (21 SAT) and homer (15 UNSAT) FPGA routing sets.]</p>
        <p>[Table 6: detailed results for bart28, bart30, homer19 and homer20 with Shatter and the resolution-based preprocessors.]</p>
        <p>[Order of preprocessors: formula sizes and times for various preprocessor orderings, including repeated applications of 2-SIMPLIFY and NiVER.]</p>
        <p>For one of these instances, preprocessing with 2-SIMPLIFY followed by NiVER is insufficient to allow solution before the timeout.
Simplifying again with 2-SIMPLIFY brings the runtime down to under an hour; adding
NiVER again brings it down again to a couple of minutes; repeating 2-SIMPLIFY, far
from improving matters, causes the time to blow out to two hours.
6</p>
        <p>Conclusions
We performed an empirical study of the effects of several recently proposed SAT
preprocessors on both CDCL and lookahead-based SAT solvers. We summarize several
outcomes from this study as follows.</p>
        <p>1. High-performance SAT solvers, whether they depend on clause learning or on
lookahead, benefit greatly from preprocessing. Improvements of four orders of
magnitude in runtimes are not uncommon.
2. It is not sufficient to equip a SAT solver with just one preprocessor of the kind
considered in this paper: very different preprocessing techniques are appropriate to
different problem classes.
3. There are frequently benefits to be gained from running two or more preprocessors
in series on the same problem instance.
4. Both clause learning and lookahead need to be enhanced with techniques specific
to reasoning with binary clauses, in order to exploit dependency chains, and with
techniques for equality reasoning.
5. Lookahead-based solvers also benefit greatly from resolution between longer clauses,
as in the 3-Resolution preprocessor. This seems to capture ahead of the search
some of the inferences which would be achieved during it by learning clauses.
CDCL solvers can also benefit from the 3-Resolution preprocessor—dramatically in
certain instances—but the effects are far from uniform.
6.1</p>
        <p>Future work
The following lines of research are open:
1. It would, of course, be easy, if tedious, to extend the experiments to more problem
sets, more preprocessors and especially to more solvers. We shall probably look
at some more DPLL solvers, but do not expect the results to add much more
than detail to what is reported in the present paper. One of the more important
additions to the class of solvers will be a non-clausal (Boolean circuit) reasoner.
We have not yet experimented with such a solver. We have already investigated
preprocessing for several state-of-the-art SLS (stochastic local search) solvers, but
that is such a different game that we regard it as a different experiment and do
not report it here.
2. The more important line of research is to investigate methods for automatically
choosing among the available preprocessors for a given problem instance, and
for automatically choosing the order in which to apply successive preprocessors.
Machine learning may help here, though it would be better, or at least more
insightful, to be able to base decisions on a decent theory about the interaction of
reasoning methods.
3. Another interesting project is to combine preprocessors not as a series of separate
modules but as a single reasoner. For example, it would be possible to
saturate under 3-Resolution and hyper-resolution together, in the manner found in
resolution-based theorem provers. Whether this would be cost-effective in terms
of time, and whether the results would differ in any worthwhile way from those
obtained by ordering separate preprocessors, are unknown at this stage.</p>
        <p>As SAT solvers are increasingly applied to real-world problems, we expect deductive
reasoning by preprocessors to become increasingly important to them.
Acknowledgments
This work was funded by National ICT Australia (NICTA). National ICT Australia is
funded through the Australian Government’s Backing Australia’s Ability initiative, in
part through the Australian Research Council.</p>
        <p>References</p>
        <p>Fadi A. Aloul, Igor L. Markov, and Karem A. Sakallah. Shatter:
Efficient symmetry-breaking for boolean satisfiability. In Design Automation
Conference, pages 836–839. ACM/IEEE, 2003.</p>
        <p>Anbulagan, Duc Nghia Pham, John Slaney, and Abdul Sattar. Old
resolution meets modern SLS. In Proceedings of 20th AAAI, pages 354–359,
2005.</p>
        <p>Anbulagan, Duc Nghia Pham, John Slaney, and Abdul Sattar. Boosting
SLS performance by incorporating resolution-based preprocessor. In
Proceedings of Third International Workshop on Local Search Techniques in
Constraint Satisfaction (LSCS), in conjunction with CP-06, pages 43–57,
2006.</p>
        <p>Anbulagan and John Slaney. Lookahead saturation with restriction for SAT.
In Proceedings of 11th CP, pages 727–731, 2005.</p>
        <p>Fadi A. Aloul, Karem A. Sakallah, and Igor L. Markov. Efficient symmetry
breaking for boolean satisfiability. In Proceedings of 18th IJCAI, Mexico,
2003.</p>
        <p>Fahiem Bacchus. Enhancing Davis Putnam with extended binary clause
reasoning. In Proceedings of 18th AAAI, pages 613–619, Edmonton, Canada,
August 2002. AAAI Press.</p>
        <p>Ronen I. Brafman. A simplifier for propositional formulas with many binary
clauses. In Proceedings of 17th IJCAI, pages 515–522, 2001.</p>
        <p>Ronen I. Brafman. A simplifier for propositional formulas with many binary
clauses. IEEE Transactions on Systems, Man, and Cybernetics, Part B,
34(1):52–59, 2004.</p>
        <p>Fahiem Bacchus and Jonathan Winter. Effective preprocessing with
hyperresolution and equality reduction. In Revised Selected Papers of SAT 2003,
LNCS 2919 Springer, pages 341–355, 2004.</p>
        <p>Olivier Dubois and Gilles Dequen. A backbone-search heuristic for efficient
solving of hard 3-SAT formulae. In Proceedings of 17th IJCAI, pages 248–
253, Seattle, Washington, USA, 2001.</p>
        <p>M. Davis and H. Putnam. A computing procedure for quantification theory.
Journal of the ACM, 7:201–215, 1960.</p>
        <p>Niklas Eén and Armin Biere. Effective preprocessing in SAT through
variable and clause elimination. In Proceedings of 8th SAT, LNCS Springer,
2005.</p>
        <p>Niklas Eén and Niklas Sörensson. An extensible SAT-solver. In Proceedings
of 6th SAT, 2003.</p>
        <p>Marijn Heule and Hans van Maaren. Aligning CNF- and
equivalence-reasoning. In Proceedings of 7th SAT, Vancouver, Canada, 2004.</p>
        <p>Marijn Heule and Hans van Maaren. March dl: Adding adaptive heuristics
and a new branching strategy. Journal on Satisfiability, Boolean Modeling
and Computation, (2):47–59, 2006.</p>
        <p>Chu Min Li and Anbulagan. Look-ahead versus look-back for satisfiability
problems. In Proceedings of 3rd CP, pages 341–355, Schloss Hagenberg,
Austria, 1997.</p>
        <p>Chu Min Li. Integrating equivalency reasoning into Davis-Putnam
procedure. In Proceedings of 17th AAAI, pages 291–296, USA, 2000. AAAI
Press.</p>
        <p>I. Lynce and J. Marques-Silva. The interaction between simplification and
search in propositional satisfiability. In Proceedings of CP’01 Workshop on
Modeling and Problem Formulation, 2001.
[MMZ+01] M. W. Moskewicz, C. F. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff:
Engineering an efficient SAT solver. In Proceedings of DAC, pages 530–535,
2001.
[OGMS02] Richard Ostrowski, Éric Grégoire, Bertrand Mazure, and Lakhdar Sais.</p>
        <p>Recovering and exploiting structural knowledge from CNF formulas. In
Proceedings of 8th CP, pages 185–199, 2002.</p>
        <p>Lawrence O. Ryan. Efficient Algorithms for Clause Learning SAT Solvers.
PhD thesis, Simon Fraser University, Burnaby, Canada, 2004.</p>
        <p>Niklas Sörensson and Niklas Eén. MINISAT v1.13 - A SAT solver with
conflict-clause minimization. In 2005 International SAT Competition
website: http://www.lri.fr/∼simon/contest05/results/descriptions/solvers/minisat static.pdf,
2005.</p>
        <p>Bart Selman, Henry Kautz, and David McAllester. Ten challenges in
propositional reasoning and search. In Proceedings of 15th IJCAI, pages 50–54,
Nagoya, Aichi, Japan, 1997.</p>
        <p>Sathiamoorthy Subbarayan and Dhiraj K. Pradhan. NiVER: Non increasing
variable elimination resolution for preprocessing SAT instances. In Revised
Selected Papers of SAT 2004, LNCS 3542 Springer, pages 276–291, 2005.
Joost P. Warners and Hans van Maaren. A two-phase algorithm for solving
a class of hard satisfiability problems. Operations Research Letters, 23:81–
88, 1998.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [SAHB00]
          <string-name>
            <given-names>R.</given-names>
            <surname>St-Aubin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hoey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Boutilier</surname>
          </string-name>
          . APRICODD:
          <article-title>Approximate policy construction using decision diagrams</article-title>
          . In T. K. Leen,
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          , and V. Tresp, editors,
          <source>Proceedings of the Fourteenth Annual Conference on Neural Information Processing Systems</source>
          (NIPS'
          <year>2000</year>
          ), pages
          <fpage>1089</fpage>
          -
          <lpage>1095</lpage>
          , Denver,
          <year>2000</year>
          . MIT Press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>W. V.</given-names>
            <surname>Quine</surname>
          </string-name>
          .
          <article-title>A way to simplify truth functions</article-title>
          .
          <source>American Mathematical Monthly</source>
          ,
          <volume>62</volume>
          :
          <fpage>627</fpage>
          -
          <lpage>631</lpage>
          ,
          <year>1955</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Robinson</surname>
          </string-name>
          .
          <article-title>A machine-oriented logic based on the resolution principle</article-title>
          .
          <source>Journal of the ACM</source>
          ,
          <volume>12</volume>
          :
          <fpage>23</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>1965</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>