=Paper= {{Paper |id=Vol-192/paper-2 |storemode=property |title=SPASS+T |pdfUrl=https://ceur-ws.org/Vol-192/paper02.pdf |volume=Vol-192 }} ==SPASS+T== https://ceur-ws.org/Vol-192/paper02.pdf
                                               SPASS+T
                              Virgile Prevosto1∗ and Uwe Waldmann2
                                      1
                                   CEA – LIST, Centre de Saclay
                               Software Reliability Laboratory (LSL)
                                91191 Gif-sur-Yvette Cedex, France
                                   virgile.prevosto@m4x.org
                                 2
                                     Max-Planck-Institut für Informatik
                                         Stuhlsatzenhausweg 85
                                      66123 Saarbrücken, Germany
                                          uwe@mpi-inf.mpg.de



                                                  Abstract
               Spass+T is an extension of the superposition-based theorem prover Spass that
           allows us to enlarge the reasoning capabilities of Spass using an arbitrary SMT
           procedure for arithmetic and free function symbols as a black-box. We discuss the
           architecture of Spass+T and the capabilities, limitations, and applications of such
           a combination.


     1    Introduction
     Spass (Weidenbach et al. [10]) is a saturation-based theorem prover for first-order logic
     based on the superposition calculus. Such provers are notoriously bad at dealing with
     integer or real arithmetic – encoding numbers in binary or unary is not really a viable
     solution in most application contexts. We have extended Spass in such a way that we
     can link it to a rather arbitrary SMT (Satisfiability Modulo Theories) procedure for
     arithmetic and free function symbols, for instance our own ModuProve system (based
     on the DPLL(T) framework of Ganzinger et al. [5]) or the CVC Lite prover by Barrett
     and Berezin [2]. Briefly, the resulting system, called Spass+T, works as follows: Spass
     uses its deduction rules to generate formulas as usual. In addition, it passes to the
     SMT procedure all the formulas that can be handled by the procedure, i. e., all ground
     formulas. As soon as one of the two systems encounters a contradiction, the problem is
     solved.
         The scenario described so far is a special case of hierarchic theorem proving: a combi-
     nation of a base prover that deals with some subclass of formulas, say, formulas in linear
     arithmetic, and an extension prover that deals with formulas over the complete signature
        ∗
          This work was partially funded by the German Federal Ministry of Education, Science, Research and
     Technology (BMBF) in the framework of the Verisoft project under grant 01 IS C38. The responsibility
     for this article lies with the authors. Project website: http://www.verisoft.de.




18                                                                      Empirically Successful Computerized Reasoning
     and passes all formulas to the base prover that the latter can handle. The correctness
     of such a hierarchic combination of deductive systems is obvious. In theory, conditions
     for completeness have been given by Bachmair, Ganzinger, Sofronie-Stokkermans, and
     Waldmann [1, 6]. For instance, one can get a complete hierarchic combination of provers
     if base and non-base vocabulary are separated using abstraction, the term ordering is
     chosen in such a way that base terms/atoms are smaller than non-base terms/atoms,
     the base prover can deal with arbitrary formulas over the base vocabulary, the extension
     is sufficiently complete,1 and the base theory is compact [1].
         In practice, however, sufficient completeness need not hold (and cannot be checked
     automatically), abstraction enlarges the search space, SMT procedures for some useful
     theories can only deal with ground formulas (this is for instance the case for Nelson-
     Oppen combinations of arithmetic and free function symbols), and even compactness
     may be an issue. As an example, consider the two clauses ∀x, y. (f (x) 6= y + y) and
     ∀x, y. (f (x) 6= y + y + 1). If the base theory is Presburger arithmetic (linear integer
     arithmetic), then the conjunction of these two clauses (expressing the fact that f (x) is
     neither even nor odd) is inconsistent. Still any finite set of ground instances of these two
     formulas is consistent. Even unification modulo the base theory would not help.
         An integration of first-order theorem provers and SMT procedures can therefore
     only be a pragmatic one: Completeness can be achieved for special classes of inputs,
     but not in general. If we work with non-ground problem description but ground goal
     formulas, then there is some hope that we can ultimately produce sufficiently many
     ground formulas so that the SMT procedure can find the contradiction. If we want
     to solve non-ground problems, in particular if we want to find solutions for variables,
     then the purely hierarchic approach must be supplemented by some knowledge about
     arithmetic that is built-in directly into Spass.

     2       SPASS
     Spass (Weidenbach et al. [10]) is known to be one of the most advanced implementations
     of the superposition calculus. The superposition calculus is a refutationally complete pro-
     cedure for arbitrary first-order clauses with equality, that is, it provides a semi-decision
     procedure for the unsatisfiability of sets of clauses. Theorem proving methods such as
     resolution or superposition aim at deducing a contradiction from a set of formulas by
     recursively inferring new formulas from given ones. New formulas are derived according
     to a set of inference rules. In the case of superposition, these rules are restricted versions
     of paramodulation, resolution, and factoring, parameterized by a reduction ordering ≻
     that is total on ground expressions (that is, ground atoms and ground terms) and by
     a selection function S, which assigns to each clause a (possibly empty) multiset of (oc-
     currences of) negative literals. The literals in S(C) are called selected. Selected literals,
     besides being negative, can be arbitrarily chosen. Ordering and selection function impose
     various restrictions on the possible inferences, which are crucial for the efficiency of the-
     orem provers like Spass. Let us consider one of the inference rules of the superposition
     calculus as an example:
         1
         Intuitively, sufficient completeness means that every ground term of a base sort is provably equal
     to a ground term consisting only of base symbols.




Empirically Successful Computerized Reasoning                                                                 19
                                  D ′ ∨ t = t′     C ′ ∨ ¬ s[u] = s′
     Negative superposition
                                      (D ′ ∨ C ′ ∨ ¬ s[t′ ] = s′ )σ

                     if (i) the literal t = t′ is strictly maximal in the first premise, (ii) no literal
                     is selected in the first premise, (iii) either the literal ¬ s[u] = s ′ is selected
                     in the second premise or it is maximal and no literal is selected, (iv) t 6 t ′ ,
                     (v) s[u] 6 s′ , (vi) u is not a variable, and (vii) σ is a most general unifier of
                     t and u.

         This inference rule combines the unification of t and the subterm u of s with subse-
     quent replacement of uσ by t′ σ. When we speak of a superposition inference we mean
     an arbitrary rule of the calculus.
         Note that a clause with selected literals cannot serve as the left premise of a su-
     perposition inference, and that maximal terms are superposed either on maximal or
     selected literals, and a smaller term is never replaced by a larger one. Moreover, only
     the maximal sides of equations are replaced. The pattern of interplay between ordering
     restrictions and the selection function is the same for all inference rules of the calculus.
         The local restrictions of the superposition inference rules are supplemented by a
     global redundancy criterion that allows us to discard formulas that are provably unnec-
     essary for deriving a contradiction.


     3        System Architecture
     Spass+T consists of three modules: the theorem prover Spass, a (rather arbitrary)
     SMT procedure for integer or real arithmetic and free function symbols, and a module
     SMT-Control connecting both. The proof systems are only loosely coupled; they wait
     for each other only if they have completed their own task.
         Spass sends three kinds of messages to SMT-Control:

             – all the ground formulas present in the input or derived during the saturation;

             – proof found (signaling that Spass has derived the empty clause);

             – end of file (signaling that Spass has saturated its input).

         SMT-Control collects the formulas sent by Spass and repeatedly sends portions of
     the set of formulas to the SMT procedure,2 until one of the following events occurs:

             – Spass sends proof found;

             – the SMT procedure detects unsatisfiability;

             – the SMT procedure detects satisfiability after Spass has sent end of file and the
               SMT procedure has seen all the output of Spass.

        The result is “proof found” in the first two cases, “no proof found” in the third one.
     Lacking completeness, “no proof found” does in general not imply satisfiability.
         2
         Under the assumption that the SMT procedure works incrementally. Alternatively the SMT proce-
     dure can be restarted with increasing subsets of the set of formulas.




20                                                                     Empirically Successful Computerized Reasoning
     4      The Simple Case
     There are applications where we can guarantee that every base theory formula generated
     by Spass is ground. In this case, the very simple setup described above is already
     sufficient. An example is pointer data structure verification à la McPeak and Necula [7].
     In this scenario, we consider recursive data structures involving pointers to records or to
     nil. Record fields can either be scalar values or pointers. In the axioms used to describe
     the behaviour of a data structure, all variables range over pointer values (i. e., there is
     no quantification over scalar values). Record fields are encoded as (partial) functions, so
     x.data, i. e., the data field of the record that x points to, is written as data(x). McPeak
     and Necula require that every occurrence of such a function must be guarded by a
     condition that ensures that the argument is non-nil. For instance, the formula stating
     that the prev pointer is the inverse of the next pointer in a doubly linked list looks like

                              ∀x. (x 6= nil ∧ next(x) 6= nil → prev (next(x)) = x),

     and the formula stating that the data field of any record is positive looks like

                                                ∀x. (x 6= nil → data(x) > 0).

     To avoid positive disjunctions, we replace the guard formula x 6= nil by isrecord (x),
     obtaining

                         ∀x. (isrecord (x) ∧ isrecord (next(x)) → prev (next(x)) = x),
                                        ∀x. (isrecord (x) → data(x) > 0).

     Since the SMT procedure accepts not only the symbols of the arithmetical theory, such as
     +, −, · , <, or ≤, but also free function or predicate symbols, the distinction between base
     and non-base symbols is somewhat arbitrary – if it is useful, we may treat both theory
     symbols and a subset of the free symbols as base symbols. Let us therefore consider
     the guard predicate isrecord as the only non-base symbol in the signature. If we select
     the occurrences of isrecord (t) in the antecedent, we ensure that the only inferences that
     are possible with such clauses are (repeated) resolution steps with positive occurrences
     of isrecord (s). Since isrecord (s) occurs positively only in goal formulas (or formulas
     recursively derived from goal formulas), and since the term s is always ground in these
     formulas, the base formulas resulting from resolution are ground and can be passed to
     the SMT procedure. In effect, Spass+T mimics the derivation steps of McPeak and
     Necula, and the completeness result of McPeak and Necula carries over to Spass+T.


     5      Theory Instantiation
     If non-ground theory literals are not guarded, then it can easily happen that the usual
     inference rules of Spass do not produce those ground instances that the SMT procedure
     would need to find a contradiction. Consider the following example from Boyer and
     Moore [3]: Let min and max be non-theory function symbols denoting the minimum
     and the maximum element of a list of numbers. Suppose that we want to refute ¬ (l <
     max (a) + k) using the assumptions l ≤ min(a) and 0 < k and the universally quantified




Empirically Successful Computerized Reasoning                                                       21
     lemma ∀x. (min(x) ≤ max (x)). Analogously to Boyer and Moore, we need an additional
     inference rule that computes the required ground instance of this lemma. For efficiency
     reasons, we restrict ourselves to instantiations where a term headed by a non-theory
     symbol occurs in at least one other ground clause – if a non-theory term occurs in only
     one clause, this clause is very unlikely to be useful to the SMT procedure, at least in
     linear arithmetic. We obtain the following inference rule:
                               C[s]           L[t] ∨ D
     Theory instantiation
                                      (L[t] ∨ D)σ
                if L[t] ∨ D is not ground, t is headed by a non-theory function symbol and
                occurs in a maximal literal L[t] immediately below a theory symbol, all
                function or predicate symbols occurring above t in L[t] are theory symbols
                or equality, C[s] is ground, σ is an mgu of s and t, and (L[t] ∨ D)σ is ground.

     In the example above, the theory instantiation inference
                               l ≤ min(a)        min(x) ≤ max (x)
                                        min(a) ≤ max (a)
     yields the ground clause that the SMT procedure needs to derive a contradiction. There
     is one problem with this rule, though: The generated formula (L[t] ∨ D)σ is subsumed
     by the second premise L[t] ∨ D; therefore Spass will delete it again. We export it to the
     SMT procedure before the redundancy check, but we must also ensure that (L[t] ∨ D)σ
     can again be used as a first premise in a theory instantiation inference. To this end, we
     introduce a special new predicate symbol ground and split the inference rule into two
     parts:
                                  C[s]
     Theory instantiation I
                               ground (s)
                if C[s] is ground and s is headed by a non-theory function symbol.
                                     ground (s)             L[t] ∨ D
     Theory instantiation II
                               ground (u1 ) . . . ground(un ) export(D ′ )
                if L[t] ∨ D is not ground, t is headed by a non-theory function symbol and
                occurs in a maximal literal L[t] immediately below a theory symbol, all
                function or predicate symbols occurring above t in L[t] are theory symbols
                or equality, σ is an mgu of s and t, D ′ = (L[t] ∨ D)σ is ground, and u1 , . . . , un
                are the ground terms occurring in D ′ that are different from s and headed
                by a non-theory function symbol.

     In the example above, the theory instantiation I inference
                                               l ≤ min(a)
                                            ground (min(a))
     and the theory instantiation II inference
                             ground (min(a))    min(x) ≤ max (x)
                         ground (max (a))    export(min(a) ≤ max (a))




22                                                                  Empirically Successful Computerized Reasoning
     cause min(a) ≤ max (a) to be exported to the SMT procedure and ensure that the clause
     ground (max (a)) is available for further theory instantiation II inferences within Spass.
         If we select guard literals whenever possible, and if we choose the atom ordering so
     that non-theory predicates are larger than theory predicates, then the theory instantia-
     tion rule is used only as a last resort: It is applied only if there are no non-theory guard
     literals that might produce ground instances in a more restricted manner.


     6      Arithmetic Simplification
     There are several reasons to supplement the external SMT procedure in Spass+T by
     simplification techniques that are built-in directly into Spass+T. First of all, such sim-
     plification techniques offer the chance to find solutions for variables, say, to replace a
     clause ∀x. (¬ x + 2 = 5 ∨ p(x)) by ∀x. (¬ x = 3 ∨ p(x)) and then by p(3). Second,
     they allow us to reduce different numeric subexpressions to the same number, so that
     the search space of Spass+T is not cluttered by different, but numerically equivalent
     clauses, such as p(2 + 1), p(1 + 2), p(1 + (1 + 1)), etc. Third, by applying arithmetic
     simplification in advance, an equation like ∀x. (x + 0 = x) can be used to simplify a
     formula such as p(((a + 2) − 1) − 1) to p(a).
          In Spass+T, we use a combination of additional input axioms encoding some frag-
     ment of arithmetic and specialized simplification rules dealing with the numeric part.
     The latter include rules for evaluation of constant numeric subexpressions, such as

                                                    C[c1 + c2 ]
                                                      C[c0 ]

     and
                                                 C[(t + c1 ) + c2 ]
                                                    C[t + c0 ]
     where c1 and c2 are numeric constants and c0 = c1 + c2 , and rules for elementary
     (in-)equation solving, for instance

                                                C ∨ [¬] t + c1 ∼ c2
                                                  C ∨ [¬] t ∼ c0

     where c0 = c2 − c1 and ∼ is equality or an ordering relation.
         All the arithmetic simplification rules we have implemented have the property that
     the resulting formula is smaller than the original one in any Knuth-Bendix ordering,
     provided that all constants have the same weight. Still it should be clear that there is
     some risk of losing proofs by applying arithmetic simplification. As a trivial example
     consider the two clauses ∀x. (p(3 · x + 4)) and ¬ p(3 · 5 + 4). These clauses are obviously
     contradictory, but lacking theory unification, there is no way to derive a contradiction
     as soon as the second clause has been simplified to ¬ p(19).
         In contrast to formulas that describe the relationships between concrete numerical
     constants, equations such as ∀x. (x + 0 = x), ∀x. (x − x = 0), or ∀x, y. ((x − y) + y = x)
     are preferably added to the input as ordinary axioms. In this way they are not only avail-
     able for simplification, but also for superposition or resolution inferences. In our experi-
     ments, this capability turned out to be extremely useful for refuting non-ground queries.




Empirically Successful Computerized Reasoning                                                       23
        The integer ordering expansion rule, which is essential for some kinds of inductive
     proofs, is in some way a hybrid between the two cases above.
                                                  C ∨s≤t
     Integer ordering expansion
                                         C∨s=t∨s≤t−1  C ∨s = t∨s+1≤ t
     It is an inference rule, rather than a simplification rule, and it corresponds to resolution
     inferences with the axioms ∀x, y. (¬ x ≤ y ∨ x = y ∨ x ≤ y − 1) and ∀x, y. (¬ x ≤ y ∨ x =
     y ∨ x + 1 ≤ y). These are unordered resolution inferences, however, so Spass would
     not perform them. The integer ordering expansion rule fills this gap, but it has to be
     restricted, for instance by limiting applications to clauses with only one positive literal;
     otherwise it turns out to be too productive.


     7        Experiments
     We have tested Spass+T both on a list of sample conjectures [8] (theorems and non-
     theorems) from the TPTP library [9] and on our own list of benchmark problems, mostly
     combining arithmetic and data structures. We used Spass+T with CVC Lite 2.5 as ex-
     ternal decision procedure for the union of the theory of uninterpreted functions and in-
     teger arithmetic. Spass+T options were chosen identically for all benchmark problems;
     in particular, arithmetic simplification, theory instantiation, and (a very restricted form
     of) integer ordering expansion were switched on, and precedences were chosen uniformly
     in such a way that non-theory predicates (notably the containment relation for collec-
     tions) became larger than theory predicates. 3 Moreover, the following set of auxiliary
     axioms was supplied:

             INT01    ∀X, Y. ((X − Y ) + Y = X).
             INT02    ∀X. (X + 0 = X).
             INT03    ∀X. (0 + X = X).
             INT04    ∀X. (X < X + 1).
             INT05    ∀X, Y. (X < Y → X ≤ Y ).
             INT06    ∀X. (X ≤ X).
             INT07    ∀X, Y. (X < Y ↔ Y > X).
             INT08    ∀X, Y. (X ≤ Y ↔ Y ≥ X).
             INT09    ∀X, Y. (¬ (X < Y ∧ X ≥ Y )).
             INT10    ∀X, Y. (¬ (X ≤ Y ∧ X > Y )).
             INT11    ∀X, Y. (X ≤ Y ∧ Y ≤ X → X = Y ).
             INT12    ∀X, Y. (X − Y = X + (−Y )).
             INT13    ∀X. (−(−X) = X).
             INT14    ∀X. (X · 0 = 0).
             INT15    ∀X. (0 · X = 0).
             INT16    ∀X. (X · 1 = X).
             INT17    ∀X. (1 · X = X).
             INT18    ∀X. (X/1 = X).
             INT19    ∀X. (X mod 1 = 0).
         3
             Using the set DomPred option of Spass.




24                                                                Empirically Successful Computerized Reasoning
     The experiments were carried out on an office PC with a 2.40 GHz Intel Pentium 4
     CPU and 512 MB RAM running Debian Linux. All times reported are wall-clock time
     in seconds.

     7.1      TPTP Integer Arithmetic Problems
     The list of integer arithmetic problems in the TPTP library contains 147 theorems and
     36 non-theorems [8]. Most of the problems are rather easy. The results are summarized
     in the following table:

                  Category                                     Theorems Non-theorems
                  Number of problems:                               147             36

                     Proof found:                                   140              0
                       Termination in less than 1 second:           136              0
                       Termination in 1 to 8 seconds:                 4              0

                     No proof found:                                 7              36
                       Termination in less than 1 second:            6              31
                       Termination in 1 to 8 seconds:                1               3
                       Non-termination:                              0               2

     7.2      Further Problems
     Integer arithmetic. In addition to the TPTP list, we have developed a collection
     of 75 theorems mostly combining arithmetic and data structures to test Spass+T. In
     the following list, the second column gives the runtime of Spass+T (wall clock time in
     seconds) and the result, where “+” means “Proof found”, “-” means “No proof found”,
     and “∞” means that the time limit of 600 seconds was exceeded.

        (1)        1.82 -      ∀X, Z. (∃Y. (X + Y = Z))

        (2)        0.16 +      ∀Y, Z. (∃X. (X + Y = Z))

        (3)        0.68 -      ∀X, Z. (∃Y. (X − Y = Z))

        (4)        0.16 +      ∀Y, Z. (∃X. (X − Y = Z))

        (5)        0.33 -      ∃X. (X + X < −10)

        (6)        0.19 +      ∃X. (X · 3 + (−5) < −12)

        (7)        0.23 +      ∃X, Y. (3 · X + 5 · Y = 24)

        (8)        0.22 +      ∃X, Y. (3 · X + 5 · Y = 23)

        (9)        0.38 -      ∃X, Y. (3 · X + 5 · Y = 22)

      (10)         0.69 +      ∀X, Y. (¬ 4 · X + 6 · Y = 21)

      (11)         1.01 +      ∀X, Y, Z. (2 · X + Y + Z = 10 ∧ X + 2 · Y + Z = 10 → ¬ Z = 0)




Empirically Successful Computerized Reasoning                                                  25
      (12)      1.65 +   ∀X, Y, Z. (X < 5 ∧ Y < 3 ∧ X + 2 · Y > 7 → Y = 2)

      (13)      4.77 +   ∀X. (∃Y. (Y < X ∧ ¬ ∃Z. (Y < Z ∧ Z < X)))

      (14)      0.21 +   ¬ ∃X. (0 < X ∧ ∀Y. (Y < X → Y + 1 < X))

     Integer arithmetic plus free function or predicate symbols.

      (15)      0.18 +   ∀X, Y. (X + Y = f (X) ∧ Y − f (X) = 0 → Y = f (0))

      (16)      0.91 +   ∀X. (f (X) > X) → f (f (f (6))) ≥ 9

      (17)      3.75 +   ∀X. (f (X) > X) → ∀Y, Z. (f (f (Y ) + Z) ≥ Y + Z + 2)

      (18)      1.23 +   ∀X, Y. (X < Y → f (X) < f (Y )) → ∀Y. (f (f (Y ) + 2) > f (f (Y )))

      (19)      1.40 -   ∀X, Y. (X < Y → f (X) < f (Y )) → ∀Y. (f (f (Y ) + 2) > f (f (Y )) + 1)

      (20)      0.18 +   ∀X. (f (X) > 1) → 7 − 2 · f (3) < 4

      (21)      1.01 +   ∀X. (f (X) > X) → ∀X. (X − f (X) < 0)

      (22)      1.17 -   ∀X, Y. (g(X, Y ) = g(X, Y + 2)) ∧ g(3, 3) = g(3, 4) → g(3, 2) = g(3, 5)

      (23)      0.20 +   p(14 · 3 + 8) → p(50)

      (24)      0.24 +   ∀X. (p(X + 3)) → p(5)

      (25)      0.46 -   ∀X. (p(2 · X)) → p(10)

      (26)      0.59 -   p(0) ∧ ∀X. (p(X) → p(X + 1)) ∧ ∀X. (p(X) → p(X − 1)) → ∀X. (p(X))

     Integer arithmetic and arrays. The formulas

        ARR01     ∀A, I, X. (read (write(A, I, X), I) = X).
        ARR02     ∀A, I, J, X. (I = J ∨ read (write(A, I, X), J) = read (A, J)).

     were used to axiomatize arrays for the following problems:

      (27)      0.29 +   ∀A, A′ . (A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
                                   → read (A, 3) = 33)

      (28)      0.61 +   ∀A, A′ , I. (A = write(write(write(write(A′ , 3, 33), 4, 44), 5, 55), I, 66)
                                      → read (A, 4) = 44 ∨ read (A, 4) = 66)

      (29)      1.43 +   ∀A, A′ , I.(A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
                                       ∧3≤ I ∧I ≤ 4
                                     → 33 ≤ read (A, I) ∧ read (A, I) ≤ 44)

      (30)      0.25 +   ∀A, A′ . (A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
                                   → ∃I. (read (A, I) = 33))




26                                                                  Empirically Successful Computerized Reasoning
      (31)         0.15 +      ∀A, A′ . (A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
                                         → ∃I. (read (A, I) < 40 ∧ 30 < read (A, I)))

      (32)         0.39 +      ∀A, A′ . (∀I. (read (A′ , I) > I)
                                            ∧ A = write(write(A′ , 3, 5), 7, 9)
                                         → ∀I. (read (A, I) > I))

      (33)         3.78 +      ∀A, A′ , J. (∀I. (read (A′ , I) > I)
                                               ∧ A = write(A′ , J, J + 3)
                                            → ∀I. (read (A, I) > I))

      (34)         0.37 +      ∀A, A′ . (A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
                                           ∧ ∀I. (read (A′ , I) < 100)
                                         → ∀I. (read (A, I) < 100))

      (35)         0.30 +      ∀A, A′ , J, K. (∀I. (J ≤ I ∧ I ≤ K → read (A′ , I) > 0)
                                                  ∧ A = write(A′ , K + 1, 3)
                                               → ∀I. (J ≤ I ∧ I ≤ K + 1 → read (A, I) > 0))

      (36)         5.65 +      ∀A, A′ , J, K. (∀I. (J ≤ I ∧ I ≤ K → read (A′ , I) > 0)
                                                  ∧ A = write(A′ , J + 2, read (A′ , J + 1) + 1)
                                               → ∀I. (J ≤ I ∧ I ≤ K → read (A, I) > 0))

      (37)         3.52 -      ∀A, J, K. (∀I. (J ≤ I ∧ I ≤ K → read (A, I) > 0)
                                          → ∀I. (J + 3 ≤ I ∧ I ≤ K → read (A, I) > 0))

      (38)       31.75 +       ∀A. (∀I. (1 ≤ I ∧ I ≤ 10 → read (A, I) > I)
                                       ∧ ∀I. (11 ≤ I ∧ I ≤ 20 → read (A, I) > 20 − I)
                                    → ∀I. (6 ≤ I ∧ I ≤ 15 → read (A, I) > 5))

      (39)         0.23 +      ∀A. (∀I. (20 ≤ I ∧ I ≤ 30 → read (A, I) = I + 25)
                                    → ∃I. (read (A, I) = 50))

      (40)         0.56 -      ∀A. (∀I. (20 ≤ I ∧ I ≤ 30 → read (A, I) = 2 · I + 3)
                                    → ∃I. (read (A, I) = 53))

      (41)         0.27 +      ∀A, A′ , I, J, K. (¬ read (A′ , I) = read (A′ , J)
                                                     ∧ A = write(write(write(A′ , J, 0),
                                                                           K, read (A′ , K) + 1), I, 0)
                                                  → ∃L. (¬ read (A, L) = read (A′ , L)))

      (42)         9.90 +      ∀A, A′ , I, J, K. (A = write(write(write(A′ , I, 3), I + 2, 2), I + 4, 1)
                                                    ∧I ≤J ∧J ≤I +3
                                                  → ∃K. (J ≤ K ∧ K ≤ J + 3 ∧ read (A, K) ≤ 3))




Empirically Successful Computerized Reasoning                                                               27
     Integer arithmetic and collections.          The formulas

        COL01     ∀I. (¬ I ∈ ∅).
        COL02     ∀I, S. (I ∈ add (I, S)).
        COL03     ∀I, S. (¬ I ∈ remove(I, S)).
        COL04     ∀I, S, J. (I ∈ S ∨ I = J ↔ I ∈ add (J, S)).
        COL05     ∀I, S, J. (I ∈ S ∧ ¬ I = J ↔ I ∈ remove(J, S)).

     were used to axiomatize collections of integers for the following problems:

      (43)      0.31 +   ¬ 4 ∈ add (1, add (3, add (5, ∅)))
      (44)      0.56 +   ∀S, I, J. (S = add (5, add (3, add (1, ∅)))
                                      ∧I ∈S∧J ∈S∧¬I =J
                                    → I + J < 9)

      (45)      0.23 +   ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
                                       ∧ S = add (2, remove(7, S ′ ))
                                    → ∀I. (I ∈ S → I > 0))

      (46)      0.27 +   ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
                                       ∧ S = remove(4, remove(1, remove(2, S ′ )))
                                    → ∀I. (I ∈ S → I > 2))

      (47)      0.64 +   ∀S. (∀I. (I ∈ S → I ≥ 0)
                                 ∧¬0 ∈ S ∧¬1∈ S
                              → ∀I. (I ∈ S → I ≥ 2))

      (48)      0.23 +   ∀S. (∀I. (I ∈ S → I ≥ 0)
                                 ∧ ∀I. (I ∈ S ↔ I ∈ remove(0, remove(1, S)))
                              → ∀I. (I ∈ S → I ≥ 2))

      (49)      1.92 +   ∀S, S ′ , J. (∀I. (I ∈ S ′ → I > 0)
                                          ∧ J ∈ S ′ ∧ S = add (J + 2, remove(J, S ′ ))
                                       → ∀I. (I ∈ S → I > 0))

      (50)      2.29 +   ∀S, S ′ , J, K. (∀I. (I ∈ S ′ → I > 0)
                                             ∧ J ∈ S ′ ∧ K ≥ 0 ∧ S = add (J + K, remove(J, S ′ ))
                                          → ∀I. (I ∈ S → I > 0))

      (51)      1.37 +   ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
                                       ∧ ∀I. (I ∈ S → ∃J. (J ∈ S ′ ∧ I > J))
                                    → ∀I. (I ∈ S → I > 1))

      (52)      1.95 +   ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
                                       ∧ ∀I. (I ∈ S → ∃J. (J ∈ S ′ ∧ 2 · I − 5 · J > 0))
                                    → ∀I. (I ∈ S → I > 2))

      (53)      1.90 +   ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
                                       ∧ ∀I. (I ∈ S → ∃J, K. (J ∈ S ′ ∧ K ∈ S ′ ∧ I = J + K))
                                    → ∀I. (I ∈ S → I > 1))




28                                                                      Empirically Successful Computerized Reasoning
      (54)         0.36 +      ∀S. (S = add (10, add (30, add (50, ∅)))
                                    → ∃I. (20 ≤ I ∧ I ≤ 40 ∧ I ∈ S))

     Integer arithmetic and collections with counting.                    The formulas

          CCO01       ∀I. (¬ I ∈ ∅).
          CCO02       ∀I, S. (I ∈ add (I, S)).
          CCO03       ∀I, S. (¬ I ∈ remove(I, S)).
          CCO04       ∀I, S, J. (I ∈ S ∨ I = J ↔ I ∈ add (J, S)).
          CCO05       ∀I, S, J. (I ∈ S ∧ ¬ I = J ↔ I ∈ remove(J, S)).
          CCO06       ∀S. (count (S) ≥ 0).
          CCO07       ∀S. (S = ∅ ↔ count (S) = 0).
          CCO08       ∀I, S. (¬ I ∈ S ↔ count (add (I, S)) = count (S) + 1).
          CCO09       ∀I, S. (I ∈ S ↔ count (add (I, S)) = count(S)).
          CCO10       ∀I, S. (I ∈ S ↔ count (remove(I, S)) = count(S) − 1).
          CCO11       ∀I, S. (¬ I ∈ S ↔ count (remove(I, S)) = count (S)).
          CCO12       ∀I, S. (I ∈ S → S = add (I, remove(I, S))).

     were used to axiomatize collections of integers with a counting operation for the following
     problems:

      (55)       22.09 +       ∀S. (count (remove(5, S)) ≥ 7
                                    → count(remove(4, S)) ≥ 6)

      (56)         3.71 +      ∀S. (count (add (5, S)) = count (add (3, S))
                                    → count(remove(5, S)) = count (remove(3, S)))

      (57)         2.20 +      ∀S, I. (count (S) + 1 ≥ count(add (I, S)))

      (58)       11.95 +       ∀S, I, K. (K > 0 → count(S) + K ≥ count(add (I, S)))

      (59)       19.49 +       ∀S, I, K. (K > 0 → count(S) + K ≥ count(add (I, add (I, S))))

      (60)         0.42 +      ∀S, I. (count (remove(2, S)) = 0
                                          ∧ count(remove(3, S)) = 0
                                       → count (remove(I, S)) = 0)

      (61)         0.32 +      ∀S. (2 ∈ S ∧ count (S) = 1 → ¬ 5 ∈ S)

      (62)       23.22 +       ∀S. (2 ∈ S ∧ 3 ∈ S ∧ count(S) = 2 → ¬ 5 ∈ S)

      (63)       28.16 +       ∀S, I. (2 ∈ S ∧ 3 ∈ S ∧ count (S) = 2 ∧ I > 3 → ¬ I ∈ S)

      (64)      118.41 +       ∀S, I, J. (I < 3 ∧ 6 < J
                                             ∧I ∈S∧J ∈S
                                             ∧ count(S) = 2
                                          → ¬ 5 ∈ S)

      (65)         0.34 +      ∃S, I. (count (S) + 1 > count(add (I, S)))




Empirically Successful Computerized Reasoning                                                      29
      (66)   309.91 +    ∃S. (count (S) = 3)

      (67)   269.01 +    ∀S, I, J, K. (I > J ∧ J > K
                                          ∧I ∈S∧J ∈S∧K ∈S
                                       → count(S) > 2)

      (68)      0.53 +   ∀S, J. (∀I. (I ∈ S → J > I)
                                 → count (add (J, S)) > count(S))

      (69)      4.04 +   count (add (1, add (3, ∅))) = 2

      (70)      1.02 +   count (add (5, remove(3, add (3, ∅)))) = 1

      (71)       ∞ -     count (add (1, add (5, remove(3, add (2, ∅))))) = 3

      (72)      3.49 +   ∀S, I. (count (remove(I, add (I, S))) = count (remove(I, S)))

      (73)       ∞ -     ∀S, I. (count (add (0, remove(I, add (I, S))))
                                                               = count (add (0, remove(I, S))))

     Integer arithmetic and pointer data types.            The formulas

        PNT01     ∀X. (¬ isrecord (X) → length(X) = 0)
        PNT02     ∀X. (isrecord (X) → length(X) ≥ 1)
        PNT03     ∀X. (isrecord (X) → length(X) = length(next (X)) + 1)
        PNT04     ∀X. (¬ isrecord (X) → ¬ isrecord (split1 (X)))
        PNT05     ∀X. (isrecord (X) → isrecord (split1 (X)))
        PNT06     ∀X. (isrecord (X) → data(split1 (X)) = data(X))
        PNT07     ∀X. (isrecord (X) ∧ ¬ isrecord (next(X))
                         → ¬ isrecord (next(split1 (X))))
        PNT08     ∀X. (isrecord (X) ∧ isrecord (next (X))
                         → next(split1 (X)) = split1 (next(next(X))))
        PNT09     ∀X. (¬ isrecord (X) → ¬ isrecord (split2 (X)))
        PNT10     ∀X. (¬ isrecord (next(X)) → ¬ isrecord (split2 (X)))
        PNT11     ∀X. (isrecord (X) ∧ isrecord (next (X)) → isrecord (split2 (X)))
        PNT12     ∀X. (isrecord (X) ∧ isrecord (next (X))
                         → data(split2 (X)) = data(next(X)))
        PNT13     ∀X. (isrecord (X) ∧ isrecord (next (X))
                         → next(split2 (X)) = split2 (next(next(X))))

     were used to axiomatize singly linked lists with pointers and two recursive functions
     split1 and split2 (computing the sublists of odd-numbered and even-numbered elements
     of a list) for the following problems:

      (74)    20.02 +    ∀X, Y. (isrecord (X) ∧ isrecord (next(X)) ∧ Y = next(next(X))
                                    ∧ (2 · length(split2 (Y )) = length(Y ) − 1
                                           ∨ 2 · length(split2 (Y )) = length(Y ))
                                 → (2 · length(split2 (X)) = length(X) − 1
                                            ∨ 2 · length(split2 (X)) = length(X)))




30                                                                Empirically Successful Computerized Reasoning
      (75)         17.18 +     ∀X, Y. (isrecord (X) ∧ isrecord (next(X)) ∧ Y = next(next(X))
                                          ∧ length(split1 (Y )) + length(split2 (Y )) = length(Y )
                                       → length(split1 (X)) + length(split2 (X)) = length(X))

     7.3      Discussion
     Summarizing the previous section, we see that Spass+T solves 63 out of the 75 problems
     in our list. In order to check to what extent the mechanisms implemented in Spass+T
     contribute to these proofs, we have gradually disabled various features and re-run the
     tests. The following table shows the number of proofs found for several combinations of
     features:

         SMT                   Theory           Arithmetic       Auxiliary        Problems
         procedure             instantiation    simplification   axioms           solved (out of 75)
               +                     +              +                +                   63
               +                     +              -                +                   52
               +                     -              +                +                   56
               -                     -              +                +                   39
               -                     -              -                +                   12
               +                     +              +                -                   53

         How does Spass+T compare to a system like Simplify (Detlefs, Nelson, and Saxe [4])?
     The automatic theorem prover Simplify was developed as the proof engine of ESC/Java
     and ESC/Modula-3. It handles universal quantifiers in the input by computing (hope-
     fully) relevant instances in a similar way as the theory instantiation rule of Spass+T
     and analyzing the resulting formulas using SAT checking and a Nelson-Oppen combina-
     tion of decision procedures. Simplify (version 1.5.4) solves 40 out of the 75 problems in
     our list. The difference to Spass+T is partially due to the fact that Simplify solves only
     four out of 23 problems that involve existential quantifiers, namely (41) and (51)–(53).
     Somewhat surprisingly, however, Simplify fails also for (55), (56), (59), and (68)–(74).
     On the other hand, due to the highly optimized implementation and the reduced search
     space, all problems that Simplify does solve are solved in less than one second.
         It is perhaps illustrative to have a closer look at those problems that Spass+T failed
     to prove. Spass+T has some support for solving non-ground equations, but it makes
     no attempt to be complete in this domain, and in particular it does not use any kind of
     theory unification. This accounts for most of the missed proofs in the TPTP list and in
     theorems (1)–(14), as well as for (22), (25), and (40). It may look strange that Spass+T
     succeeds for (2) and (4), but fails for (1) and (3), but the explanation is easy: (2) and (4)
     are proved by superposition with INT01. We could add the symmetric version of INT01,
     so that (1) and (3) become provable as well, but having both INT01 and its symmetric
     version in the clause set destroys termination and seems to create more problems than
     it solves. Similarly, (7) and (8) can be handled by superposition with INT14 or INT16
     plus elementary equation solving, whereas (9) would require a fully-fledged Diophantine
     equation solver.
         Theorem (19) is rather difficult for an automated system because for proving it one
     has to “invent” a term that does not occur in the problem, namely f (f (Y ) + 1), and




Empirically Successful Computerized Reasoning                                                          31
     then use transitivity. Moreover, as one might expect, Spass+T has no chance to prove
     the induction theorem (26). On the other hand, an extension of the theory instantiation
     rule would enable Spass+T to prove the pigeonhole-like formula ∀X, Y. (f (X) = f (Y ) →
     X = Y ) ∧ 6 < f (3) ∧ f (3) < 9 ∧ 6 < f (4) ∧ f (4) < 9 → f (5) < 6 ∨ 9 < f (5) from the
     TPTP list. Two changes are required: The inference rule theory instantiation II has to
     take an arbitrary number of premises

                        ground (s1 ) . . . ground(sk )       L[t1 , . . . , tk ] ∨ D
                             ground (u1 ) . . . ground(un ) export(D ′ )

     moreover it has to be applicable even if the terms t i occur below a negated equation
     symbol. So far, we have not implemented this modification, since the extended rule
     would be extremely prolific.
          Even though the CCO axioms have a simple operational reading, it seems to be
     difficult for automated provers to detect this fact, as one can see from the poor results
     for straightforward computation tasks like (71) or (73). Simplify even exceeds the 600
     second time limit for the simpler variants (69), (70), and (72).
          There is one odd case left: Surprisingly, Spass+T succeeds to prove formula (37),
     if theory instantiation or integer ordering expansion are switched off, but fails to find
     the proof if both inference rules are switched on. It turns out that, in the latter case,
     Spass+T triggers a bug in CVC Lite.

     7.4    Download
     A current snapshot of Spass+T for Linux including the benchmark problems can be
     downloaded from http://www.mpi-inf.mpg.de/~uwe/software/.


     8      Outlook
     The development of Spass+T is ongoing work. More effort has to be spent on fine-tuning
     the theory instantiation rule and the arithmetic simplification rules. A more challenging
     extension is the implementation of subsumption checks for clauses with theory literals,
     which might turn Spass+T into a decision procedure for some classes of formulas. We
     are also looking into using Spass+T with non-numeric base theories, such as arrays,
     lists, or bitvectors.

     Acknowledgments: We are grateful to Christoph Weidenbach and Thomas Hillen-
     brand for helpful discussions.


     References
      [1]   Leo Bachmair, Harald Ganzinger, and Uwe Waldmann. Refutational theorem
            proving for hierarchic first-order theories. Applicable Algebra in Engineering, Com-
            munication and Computing, 5(3/4):193–212, 1994.

      [2]   Clark Barrett and Sergey Berezin. CVC Lite: A new implementation of the Coop-
            erating Validity Checker. In Rajeev Alur and Doron A. Peled, editors, Computer




32                                                                    Empirically Successful Computerized Reasoning
             Aided Verification, 16th International Conference, CAV 2004, LNCS 3114, pages
             515–518, Boston, MA, USA, 2004. Springer-Verlag.

      [3]    Robert S. Boyer and J Strother Moore. Integrating decision procedures into heuris-
             tic theorem provers: A case study of linear arithmetic. In Jean E. Hayes, Donald
             Michie, and Judith Richards, editors, Machine Intelligence 11: Logic and the ac-
             quisition of knowledge, chapter 5, pages 83–124. Oxford University Press, 1988.

      [4]    David Detlefs, Greg Nelson, and James B. Saxe. Simplify: A theorem prover for
             program checking. Journal of the ACM, 52(3):365–473, May 2005.

      [5]    Harald Ganzinger, George Hagen, Robert Nieuwenhuis, Albert Oliveras, and Ce-
             sare Tinelli. DPLL(T): Fast decision procedures. In Rajeev Alur and Doron A.
             Peled, editors, Computer Aided Verification, 16th International Conference, CAV
             2004, LNCS 3114, pages 175–188, Boston, MA, USA, 2004. Springer-Verlag.

      [6]    Harald Ganzinger, Viorica Sofronie-Stokkermans, and Uwe Waldmann. Mod-
             ular proof systems for partial functions with weak equality. In David Basin
             and Michaël Rusinowitch, editors, Automated Reasoning: Second International
             Joint Conference, IJCAR 2004, LNAI 3097, pages 168–182, Cork, Ireland, 2004.
             Springer-Verlag. Corrected version at http://www.mpi-sb.mpg.de/~uwe/paper/
             PartialFun-bibl.html.

      [7]    Scott McPeak and George C. Necula. Data structure specifications via local equal-
             ity axioms. In Kousha Etessami and Sriram K. Rajamani, editors, Computer Aided
             Verification, 17th International Conference, CAV 2005, LNCS 3576, pages 476–
             490, Edinburgh, Scotland, UK, 2005. Springer-Verlag.

      [8]    Stephan Schulz and Geoff Sutcliffe. http://www.cs.miami.edu/~tptp/TPTP/
             Proposals/IntegerArithmetic.p, March 15, 2006.

      [9]    Geoff. Sutcliffe and Christian B. Suttner. The TPTP Problem Library: CNF
             Release v1.2.1. Journal of Automated Reasoning, 21(2):177–203, 1998.

     [10]    Christoph Weidenbach, Uwe Brahm, Thomas Hillenbrand, Enno Keen, Christian
             Theobalt, and Dalibor Topić. SPASS version 2.0. In Andrei Voronkov, editor,
             Automated Deduction – CADE-18, 18th International Conference on Automated
             Deduction, LNAI 2392, pages 275–279, Copenhagen, Denmark, 2002. Springer-
             Verlag.




Empirically Successful Computerized Reasoning                                                     33