=Paper=
{{Paper
|id=Vol-192/paper-2
|storemode=property
|title=SPASS+T
|pdfUrl=https://ceur-ws.org/Vol-192/paper02.pdf
|volume=Vol-192
}}
==SPASS+T==
SPASS+T
Virgile Prevosto1∗ and Uwe Waldmann2
1
CEA – LIST, Centre de Saclay
Software Reliability Laboratory (LSL)
91191 Gif-sur-Yvette Cedex, France
virgile.prevosto@m4x.org
2
Max-Planck-Institut für Informatik
Stuhlsatzenhausweg 85
66123 Saarbrücken, Germany
uwe@mpi-inf.mpg.de
Abstract
Spass+T is an extension of the superposition-based theorem prover Spass that
allows us to enlarge the reasoning capabilities of Spass using an arbitrary SMT
procedure for arithmetic and free function symbols as a black-box. We discuss the
architecture of Spass+T and the capabilities, limitations, and applications of such
a combination.
1 Introduction
Spass (Weidenbach et al. [10]) is a saturation-based theorem prover for first-order logic
based on the superposition calculus. Such provers are notoriously bad at dealing with
integer or real arithmetic – encoding numbers in binary or unary is not really a viable
solution in most application contexts. We have extended Spass in such a way that we
can link it to a rather arbitrary SMT (Satisfiability Modulo Theories) procedure for
arithmetic and free function symbols, for instance our own ModuProve system (based
on the DPLL(T) framework of Ganzinger et al. [5]) or the CVC Lite prover by Barrett
and Berezin [2]. Briefly, the resulting system, called Spass+T, works as follows: Spass
uses its deduction rules to generate formulas as usual. In addition, it passes to the
SMT procedure all the formulas that can be handled by the procedure, i. e., all ground
formulas. As soon as one of the two systems encounters a contradiction, the problem is
solved.
The scenario described so far is a special case of hierarchic theorem proving: a combi-
nation of a base prover that deals with some subclass of formulas, say, formulas in linear
arithmetic, and an extension prover that deals with formulas over the complete signature
∗
This work was partially funded by the German Federal Ministry of Education, Science, Research and
Technology (BMBF) in the framework of the Verisoft project under grant 01 IS C38. The responsibility
for this article lies with the authors. Project website: http://www.verisoft.de.
18 Empirically Successful Computerized Reasoning
and passes all formulas to the base prover that the latter can handle. The correctness
of such a hierarchic combination of deductive systems is obvious. In theory, conditions
for completeness have been given by Bachmair, Ganzinger, Sofronie-Stokkermans, and
Waldmann [1, 6]. For instance, one can get a complete hierarchic combination of provers
if base and non-base vocabulary are separated using abstraction, the term ordering is
chosen in such a way that base terms/atoms are smaller than non-base terms/atoms,
the base prover can deal with arbitrary formulas over the base vocabulary, the extension
is sufficiently complete,1 and the base theory is compact [1].
In practice, however, sufficient completeness need not hold (and cannot be checked
automatically), abstraction enlarges the search space, SMT procedures for some useful
theories can only deal with ground formulas (this is for instance the case for Nelson-
Oppen combinations of arithmetic and free function symbols), and even compactness
may be an issue. As an example, consider the two clauses ∀x, y. (f (x) 6= y + y) and
∀x, y. (f (x) 6= y + y + 1). If the base theory is Presburger arithmetic (linear integer
arithmetic), then the conjunction of these two clauses (expressing the fact that f (x) is
neither even nor odd) is inconsistent. Still any finite set of ground instances of these two
formulas is consistent. Even unification modulo the base theory would not help.
An integration of first-order theorem provers and SMT procedures can therefore
only be a pragmatic one: Completeness can be achieved for special classes of inputs,
but not in general. If we work with non-ground problem description but ground goal
formulas, then there is some hope that we can ultimately produce sufficiently many
ground formulas so that the SMT procedure can find the contradiction. If we want
to solve non-ground problems, in particular if we want to find solutions for variables,
then the purely hierarchic approach must be supplemented by some knowledge about
arithmetic that is built-in directly into Spass.
2 SPASS
Spass (Weidenbach et al. [10]) is known to be one of the most advanced implementations
of the superposition calculus. The superposition calculus is a refutationally complete pro-
cedure for arbitrary first-order clauses with equality, that is, it provides a semi-decision
procedure for the unsatisfiability of sets of clauses. Theorem proving methods such as
resolution or superposition aim at deducing a contradiction from a set of formulas by
recursively inferring new formulas from given ones. New formulas are derived according
to a set of inference rules. In the case of superposition, these rules are restricted versions
of paramodulation, resolution, and factoring, parameterized by a reduction ordering ≻
that is total on ground expressions (that is, ground atoms and ground terms) and by
a selection function S, which assigns to each clause a (possibly empty) multiset of (oc-
currences of) negative literals. The literals in S(C) are called selected. Selected literals,
besides being negative, can be arbitrarily chosen. Ordering and selection function impose
various restrictions on the possible inferences, which are crucial for the efficiency of the-
orem provers like Spass. Let us consider one of the inference rules of the superposition
calculus as an example:
1
Intuitively, sufficient completeness means that every ground term of a base sort is provably equal
to a ground term consisting only of base symbols.
Empirically Successful Computerized Reasoning 19
D ′ ∨ t = t′ C ′ ∨ ¬ s[u] = s′
Negative superposition
(D ′ ∨ C ′ ∨ ¬ s[t′ ] = s′ )σ
if (i) the literal t = t′ is strictly maximal in the first premise, (ii) no literal
is selected in the first premise, (iii) either the literal ¬ s[u] = s ′ is selected
in the second premise or it is maximal and no literal is selected, (iv) t 6 t ′ ,
(v) s[u] 6 s′ , (vi) u is not a variable, and (vii) σ is a most general unifier of
t and u.
This inference rule combines the unification of t and the subterm u of s with subse-
quent replacement of uσ by t′ σ. When we speak of a superposition inference we mean
an arbitrary rule of the calculus.
Note that a clause with selected literals cannot serve as the left premise of a su-
perposition inference, and that maximal terms are superposed either on maximal or
selected literals, and a smaller term is never replaced by a larger one. Moreover, only
the maximal sides of equations are replaced. The pattern of interplay between ordering
restrictions and the selection function is the same for all inference rules of the calculus.
The local restrictions of the superposition inference rules are supplemented by a
global redundancy criterion that allows us to discard formulas that are provably unnec-
essary for deriving a contradiction.
3 System Architecture
Spass+T consists of three modules: the theorem prover Spass, a (rather arbitrary)
SMT procedure for integer or real arithmetic and free function symbols, and a module
SMT-Control connecting both. The proof systems are only loosely coupled; they wait
for each other only if they have completed their own task.
Spass sends three kinds of messages to SMT-Control:
– all the ground formulas present in the input or derived during the saturation;
– proof found (signaling that Spass has derived the empty clause);
– end of file (signaling that Spass has saturated its input).
SMT-Control collects the formulas sent by Spass and repeatedly sends portions of
the set of formulas to the SMT procedure,2 until one of the following events occurs:
– Spass sends proof found;
– the SMT procedure detects unsatisfiability;
– the SMT procedure detects satisfiability after Spass has sent end of file and the
SMT procedure has seen all the output of Spass.
The result is “proof found” in the first two cases, “no proof found” in the third one.
Lacking completeness, “no proof found” does in general not imply satisfiability.
2
Under the assumption that the SMT procedure works incrementally. Alternatively the SMT proce-
dure can be restarted with increasing subsets of the set of formulas.
20 Empirically Successful Computerized Reasoning
4 The Simple Case
There are applications where we can guarantee that every base theory formula generated
by Spass is ground. In this case, the very simple setup described above is already
sufficient. An example is pointer data structure verification à la McPeak and Necula [7].
In this scenario, we consider recursive data structures involving pointers to records or to
nil. Record fields can either be scalar values or pointers. In the axioms used to describe
the behaviour of a data structure, all variables range over pointer values (i. e., there is
no quantification over scalar values). Record fields are encoded as (partial) functions, so
x.data, i. e., the data field of the record that x points to, is written as data(x). McPeak
and Necula require that every occurrence of such a function must be guarded by a
condition that ensures that the argument is non-nil. For instance, the formula stating
that the prev pointer is the inverse of the next pointer in a doubly linked list looks like
∀x. (x 6= nil ∧ next(x) 6= nil → prev (next(x)) = x),
and the formula stating that the data field of any record is positive looks like
∀x. (x 6= nil → data(x) > 0).
To avoid positive disjunctions, we replace the guard formula x 6= nil by isrecord (x),
obtaining
∀x. (isrecord (x) ∧ isrecord (next(x)) → prev (next(x)) = x),
∀x. (isrecord (x) → data(x) > 0).
Since the SMT procedure accepts not only the symbols of the arithmetical theory, such as
+, −, · , <, or ≤, but also free function or predicate symbols, the distinction between base
and non-base symbols is somewhat arbitrary – if it is useful, we may treat both theory
symbols and a subset of the free symbols as base symbols. Let us therefore consider
the guard predicate isrecord as the only non-base symbol in the signature. If we select
the occurrences of isrecord (t) in the antecedent, we ensure that the only inferences that
are possible with such clauses are (repeated) resolution steps with positive occurrences
of isrecord (s). Since isrecord (s) occurs positively only in goal formulas (or formulas
recursively derived from goal formulas), and since the term s is always ground in these
formulas, the base formulas resulting from resolution are ground and can be passed to
the SMT procedure. In effect, Spass+T mimics the derivation steps of McPeak and
Necula, and the completeness result of McPeak and Necula carries over to Spass+T.
5 Theory Instantiation
If non-ground theory literals are not guarded, then it can easily happen that the usual
inference rules of Spass do not produce those ground instances that the SMT procedure
would need to find a contradiction. Consider the following example from Boyer and
Moore [3]: Let min and max be non-theory function symbols denoting the minimum
and the maximum element of a list of numbers. Suppose that we want to refute ¬ (l <
max (a) + k) using the assumptions l ≤ min(a) and 0 < k and the universally quantified
Empirically Successful Computerized Reasoning 21
lemma ∀x. (min(x) ≤ max (x)). Analogously to Boyer and Moore, we need an additional
inference rule that computes the required ground instance of this lemma. For efficiency
reasons, we restrict ourselves to instantiations where a term headed by a non-theory
symbol occurs in at least one other ground clause – if a non-theory term occurs in only
one clause, this clause is very unlikely to be useful to the SMT procedure, at least in
linear arithmetic. We obtain the following inference rule:
C[s] L[t] ∨ D
Theory instantiation
(L[t] ∨ D)σ
if L[t] ∨ D is not ground, t is headed by a non-theory function symbol and
occurs in a maximal literal L[t] immediately below a theory symbol, all
function or predicate symbols occurring above t in L[t] are theory symbols
or equality, C[s] is ground, σ is an mgu of s and t, and (L[t] ∨ D)σ is ground.
In the example above, the theory instantiation inference
l ≤ min(a) min(x) ≤ max (x)
min(a) ≤ max (a)
yields the ground clause that the SMT procedure needs to derive a contradiction. There
is one problem with this rule, though: The generated formula (L[t] ∨ D)σ is subsumed
by the second premise L[t] ∨ D; therefore Spass will delete it again. We export it to the
SMT procedure before the redundancy check, but we must also ensure that (L[t] ∨ D)σ
can again be used as a first premise in a theory instantiation inference. To this end, we
introduce a special new predicate symbol ground and split the inference rule into two
parts:
C[s]
Theory instantiation I
ground (s)
if C[s] is ground and s is headed by a non-theory function symbol.
ground (s) L[t] ∨ D
Theory instantiation II
ground (u1 ) . . . ground(un ) export(D ′ )
if L[t] ∨ D is not ground, t is headed by a non-theory function symbol and
occurs in a maximal literal L[t] immediately below a theory symbol, all
function or predicate symbols occurring above t in L[t] are theory symbols
or equality, σ is an mgu of s and t, D ′ = (L[t] ∨ D)σ is ground, and u1 , . . . , un
are the ground terms occurring in D ′ that are different from s and headed
by a non-theory function symbol.
In the example above, the theory instantiation I inference
l ≤ min(a)
ground (min(a))
and the theory instantiation II inference
ground (min(a)) min(x) ≤ max (x)
ground (max (a)) export(min(a) ≤ max (a))
22 Empirically Successful Computerized Reasoning
cause min(a) ≤ max (a) to be exported to the SMT procedure and ensure that the clause
ground (max (a)) is available for further theory instantiation II inferences within Spass.
If we select guard literals whenever possible, and if we choose the atom ordering so
that non-theory predicates are larger than theory predicates, then the theory instantia-
tion rule is used only as a last resort: It is applied only if there are no non-theory guard
literals that might produce ground instances in a more restricted manner.
6 Arithmetic Simplification
There are several reasons to supplement the external SMT procedure in Spass+T by
simplification techniques that are built-in directly into Spass+T. First of all, such sim-
plification techniques offer the chance to find solutions for variables, say, to replace a
clause ∀x. (¬ x + 2 = 5 ∨ p(x)) by ∀x. (¬ x = 3 ∨ p(x)) and then by p(3). Second,
they allow us to reduce different numeric subexpressions to the same number, so that
the search space of Spass+T is not cluttered by different, but numerically equivalent
clauses, such as p(2 + 1), p(1 + 2), p(1 + (1 + 1)), etc. Third, by applying arithmetic
simplification in advance, an equation like ∀x. (x + 0 = x) can be used to simplify a
formula such as p(((a + 2) − 1) − 1) to p(a).
In Spass+T, we use a combination of additional input axioms encoding some frag-
ment of arithmetic and specialized simplification rules dealing with the numeric part.
The latter include rules for evaluation of constant numeric subexpressions, such as
C[c1 + c2 ]
C[c0 ]
and
C[(t + c1 ) + c2 ]
C[t + c0 ]
where c1 and c2 are numeric constants and c0 = c1 + c2 , and rules for elementary
(in-)equation solving, for instance
C ∨ [¬] t + c1 ∼ c2
C ∨ [¬] t ∼ c0
where c0 = c2 − c1 and ∼ is equality or an ordering relation.
All the arithmetic simplification rules we have implemented have the property that
the resulting formula is smaller than the original one in any Knuth-Bendix ordering,
provided that all constants have the same weight. Still it should be clear that there is
some risk of losing proofs by applying arithmetic simplification. As a trivial example
consider the two clauses ∀x. (p(3 · x + 4)) and ¬ p(3 · 5 + 4). These clauses are obviously
contradictory, but lacking theory unification, there is no way to derive a contradiction
as soon as the second clause has been simplified to ¬ p(19).
In contrast to formulas that describe the relationships between concrete numerical
constants, equations such as ∀x. (x + 0 = x), ∀x. (x − x = 0), or ∀x, y. ((x − y) + y = x)
are preferably added to the input as ordinary axioms. In this way they are not only avail-
able for simplification, but also for superposition or resolution inferences. In our experi-
ments, this capability turned out to be extremely useful for refuting non-ground queries.
Empirically Successful Computerized Reasoning 23
The integer ordering expansion rule, which is essential for some kinds of inductive
proofs, is in some way a hybrid between the two cases above.
C ∨s≤t
Integer ordering expansion
C∨s=t∨s≤t−1 C ∨s = t∨s+1≤ t
It is an inference rule, rather than a simplification rule, and it corresponds to resolution
inferences with the axioms ∀x, y. (¬ x ≤ y ∨ x = y ∨ x ≤ y − 1) and ∀x, y. (¬ x ≤ y ∨ x =
y ∨ x + 1 ≤ y). These are unordered resolution inferences, however, so Spass would
not perform them. The integer ordering expansion rule fills this gap, but it has to be
restricted, for instance by limiting applications to clauses with only one positive literal;
otherwise it turns out to be too productive.
7 Experiments
We have tested Spass+T both on a list of sample conjectures [8] (theorems and non-
theorems) from the TPTP library [9] and on our own list of benchmark problems, mostly
combining arithmetic and data structures. We used Spass+T with CVC Lite 2.5 as ex-
ternal decision procedure for the union of the theory of uninterpreted functions and in-
teger arithmetic. Spass+T options were chosen identically for all benchmark problems;
in particular, arithmetic simplification, theory instantiation, and (a very restricted form
of) integer ordering expansion were switched on, and precedences were chosen uniformly
in such a way that non-theory predicates (notably the containment relation for collec-
tions) became larger than theory predicates. 3 Moreover, the following set of auxiliary
axioms was supplied:
INT01 ∀X, Y. ((X − Y ) + Y = X).
INT02 ∀X. (X + 0 = X).
INT03 ∀X. (0 + X = X).
INT04 ∀X. (X < X + 1).
INT05 ∀X, Y. (X < Y → X ≤ Y ).
INT06 ∀X. (X ≤ X).
INT07 ∀X, Y. (X < Y ↔ Y > X).
INT08 ∀X, Y. (X ≤ Y ↔ Y ≥ X).
INT09 ∀X, Y. (¬ (X < Y ∧ X ≥ Y )).
INT10 ∀X, Y. (¬ (X ≤ Y ∧ X > Y )).
INT11 ∀X, Y. (X ≤ Y ∧ Y ≤ X → X = Y ).
INT12 ∀X, Y. (X − Y = X + (−Y )).
INT13 ∀X. (−(−X) = X).
INT14 ∀X. (X · 0 = 0).
INT15 ∀X. (0 · X = 0).
INT16 ∀X. (X · 1 = X).
INT17 ∀X. (1 · X = X).
INT18 ∀X. (X/1 = X).
INT19 ∀X. (X mod 1 = 0).
3
Using the set DomPred option of Spass.
24 Empirically Successful Computerized Reasoning
The experiments were carried out on an office PC with a 2.40 GHz Intel Pentium 4
CPU and 512 MB RAM running Debian Linux. All times reported are wall-clock time
in seconds.
7.1 TPTP Integer Arithmetic Problems
The list of integer arithmetic problems in the TPTP library contains 147 theorems and
36 non-theorems [8]. Most of the problems are rather easy. The results are summarized
in the following table:
Category Theorems Non-theorems
Number of problems: 147 36
Proof found: 140 0
Termination in less than 1 second: 136 0
Termination in 1 to 8 seconds: 4 0
No proof found: 7 36
Termination in less than 1 second: 6 31
Termination in 1 to 8 seconds: 1 3
Non-termination: 0 2
7.2 Further Problems
Integer arithmetic. In addition to the TPTP list, we have developed a collection
of 75 theorems mostly combining arithmetic and data structures to test Spass+T. In
the following list, the second column gives the runtime of Spass+T (wall clock time in
seconds) and the result, where “+” means “Proof found”, “-” means “No proof found”,
and “∞” means that the time limit of 600 seconds was exceeded.
(1) 1.82 - ∀X, Z. (∃Y. (X + Y = Z))
(2) 0.16 + ∀Y, Z. (∃X. (X + Y = Z))
(3) 0.68 - ∀X, Z. (∃Y. (X − Y = Z))
(4) 0.16 + ∀Y, Z. (∃X. (X − Y = Z))
(5) 0.33 - ∃X. (X + X < −10)
(6) 0.19 + ∃X. (X · 3 + (−5) < −12)
(7) 0.23 + ∃X, Y. (3 · X + 5 · Y = 24)
(8) 0.22 + ∃X, Y. (3 · X + 5 · Y = 23)
(9) 0.38 - ∃X, Y. (3 · X + 5 · Y = 22)
(10) 0.69 + ∀X, Y. (¬ 4 · X + 6 · Y = 21)
(11) 1.01 + ∀X, Y, Z. (2 · X + Y + Z = 10 ∧ X + 2 · Y + Z = 10 → ¬ Z = 0)
Empirically Successful Computerized Reasoning 25
(12) 1.65 + ∀X, Y, Z. (X < 5 ∧ Y < 3 ∧ X + 2 · Y > 7 → Y = 2)
(13) 4.77 + ∀X. (∃Y. (Y < X ∧ ¬ ∃Z. (Y < Z ∧ Z < X)))
(14) 0.21 + ¬ ∃X. (0 < X ∧ ∀Y. (Y < X → Y + 1 < X))
Integer arithmetic plus free function or predicate symbols.
(15) 0.18 + ∀X, Y. (X + Y = f (X) ∧ Y − f (X) = 0 → Y = f (0))
(16) 0.91 + ∀X. (f (X) > X) → f (f (f (6))) ≥ 9
(17) 3.75 + ∀X. (f (X) > X) → ∀Y, Z. (f (f (Y ) + Z) ≥ Y + Z + 2)
(18) 1.23 + ∀X, Y. (X < Y → f (X) < f (Y )) → ∀Y. (f (f (Y ) + 2) > f (f (Y )))
(19) 1.40 - ∀X, Y. (X < Y → f (X) < f (Y )) → ∀Y. (f (f (Y ) + 2) > f (f (Y )) + 1)
(20) 0.18 + ∀X. (f (X) > 1) → 7 − 2 · f (3) < 4
(21) 1.01 + ∀X. (f (X) > X) → ∀X. (X − f (X) < 0)
(22) 1.17 - ∀X, Y. (g(X, Y ) = g(X, Y + 2)) ∧ g(3, 3) = g(3, 4) → g(3, 2) = g(3, 5)
(23) 0.20 + p(14 · 3 + 8) → p(50)
(24) 0.24 + ∀X. (p(X + 3)) → p(5)
(25) 0.46 - ∀X. (p(2 · X)) → p(10)
(26) 0.59 - p(0) ∧ ∀X. (p(X) → p(X + 1)) ∧ ∀X. (p(X) → p(X − 1)) → ∀X. (p(X))
Integer arithmetic and arrays. The formulas
ARR01 ∀A, I, X. (read (write(A, I, X), I) = X).
ARR02 ∀A, I, J, X. (I = J ∨ read (write(A, I, X), J) = read (A, J)).
were used to axiomatize arrays for the following problems:
(27) 0.29 + ∀A, A′ . (A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
→ read (A, 3) = 33)
(28) 0.61 + ∀A, A′ , I. (A = write(write(write(write(A′ , 3, 33), 4, 44), 5, 55), I, 66)
→ read (A, 4) = 44 ∨ read (A, 4) = 66)
(29) 1.43 + ∀A, A′ , I.(A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
∧3≤ I ∧I ≤ 4
→ 33 ≤ read (A, I) ∧ read (A, I) ≤ 44)
(30) 0.25 + ∀A, A′ . (A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
→ ∃I. (read (A, I) = 33))
26 Empirically Successful Computerized Reasoning
(31) 0.15 + ∀A, A′ . (A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
→ ∃I. (read (A, I) < 40 ∧ 30 < read (A, I)))
(32) 0.39 + ∀A, A′ . (∀I. (read (A′ , I) > I)
∧ A = write(write(A′ , 3, 5), 7, 9)
→ ∀I. (read (A, I) > I))
(33) 3.78 + ∀A, A′ , J. (∀I. (read (A′ , I) > I)
∧ A = write(A′ , J, J + 3)
→ ∀I. (read (A, I) > I))
(34) 0.37 + ∀A, A′ . (A = write(write(write(write(A′ , 3, 33), 4, 444), 5, 55), 4, 44)
∧ ∀I. (read (A′ , I) < 100)
→ ∀I. (read (A, I) < 100))
(35) 0.30 + ∀A, A′ , J, K. (∀I. (J ≤ I ∧ I ≤ K → read (A′ , I) > 0)
∧ A = write(A′ , K + 1, 3)
→ ∀I. (J ≤ I ∧ I ≤ K + 1 → read (A, I) > 0))
(36) 5.65 + ∀A, A′ , J, K. (∀I. (J ≤ I ∧ I ≤ K → read (A′ , I) > 0)
∧ A = write(A′ , J + 2, read (A′ , J + 1) + 1)
→ ∀I. (J ≤ I ∧ I ≤ K → read (A, I) > 0))
(37) 3.52 - ∀A, J, K. (∀I. (J ≤ I ∧ I ≤ K → read (A, I) > 0)
→ ∀I. (J + 3 ≤ I ∧ I ≤ K → read (A, I) > 0))
(38) 31.75 + ∀A. (∀I. (1 ≤ I ∧ I ≤ 10 → read (A, I) > I)
∧ ∀I. (11 ≤ I ∧ I ≤ 20 → read (A, I) > 20 − I)
→ ∀I. (6 ≤ I ∧ I ≤ 15 → read (A, I) > 5))
(39) 0.23 + ∀A. (∀I. (20 ≤ I ∧ I ≤ 30 → read (A, I) = I + 25)
→ ∃I. (read (A, I) = 50))
(40) 0.56 - ∀A. (∀I. (20 ≤ I ∧ I ≤ 30 → read (A, I) = 2 · I + 3)
→ ∃I. (read (A, I) = 53))
(41) 0.27 + ∀A, A′ , I, J, K. (¬ read (A′ , I) = read (A′ , J)
∧ A = write(write(write(A′ , J, 0),
K, read (A′ , K) + 1), I, 0)
→ ∃L. (¬ read (A, L) = read (A′ , L)))
(42) 9.90 + ∀A, A′ , I, J, K. (A = write(write(write(A′ , I, 3), I + 2, 2), I + 4, 1)
∧I ≤J ∧J ≤I +3
→ ∃K. (J ≤ K ∧ K ≤ J + 3 ∧ read (A, K) ≤ 3))
Empirically Successful Computerized Reasoning 27
Integer arithmetic and collections. The formulas
COL01 ∀I. (¬ I ∈ ∅).
COL02 ∀I, S. (I ∈ add (I, S)).
COL03 ∀I, S. (¬ I ∈ remove(I, S)).
COL04 ∀I, S, J. (I ∈ S ∨ I = J ↔ I ∈ add (J, S)).
COL05 ∀I, S, J. (I ∈ S ∧ ¬ I = J ↔ I ∈ remove(J, S)).
were used to axiomatize collections of integers for the following problems:
(43) 0.31 + ¬ 4 ∈ add (1, add (3, add (5, ∅)))
(44) 0.56 + ∀S, I, J. (S = add (5, add (3, add (1, ∅)))
∧I ∈S∧J ∈S∧¬I =J
→ I + J < 9)
(45) 0.23 + ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
∧ S = add (2, remove(7, S ′ ))
→ ∀I. (I ∈ S → I > 0))
(46) 0.27 + ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
∧ S = remove(4, remove(1, remove(2, S ′ )))
→ ∀I. (I ∈ S → I > 2))
(47) 0.64 + ∀S. (∀I. (I ∈ S → I ≥ 0)
∧¬0 ∈ S ∧¬1∈ S
→ ∀I. (I ∈ S → I ≥ 2))
(48) 0.23 + ∀S. (∀I. (I ∈ S → I ≥ 0)
∧ ∀I. (I ∈ S ↔ I ∈ remove(0, remove(1, S)))
→ ∀I. (I ∈ S → I ≥ 2))
(49) 1.92 + ∀S, S ′ , J. (∀I. (I ∈ S ′ → I > 0)
∧ J ∈ S ′ ∧ S = add (J + 2, remove(J, S ′ ))
→ ∀I. (I ∈ S → I > 0))
(50) 2.29 + ∀S, S ′ , J, K. (∀I. (I ∈ S ′ → I > 0)
∧ J ∈ S ′ ∧ K ≥ 0 ∧ S = add (J + K, remove(J, S ′ ))
→ ∀I. (I ∈ S → I > 0))
(51) 1.37 + ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
∧ ∀I. (I ∈ S → ∃J. (J ∈ S ′ ∧ I > J))
→ ∀I. (I ∈ S → I > 1))
(52) 1.95 + ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
∧ ∀I. (I ∈ S → ∃J. (J ∈ S ′ ∧ 2 · I − 5 · J > 0))
→ ∀I. (I ∈ S → I > 2))
(53) 1.90 + ∀S, S ′ . (∀I. (I ∈ S ′ → I > 0)
∧ ∀I. (I ∈ S → ∃J, K. (J ∈ S ′ ∧ K ∈ S ′ ∧ I = J + K))
→ ∀I. (I ∈ S → I > 1))
28 Empirically Successful Computerized Reasoning
(54) 0.36 + ∀S. (S = add (10, add (30, add (50, ∅)))
→ ∃I. (20 ≤ I ∧ I ≤ 40 ∧ I ∈ S))
Integer arithmetic and collections with counting. The formulas
CCO01 ∀I. (¬ I ∈ ∅).
CCO02 ∀I, S. (I ∈ add (I, S)).
CCO03 ∀I, S. (¬ I ∈ remove(I, S)).
CCO04 ∀I, S, J. (I ∈ S ∨ I = J ↔ I ∈ add (J, S)).
CCO05 ∀I, S, J. (I ∈ S ∧ ¬ I = J ↔ I ∈ remove(J, S)).
CCO06 ∀S. (count (S) ≥ 0).
CCO07 ∀S. (S = ∅ ↔ count (S) = 0).
CCO08 ∀I, S. (¬ I ∈ S ↔ count (add (I, S)) = count (S) + 1).
CCO09 ∀I, S. (I ∈ S ↔ count (add (I, S)) = count(S)).
CCO10 ∀I, S. (I ∈ S ↔ count (remove(I, S)) = count(S) − 1).
CCO11 ∀I, S. (¬ I ∈ S ↔ count (remove(I, S)) = count (S)).
CCO12 ∀I, S. (I ∈ S → S = add (I, remove(I, S))).
were used to axiomatize collections of integers with a counting operation for the following
problems:
(55) 22.09 + ∀S. (count (remove(5, S)) ≥ 7
→ count(remove(4, S)) ≥ 6)
(56) 3.71 + ∀S. (count (add (5, S)) = count (add (3, S))
→ count(remove(5, S)) = count (remove(3, S)))
(57) 2.20 + ∀S, I. (count (S) + 1 ≥ count(add (I, S)))
(58) 11.95 + ∀S, I, K. (K > 0 → count(S) + K ≥ count(add (I, S)))
(59) 19.49 + ∀S, I, K. (K > 0 → count(S) + K ≥ count(add (I, add (I, S))))
(60) 0.42 + ∀S, I. (count (remove(2, S)) = 0
∧ count(remove(3, S)) = 0
→ count (remove(I, S)) = 0)
(61) 0.32 + ∀S. (2 ∈ S ∧ count (S) = 1 → ¬ 5 ∈ S)
(62) 23.22 + ∀S. (2 ∈ S ∧ 3 ∈ S ∧ count(S) = 2 → ¬ 5 ∈ S)
(63) 28.16 + ∀S, I. (2 ∈ S ∧ 3 ∈ S ∧ count (S) = 2 ∧ I > 3 → ¬ I ∈ S)
(64) 118.41 + ∀S, I, J. (I < 3 ∧ 6 < J
∧I ∈S∧J ∈S
∧ count(S) = 2
→ ¬ 5 ∈ S)
(65) 0.34 + ∃S, I. (count (S) + 1 > count(add (I, S)))
Empirically Successful Computerized Reasoning 29
(66) 309.91 + ∃S. (count (S) = 3)
(67) 269.01 + ∀S, I, J, K. (I > J ∧ J > K
∧I ∈S∧J ∈S∧K ∈S
→ count(S) > 2)
(68) 0.53 + ∀S, J. (∀I. (I ∈ S → J > I)
→ count (add (J, S)) > count(S))
(69) 4.04 + count (add (1, add (3, ∅))) = 2
(70) 1.02 + count (add (5, remove(3, add (3, ∅)))) = 1
(71) ∞ - count (add (1, add (5, remove(3, add (2, ∅))))) = 3
(72) 3.49 + ∀S, I. (count (remove(I, add (I, S))) = count (remove(I, S)))
(73) ∞ - ∀S, I. (count (add (0, remove(I, add (I, S))))
= count (add (0, remove(I, S))))
Integer arithmetic and pointer data types. The formulas
PNT01 ∀X. (¬ isrecord (X) → length(X) = 0)
PNT02 ∀X. (isrecord (X) → length(X) ≥ 1)
PNT03 ∀X. (isrecord (X) → length(X) = length(next (X)) + 1)
PNT04 ∀X. (¬ isrecord (X) → ¬ isrecord (split1 (X)))
PNT05 ∀X. (isrecord (X) → isrecord (split1 (X)))
PNT06 ∀X. (isrecord (X) → data(split1 (X)) = data(X))
PNT07 ∀X. (isrecord (X) ∧ ¬ isrecord (next(X))
→ ¬ isrecord (next(split1 (X))))
PNT08 ∀X. (isrecord (X) ∧ isrecord (next (X))
→ next(split1 (X)) = split1 (next(next(X))))
PNT09 ∀X. (¬ isrecord (X) → ¬ isrecord (split2 (X)))
PNT10 ∀X. (¬ isrecord (next(X)) → ¬ isrecord (split2 (X)))
PNT11 ∀X. (isrecord (X) ∧ isrecord (next (X)) → isrecord (split2 (X)))
PNT12 ∀X. (isrecord (X) ∧ isrecord (next (X))
→ data(split2 (X)) = data(next(X)))
PNT13 ∀X. (isrecord (X) ∧ isrecord (next (X))
→ next(split2 (X)) = split2 (next(next(X))))
were used to axiomatize singly linked lists with pointers and two recursive functions
split1 and split2 (computing the sublists of odd-numbered and even-numbered elements
of a list) for the following problems:
(74) 20.02 + ∀X, Y. (isrecord (X) ∧ isrecord (next(X)) ∧ Y = next(next(X))
∧ (2 · length(split2 (Y )) = length(Y ) − 1
∨ 2 · length(split2 (Y )) = length(Y ))
→ (2 · length(split2 (X)) = length(X) − 1
∨ 2 · length(split2 (X)) = length(X)))
30 Empirically Successful Computerized Reasoning
(75) 17.18 + ∀X, Y. (isrecord (X) ∧ isrecord (next(X)) ∧ Y = next(next(X))
∧ length(split1 (Y )) + length(split2 (Y )) = length(Y )
→ length(split1 (X)) + length(split2 (X)) = length(X))
7.3 Discussion
Summarizing the previous section, we see that Spass+T solves 63 out of the 75 problems
in our list. In order to check to what extent the mechanisms implemented in Spass+T
contribute to these proofs, we have gradually disabled various features and re-run the
tests. The following table shows the number of proofs found for several combinations of
features:
SMT Theory Arithmetic Auxiliary Problems
procedure instantiation simplification axioms solved (out of 75)
+ + + + 63
+ + - + 52
+ - + + 56
- - + + 39
- - - + 12
+ + + - 53
How does Spass+T compare to a system like Simplify (Detlefs, Nelson, and Saxe [4])?
The automatic theorem prover Simplify was developed as the proof engine of ESC/Java
and ESC/Modula-3. It handles universal quantifiers in the input by computing (hope-
fully) relevant instances in a similar way as the theory instantiation rule of Spass+T
and analyzing the resulting formulas using SAT checking and a Nelson-Oppen combina-
tion of decision procedures. Simplify (version 1.5.4) solves 40 out of the 75 problems in
our list. The difference to Spass+T is partially due to the fact that Simplify solves only
four out of 23 problems that involve existential quantifiers, namely (41) and (51)–(53).
Somewhat surprisingly, however, Simplify fails also for (55), (56), (59), and (68)–(74).
On the other hand, due to the highly optimized implementation and the reduced search
space, all problems that Simplify does solve are solved in less than one second.
It is perhaps illustrative to have a closer look at those problems that Spass+T failed
to prove. Spass+T has some support for solving non-ground equations, but it makes
no attempt to be complete in this domain, and in particular it does not use any kind of
theory unification. This accounts for most of the missed proofs in the TPTP list and in
theorems (1)–(14), as well as for (22), (25), and (40). It may look strange that Spass+T
succeeds for (2) and (4), but fails for (1) and (3), but the explanation is easy: (2) and (4)
are proved by superposition with INT01. We could add the symmetric version of INT01,
so that (1) and (3) become provable as well, but having both INT01 and its symmetric
version in the clause set destroys termination and seems to create more problems than
it solves. Similarly, (7) and (8) can be handled by superposition with INT14 or INT16
plus elementary equation solving, whereas (9) would require a fully-fledged Diophantine
equation solver.
Theorem (19) is rather difficult for an automated system because for proving it one
has to “invent” a term that does not occur in the problem, namely f (f (Y ) + 1), and
Empirically Successful Computerized Reasoning 31
then use transitivity. Moreover, as one might expect, Spass+T has no chance to prove
the induction theorem (26). On the other hand, an extension of the theory instantiation
rule would enable Spass+T to prove the pigeonhole-like formula ∀X, Y. (f (X) = f (Y ) →
X = Y ) ∧ 6 < f (3) ∧ f (3) < 9 ∧ 6 < f (4) ∧ f (4) < 9 → f (5) < 6 ∨ 9 < f (5) from the
TPTP list. Two changes are required: The inference rule theory instantiation II has to
take an arbitrary number of premises
ground (s1 ) . . . ground(sk ) L[t1 , . . . , tk ] ∨ D
ground (u1 ) . . . ground(un ) export(D ′ )
moreover it has to be applicable even if the terms t i occur below a negated equation
symbol. So far, we have not implemented this modification, since the extended rule
would be extremely prolific.
Even though the CCO axioms have a simple operational reading, it seems to be
difficult for automated provers to detect this fact, as one can see from the poor results
for straightforward computation tasks like (71) or (73). Simplify even exceeds the 600
second time limit for the simpler variants (69), (70), and (72).
There is one odd case left: Surprisingly, Spass+T succeeds to prove formula (37),
if theory instantiation or integer ordering expansion are switched off, but fails to find
the proof if both inference rules are switched on. It turns out that, in the latter case,
Spass+T triggers a bug in CVC Lite.
7.4 Download
A current snapshot of Spass+T for Linux including the benchmark problems can be
downloaded from http://www.mpi-inf.mpg.de/~uwe/software/.
8 Outlook
The development of Spass+T is ongoing work. More effort has to be spent on fine-tuning
the theory instantiation rule and the arithmetic simplification rules. A more challenging
extension is the implementation of subsumption checks for clauses with theory literals,
which might turn Spass+T into a decision procedure for some classes of formulas. We
are also looking into using Spass+T with non-numeric base theories, such as arrays,
lists, or bitvectors.
Acknowledgments: We are grateful to Christoph Weidenbach and Thomas Hillen-
brand for helpful discussions.
References
[1] Leo Bachmair, Harald Ganzinger, and Uwe Waldmann. Refutational theorem
proving for hierarchic first-order theories. Applicable Algebra in Engineering, Com-
munication and Computing, 5(3/4):193–212, 1994.
[2] Clark Barrett and Sergey Berezin. CVC Lite: A new implementation of the Coop-
erating Validity Checker. In Rajeev Alur and Doron A. Peled, editors, Computer
32 Empirically Successful Computerized Reasoning
Aided Verification, 16th International Conference, CAV 2004, LNCS 3114, pages
515–518, Boston, MA, USA, 2004. Springer-Verlag.
[3] Robert S. Boyer and J Strother Moore. Integrating decision procedures into heuris-
tic theorem provers: A case study of linear arithmetic. In Jean E. Hayes, Donald
Michie, and Judith Richards, editors, Machine Intelligence 11: Logic and the ac-
quisition of knowledge, chapter 5, pages 83–124. Oxford University Press, 1988.
[4] David Detlefs, Greg Nelson, and James B. Saxe. Simplify: A theorem prover for
program checking. Journal of the ACM, 52(3):365–473, May 2005.
[5] Harald Ganzinger, George Hagen, Robert Nieuwenhuis, Albert Oliveras, and Ce-
sare Tinelli. DPLL(T): Fast decision procedures. In Rajeev Alur and Doron A.
Peled, editors, Computer Aided Verification, 16th International Conference, CAV
2004, LNCS 3114, pages 175–188, Boston, MA, USA, 2004. Springer-Verlag.
[6] Harald Ganzinger, Viorica Sofronie-Stokkermans, and Uwe Waldmann. Mod-
ular proof systems for partial functions with weak equality. In David Basin
and Michaël Rusinowitch, editors, Automated Reasoning: Second International
Joint Conference, IJCAR 2004, LNAI 3097, pages 168–182, Cork, Ireland, 2004.
Springer-Verlag. Corrected version at http://www.mpi-sb.mpg.de/~uwe/paper/
PartialFun-bibl.html.
[7] Scott McPeak and George C. Necula. Data structure specifications via local equal-
ity axioms. In Kousha Etessami and Sriram K. Rajamani, editors, Computer Aided
Verification, 17th International Conference, CAV 2005, LNCS 3576, pages 476–
490, Edinburgh, Scotland, UK, 2005. Springer-Verlag.
[8] Stephan Schulz and Geoff Sutcliffe. http://www.cs.miami.edu/~tptp/TPTP/
Proposals/IntegerArithmetic.p, March 15, 2006.
[9] Geoff. Sutcliffe and Christian B. Suttner. The TPTP Problem Library: CNF
Release v1.2.1. Journal of Automated Reasoning, 21(2):177–203, 1998.
[10] Christoph Weidenbach, Uwe Brahm, Thomas Hillenbrand, Enno Keen, Christian
Theobalt, and Dalibor Topić. SPASS version 2.0. In Andrei Voronkov, editor,
Automated Deduction – CADE-18, 18th International Conference on Automated
Deduction, LNAI 2392, pages 275–279, Copenhagen, Denmark, 2002. Springer-
Verlag.
Empirically Successful Computerized Reasoning 33