=Paper=
{{Paper
|id=Vol-192/paper-3
|storemode=property
|title=Search for faster and shorter proofs using machine generated lemmas
|pdfUrl=https://ceur-ws.org/Vol-192/paper03.pdf
|volume=Vol-192
}}
==Search for faster and shorter proofs using machine generated lemmas==
Search for faster and shorter proofs
using machine generated lemmas
Petr Pudlák
Charles University in Prague
pudlak@artax.karlin.mff.cuni.cz
Abstract
When we have a set of conjectures formulated in a common language and proved from
a common set of axioms using an automated theorem prover, it is often possible to automat-
ically construct lemmas that can be used to prove the conjectures in a shorter time and/or
with shorter proofs.
We have implemented a system that repeatedly tries to improve the set of assumptions
for proofs of given conjectures using lemmas that it extracts from the proofs constructed
by an automated theorem prover. In many cases it can significantly reduce the total time or
the overall sum of the lengths of the proofs of the conjectures.
We present several examples of such sets of conjectures and show the improvements
gained by the system.
1 Motivation
Imagine a professor of mathematics who gives the same lectures every year. She works in a
standard theory and she has a fixed set of theorems that she wants to present and prove every
year during the course. Because she wants to save her own and her students’ time, she would like
the proofs to be as efficient as possible; in particular, she would like to find lemmas that shorten
the total time she has to spend presenting the proofs.
Our aim will be to search for lemmas in such an environment and to use them to make the
proofs of the conjectures more efficient.
Finding useful lemmas has, of course, more serious applications as well, namely reducing the
amount of resources necessary to prove a particular set of conjectures, or proving conjectures
that could not be proved without such lemmas.
Currently the system only restructures and compacts the proofs and does not handle conjectures
that the prover was not able to prove at the beginning: it can only improve existing proofs.
In the future we hope to also search for proofs of the conjectures that couldn’t be proved
from the initial set of assumptions.
2 Previous research
The idea of automated discovery of lemmas or theorems is not new, and many different
approaches to this task have been tried.
Owen L. Astrachan and Mark E. Stickel [AS92] used the idea of reusing lemmas to speed
up a model elimination theorem prover.
Art Quaife [Qua92] used Otter [McC94] to prove many fundamental mathematical theo-
rems. He included the theorems he had already proved as assumptions for the more complicated
ones. The sequence in which the theorems were proved was determined by Quaife, based on his
mathematical knowledge.
Marc Fuchs, Dirk Fuchs and Matthias Fuchs searched for lemmas using genetic programming
to improve tableau-based proof search [FFF99].
The HR system [Col02a], named after mathematicians Hardy and Ramanujan, uses a model
generator to construct models based on a set of axioms, attempts to formulate conjectures and
then prove them using an automated theorem prover. A brief description can also be found in
[Col02b].
Larry Wos and Gail W. Pieper describe the technique of lemma adjunction in [WP03] and
also discuss many different approaches for evaluating lemmas.
Article [SGC03] summarizes many different criteria for constructing and identifying quality
lemmas. Humans generally use the inductive approach: from many similar problems they try
to induce a more general conjecture. This approach requires good knowledge of the particular
field of mathematics and also good mathematical intuition. Another approach described is the
generative one, in which more sophisticated techniques (for example, syntactic manipulation) are
used to construct new conjectures. An example of such a system is the HR [Col02b] program. The
manipulative approach tries to construct interesting conjectures from already existing theorems.
Finally, the deductive approach tries to automatically construct many logical consequences from
a set of axioms using an automated theorem prover and then filter them and pick those that are
interesting for the researcher.
The authors of [SGC03] also categorize different possible filters for interestingness. The
filters include non-obviousness, novelty, surprisingness, intensity and usefulness.
Our approach is somewhat different from those mentioned above. It lies in between the
manipulative and the deductive approach. We observe proofs of all the given problems and
identify lemmas that are common to many of the proofs. The filter we develop and use for
selecting good lemmas falls into the usefulness category: our measure is how much each lemma
contributes to the proofs of other conjectures. Unlike other measures, this one can easily
be evaluated by comparing different proofs conducted by the prover.
The lemmas that we produce are therefore interesting from the point of view of a machine.
Hence, this can give us an interesting comparison between human and machine opinions on the
usefulness of a lemma.
The ideas in the article [SGC03] and personal communication with many other researchers
inspired our work which we present in this paper together with the empirical results we have
obtained.
3 Overview of the system
In the sequel we assume that a consistent theory with some (possibly infinite) set of axioms is
given. All the formulas we work with are formulated in the language of the theory, and both the
conjectures that are to be proved and the lemmas that are constructed are proved using axioms
selected from the axioms of the theory.
The system starts by proving the conjectures one by one. The proofs of the conjectures that
were successfully proved are then analyzed by the system. Formulas that appear multiple times
in the proofs are then used as additional assumptions when looking for less costly proofs of the
conjectures. The system tries to optimize the set of such formulas. We will call such formulas
lemmas.
3.1 Basic notions
Let us first define some basic notions we will use throughout the text.
Notation 1 (Axiom, conjecture, lemma, proof) By an axiom we understand either an axiom
of the underlying theory or a well known theorem of the theory that we use as an assumption.
A conjecture is a formula given on the input that is to be proved by the system. The system
tries to find the most efficient proof of the conjecture using different sets of assumptions. Each
such set can contain some of the axioms, other conjectures or lemmas.
A lemma is a formula constructed by the system that is proved in a similar fashion as the
conjectures are, and is used to improve the proofs of the conjectures or the proofs of other
lemmas.
A proof is an output of a successful run of the prover. As we use a single theorem prover
with the same settings on every run, the proof only depends on the conjecture being proved and
on the set of assumptions being used.
The aim of the system is to find lemmas, and sets of assumptions for the conjectures and the
lemmas, such that either the total time required to prove the conjectures and the lemmas or the
overall size or length of the proofs is minimized.
3.2 System input
Initially, the system is given a set C of conjectures and for each conjecture C from C the system
is given an initial set of assumptions AC . The set AC consists of a subset of axioms of the
underlying theory and (possibly) of some other conjectures from C.
The idea is that the proofs of conjectures in C need not use the set of all the axioms of the
theory (which might be infinite) and that some other conjectures from C might be useful. The
user may have their own idea of which conjectures to add as assumptions to AC.
In the process of computation the system tries to optimize the set of lemmas and for each
lemma the set of its assumptions.
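For concreteness, the input can be pictured as a mapping from each conjecture to its initial set of assumptions. The following is only an illustrative sketch; the names used here are ours and not part of the system.

```python
# Illustrative sketch (not the actual input format): every conjecture C in the set C
# comes with an initial assumption set A_C consisting of axioms of the underlying
# theory and, possibly, other conjectures.
conjectures = {
    "C1": "t(i(A, A))",              # coded formulas, as in the example of section 4
    "C2": "t(i(A, n(n(A))))",
}
initial_assumptions = {
    "C1": ["meredith", "mp"],        # names of axioms of the underlying theory
    "C2": ["meredith", "mp", "C1"],  # another conjecture may also be used as an assumption
}
```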
3.3 System output
At the end the system outputs the conjectures given at the input along with the lemmas that
participate in the fastest/shortest set of proofs. For each conjecture and each lemma it outputs
the optimal set of assumptions it has found, in the sense described later.
4 An example
In our example we use a common encoding of an axiomatization of propositional logic into
terms, thus forcing the prover to use only the specified axioms and rules for inferences.
Then we use the system to prove some basic propositional theorems.
Note that automated theorem provers have already been used to search for interesting axiomatizations
of propositional logic, for example in [MVF+02]. However, our aim is not to look for a new or
otherwise interesting axiomatization; we use the system to optimize the proofs of several theorems.
We will denote the code of a formula by an overline; for example, for a propositional
formula ϕ we denote the term that codes it by ϕ̄. Negation is coded by a unary function n and
implication is coded by a binary function i. The fact that ϕ is a theorem, ⊢ ϕ, is coded by a unary
predicate t. The following table summarizes the codes:
¬ϕ      n(ϕ̄)
ϕ → ψ   i(ϕ̄, ψ̄)
⊢ ϕ     t(ϕ̄)
We use “→” for implication in the propositional logic we are coding and “⇒” for implication
in the predicate logic in which the conjectures are presented and in which the prover actually
works. Thus the statement “if ϕ is provable then ψ → χ is provable” would be coded
as t(A) ⇒ t(i(B,C)).
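A minimal sketch of this coding, assuming propositional formulas are represented as nested Python tuples; the helper names below are ours, not part of the paper’s implementation.

```python
# Hypothetical sketch: coding propositional formulas into first-order terms.
# A propositional formula is a nested tuple ("not", f) or ("imp", f, g), or a variable name.
def code(formula):
    """Return the term (as a string) that codes the given propositional formula."""
    if isinstance(formula, str):          # a propositional variable, coded by a FO variable
        return formula
    op = formula[0]
    if op == "not":                       # negation is coded by the unary function n
        return f"n({code(formula[1])})"
    if op == "imp":                       # implication is coded by the binary function i
        return f"i({code(formula[1])}, {code(formula[2])})"
    raise ValueError(f"unknown connective: {op}")

def theorem(formula):
    """Code the statement 'formula is a theorem' using the unary predicate t."""
    return f"t({code(formula)})"

# "if A is provable then B -> C is provable", as in the text:
print(f"{theorem('A')} => {theorem(('imp', 'B', 'C'))}")   # t(A) => t(i(B, C))
```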
In order to increase the difficulty of the task we have used Meredith’s single axiom for
propositional logic. This axiomatization has only a single axiom schema and a single rule.
The axiom schema is defined for any formulas ϕ, ψ, χ, ξ and η, but the theorem prover
computes with their codes, which are represented by the variables A, B, C, D and E in the
language of the theorem prover. As these variables are universally quantified, the prover can
replace each variable by any coded formula. The same applies to the coded modus ponens rule.
The coded representations of the schema and the rule are:
Meredith: ⊢ ((((ϕ → ψ) → (¬χ → ¬ξ)) → χ) → η) → ((η → ϕ) → (ξ → ϕ))
t(i(i(i(i(i(A, B), i(n(C), n(D))),C), E), i(i(E, A), i(D, A))))
MP: ϕ, ϕ → ψ ⊢ ψ
(t(A) ∧ t(i(A, B))) ⇒ t(B)
This set of axioms was used as the initial set of assumptions for all the conjectures. No other
dependencies between the conjectures were specified. The system was left to discover all the
lemmas by itself from the proofs of the conjectures.
Table 1 shows the conjectures whose proofs the system was improving. The conjectures
were proved using the E prover [Sch02]. The cost of the proofs was measured in the number of
processed clauses. This measure closely corresponds to the time the prover spends on a problem,
but it is not dependent on the hardware the prover runs on. In the second column the conjectures
are presented in their coded form. The third column shows the number of processed clauses the
prover spent on each of the conjectures if they were proved only using the original assumptions.
The fourth column shows the cost of the proofs after the system had finished the improvements
using lemmas generated from the proofs. Note that to obtain the total cost of the final proofs
we also have to include the cost of the proofs of the generated lemmas. The full listing of the
generated lemmas is included in Table 10. As we can see, the total cost of the proofs needed to
prove the conjectures was in this case reduced to about 3% of the original.
5 Description of the system
Let us first discuss how the lemmas are generated. The modification of the set of assumptions
of the conjectures by using the lemmas will be described in the next part.
conjecture coding into terms original cost final cost
¬(ϕ → ¬ψ) ⊢ ϕ t(n(i(A, n(B)))) ⇒ t(A) 938 62
¬(ϕ → ¬ψ) ⊢ ψ t(n(i(A, n(B)))) ⇒ t(B) 1673 17
ϕ ⊢ ¬ϕ → ψ t(A) ⇒ t(i(n(A), B)) 31 4
ψ ⊢ ¬ϕ → ψ t(B) ⇒ t(i(n(A), B)) 13 3
¬(ϕ → ϕ) ⊢ ψ t(n(i(A, A))) ⇒ t(B) 41 8
⊢ ϕ → (ψ → ϕ) t(i(A, i(B, A))) 17 4
ϕ → ¬(ψ → ¬χ), ϕ ⊢ χ (t(i(A, n(i(B, n(C))))) ∧ t(A)) ⇒ t(C) 287 20
⊢ ϕ → ϕ t(i(A, A)) 15 2
⊢ ¬ϕ → (ϕ → ψ) t(i(n(A), i(A, B))) 5731 4
⊢ ¬¬ϕ → ϕ t(i(n(n(A)), A)) 3297 7
⊢ ϕ → ¬¬ϕ t(i(A, n(n(A)))) 3682 4
total cost of the conjectures: 15725 135
cost for proving the lemmas: 375
total cost: 15725 510
Table 1: Formulas proved from Meredith’s axiomatization.
5.1 Generation and evaluation of lemmas
Notation 2 (Subsumption) By c ⊑ d we denote that the clause c subsumes the clause d.
Let a set of proofs P be given. (Recall that for us a proof is a successful run of the prover that
proves a particular conjecture from a given set of assumptions.) We collect all the clauses that
appear in the proofs and that were derived only from the assumptions into a single set

L = ⋃_{P ∈ P} { c | c is a clause in P derived only from assumptions and without Skolem symbols }
We do not include clauses that were derived from the negated conjecture, because they are
not true formulas in our theory. Also we remove those clauses that contain Skolem symbols
created by the prover, because these symbols have different meanings in different runs of the
prover. (In the future we might implement the reverse skolemization algorithm [CP80, CP93] to
deal with such formulas.)
Thus, the clauses in L do not contain any Skolem symbols, but they have no special form:
they can contain an arbitrary number of positive and/or negative literals.
Note that although it would be an obvious thing to do, we don’t use the conjectures themselves
as lemmas. The reason is that the conjectures are not necessarily clauses, therefore we cannot
process them in the same way as the clauses produced by the prover. As it turns out, many of the
conjectures are later discovered as lemmas by the system anyway. However, this issue deserves a
better solution in the future.
From the set L we construct a set L′, minimal with respect to subsumption, with the following
properties:
1. L′ ⊆ L, therefore L′ contains only true formulas.
2. for every d ∈ L there is c ∈ L′ such that c ⊑ d.
3. for any pair of distinct clauses c ∈ L′, d ∈ L′, c does not subsume d: c ⋢ d.
Therefore, L′ contains just the most general variants of the lemmas appearing in L.
The clauses from L′ are then used as lemmas to modify the set of assumptions of the con-
jectures.
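The construction of L′ can be sketched as follows, assuming a predicate subsumes(c, d) is available (in the real system it is provided by clause subsumption checking; here it is treated as a black box).

```python
# Hypothetical sketch: keep only the most general clauses of L with respect to subsumption.
def minimize_by_subsumption(L, subsumes):
    """Return a subset L' of L such that every clause of L is subsumed by some clause
    of L' and no clause of L' subsumes another one."""
    L_prime = []
    for c in L:
        # skip c if an already kept clause subsumes it
        if any(subsumes(kept, c) for kept in L_prime):
            continue
        # drop previously kept clauses that c subsumes
        L_prime = [kept for kept in L_prime if not subsumes(c, kept)]
        L_prime.append(c)
    return L_prime
```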
Now let us have a clause c that is a lemma, c = L1, …, Lk, ¬Lk+1, …, ¬Ln, where the Li are
atomic formulas. Let |Li| be the number of function symbols appearing in Li. Let P be a proof
whose set of assumptions we want to improve by c. We would like to have an estimate that
tells us whether it is likely that c will contribute to the proof P. A natural idea suggests itself:
the more often an atomic formula Li occurs in the proof P, the more likely the lemma will
contribute, and the longer Li is (measured in the number of symbols), the more likely it
will shorten/speed up the proof. Therefore our estimation formula is defined by

weight(c, P) = ∑_{i=1}^{n} |Li| · (the number of occurrences of Li in P)    (1)
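A direct sketch of the estimate (1), under the simplifying assumptions that an atom is a nested tuple, a lemma is a list of signed atoms, and occurrences are counted as exact matches; these representational choices are ours.

```python
# Hypothetical sketch of the weight estimate (1).
# An atom is a nested tuple (symbol, arg1, ...); variables are plain strings.
# A lemma is a list of (sign, atom) pairs; a proof is given as the list of atoms occurring in it.
def atom_size(atom, function_symbols):
    """|L_i|: the number of function symbol occurrences in an atom."""
    if isinstance(atom, str):                      # a variable contributes nothing
        return 0
    head, *args = atom
    own = 1 if head in function_symbols else 0     # count the head only if it is a function symbol
    return own + sum(atom_size(a, function_symbols) for a in args)

def weight(lemma, proof_atoms, function_symbols):
    """weight(c, P) = sum over the literals L_i of |L_i| * (occurrences of L_i in P).
    Occurrences are counted as exact matches here; the real system may be more liberal."""
    return sum(
        atom_size(atom, function_symbols) * proof_atoms.count(atom)
        for _sign, atom in lemma
    )
```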
There are many other possible estimation methods. One may, for example, take into account
the number of formulas in the proof which the lemma subsumes, look for similar terms in the
lemma and in the proof, etc.
5.2 Evaluation of the proofs
In order to evaluate the applicability of lemmas, we need to have a criterion for the cost of a
proof:
Notation 3 (Measure of a proof) A proof measure is a function that maps proofs into non-
negative real numbers.
We use the following proof measures:
the number of processed clauses reported by the prover; this measure is closely related to the
time the prover spends while searching for the proof of the conjecture, but is independent
of the hardware of the computer the prover runs on;
the length of the proof is the number of formulas appearing in the proof that is constructed by
the prover;
the size of the proof is the total number of occurrences of function symbols (not predicate
symbols) appearing in the proof that is constructed by the prover.
The length and the size of a proof are also independent of the hardware of the computer being
used.
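Assuming a proof is reported as the prover’s processed-clause count together with the list of formulas it contains (represented as nested tuples, as above), the last two measures can be sketched as follows; the representation is our assumption.

```python
# Hypothetical sketch of the proof measures of section 5.2.  Formulas are nested tuples
# as in the previous sketch; the processed-clause count is simply read off the prover's report.
def proof_length(proof_formulas):
    """Length: the number of formulas appearing in the constructed proof."""
    return len(proof_formulas)

def proof_size(proof_formulas, function_symbols):
    """Size: the total number of function symbol occurrences (predicate symbols excluded)."""
    def count(term):
        if isinstance(term, str):
            return 0
        head, *args = term
        return (1 if head in function_symbols else 0) + sum(count(a) for a in args)
    return sum(count(f) for f in proof_formulas)
```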
Notation 4 If we are proving a conjecture C from assumptions A1, …, An, we denote the measure
of the proof by
||A1, …, An ⊢ C||
If the prover is not able to conduct the proof we set
||A1, …, An ⊢ C|| = +∞
Remark 1 (The proof size/length) If we could prove all the given conjectures together, the
size/length of the resulting hypothetical proof would be of course less than the sum of the
sizes/lengths of the individual proofs of the conjectures. But in most cases the prover is not
able to prove all the conjectures together, therefore the system proves the conjectures one by
one. The system then tries to compact the proofs of the conjectures to make them closer to the
size of the hypothetical proof.
The number of generated lemmas is usually very large, so only some of them will be in-
cluded in the output. Such lemmas are marked as accepted. Each lemma is initially unaccepted.
When the system figures out that the lemma is worth including in the output, it marks it as
accepted.
For each conjecture or lemma C the system maintains a list SC of sets of assumptions that
were used to produce different proofs of the conjecture.
Each such set S ∈ SC is marked as accepted iff all the lemmas it contains are accepted. The
initial set of assumptions of each conjecture contains no lemmas, hence it is always accepted.
Now, we may define:
Notation 5 (Best accepted set of assumptions) The best accepted set of assumptions
SCbest ∈ SC
of a conjecture C is an accepted set of assumptions that produces the best proof of C with respect
to the proof measure:

∀S : ((S ∈ SC) ∧ (all the lemmas in S are accepted)) ⇒ ||S ⊢ C|| ≥ ||SCbest ⊢ C||    (2)
If there are several sets of assumptions that match the criteria for the best accepted set (they
produce proofs of the same measure), we arbitrarily choose one among them.
Let us call the proof of C produced from SCbest the best accepted proof of C.
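A sketch of selecting the best accepted set according to Notation 5, assuming each candidate set is stored together with the lemmas it uses and the measure of the proof it produced; the data layout is our assumption.

```python
# Hypothetical sketch: choose the best accepted set of assumptions S_C^best of a conjecture.
import math

def best_accepted_set(candidate_sets, accepted_lemmas):
    """candidate_sets: list of (assumptions, used_lemmas, measure) triples recorded for one
    conjecture.  Returns (assumptions, measure) of the accepted set with the smallest
    proof measure; ties are broken arbitrarily, as in the text."""
    accepted = [
        (assumptions, measure)
        for assumptions, used_lemmas, measure in candidate_sets
        if set(used_lemmas) <= set(accepted_lemmas)   # a set is accepted iff all its lemmas are
    ]
    if not accepted:
        return None, math.inf                         # no accepted proof exists
    return min(accepted, key=lambda pair: pair[1])
```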
Remark 2 (System output) The output of the system consists of all the input conjectures and
all the accepted lemmas at the time the system finishes execution. For each of these conjectures
or lemmas the best accepted set of assumptions is presented.
5.3 Modifying the set of assumptions to get better ones
The outline of the work of the system is as follows:
1. The system first tries to prove all the given conjectures one by one. Let C be the set of
those conjectures that were successfully proved. Let L be the set of constructed lemmas
and let La ⊆ L be the set of lemmas that are accepted. Initially, these sets are empty.
2. The system takes the best accepted proofs of all the conjectures and accepted lemmas.
From those proofs the system constructs new lemmas as described in section 5.1. It sets
L = L ∪ {the newly constructed lemmas}
3. The system produces pairs consisting of a best assumption set of a conjecture and of a
lemma that will be used to improve the set of assumptions of the best proof of the con-
jecture. It takes all the new lemmas together with the lemmas it already has constructed
before and combines them with all the best accepted sets of assumptions of all the con-
jectures and accepted lemmas. This way it produces every possible pair from
L × {SCbest | C ∈ C ∪ La}
However some pairs of a lemma l and the best set of assumptions SCbest of a conjecture C
have to be excluded, namely
• if already l ∈ SCbest , or
• if the lemma l is the conjecture C itself, or
• if the conjecture C is directly or indirectly used to prove l; this means that either C
is one of the assumptions in the best accepted set of l, or it is one of the assumptions
in the best set of some conjecture that is an assumption of l, and so on.
Otherwise it could happen for example that there would be two equivalent lemmas, one
proving another with a one-step proof.
4. These pairs are sorted according to the estimate described in section 5.1.
5. The system subsequently takes these pairs (l, SCbest ) starting with the one with the highest
weight:
(a) It tries to conduct a new proof of C using SCbest ∪ {l}. The gain of this single improvement is
g = ||SCbest ⊢ C|| − ||SCbest, l ⊢ C||
(b) If g is positive, the lemma l has brought an improvement. The system then checks
whether the total gain of l over all the conjectures whose sets of assumptions were
improved by l is larger than the cost of the proof of l. If so, the system sets
La = La ∪ {l}
which means that l is marked as accepted.
If l is accepted, we want to incorporate the lemmas that originate from the proof of
l as well as the lemmas that originate from the proofs of the conjectures that use l
as an assumption. Hence, in such a case the process is restarted from the beginning
and the system goes again to point 2.
(c) While there are any unprocessed pairs (l, SCbest), the system takes the next pair and
goes again to 5a.
6. When all the pairs are exhausted and no improvement was made the system outputs the
conjectures C, the accepted lemmas La and their best accepted sets of assumptions SCbest ,
C ∈ C ∪ La , as described above, and halts.
It may happen that, as a result of adding l, some of the assumptions in SCbest are not needed
any more and do not appear in the proof. In such a case the system omits them and tries to prove
the conjecture only using this reduced set of assumptions. If the proof attempt succeeds, it
is evaluated the same way as described above. This practice helps to reduce the number of
assumptions appearing in the proofs. Without it, the prover would soon become overwhelmed
by the number of assumptions and no further improvement would be possible. In the current
version of the system this check is not iterated, so the proof conducted from the reduced set of
assumptions is not checked again for redundant assumptions.
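The loop of steps 1–6 can be summarized by the following sketch. The prover interface, the lemma extraction of section 5.1 and the weight estimate (1) are passed in as stubs, and the cycle check of step 3 is omitted; this is our reading of the algorithm, not the actual implementation.

```python
# Hypothetical sketch of the improvement loop (steps 1-6 of section 5.3).
INF = float("inf")

def improve(conjectures, initial_assumptions, prove, generate_lemmas, weight):
    # best maps each goal (conjecture or accepted lemma) to (assumptions, measure, proof)
    best = {}
    for c in conjectures:                                         # step 1
        measure, proof = prove(c, initial_assumptions[c])
        if measure < INF:
            best[c] = (frozenset(initial_assumptions[c]), measure, proof)
    lemmas, accepted, total_gain = set(), set(), {}
    restart = True
    while restart:
        restart = False
        # step 2: extract new lemmas (section 5.1) from the current best accepted proofs
        lemmas |= set(generate_lemmas([p for (_, _, p) in best.values()]))
        # steps 3-4: candidate (lemma, goal) pairs sorted by the weight estimate (1)
        pairs = [(l, g) for l in lemmas for g in best
                 if l != g and l not in best[g][0]]
        pairs.sort(key=lambda lg: weight(lg[0], best[lg[1]][2]), reverse=True)
        for l, g in pairs:                                        # step 5
            assumptions, measure, _ = best[g]
            new_measure, new_proof = prove(g, assumptions | {l})  # step 5a
            gain = measure - new_measure
            if gain > 0:                                          # step 5b: the lemma helped
                best[g] = (assumptions | {l}, new_measure, new_proof)
                total_gain[l] = total_gain.get(l, 0) + gain
                # how a lemma's own proof is obtained is simplified here: we reuse the
                # improved goal's assumptions (without l itself)
                lemma_measure, lemma_proof = prove(l, assumptions)
                if l not in accepted and total_gain[l] > lemma_measure:
                    accepted.add(l)
                    best[l] = (assumptions, lemma_measure, lemma_proof)
                    restart = True                                # back to step 2
                    break
    return best, accepted                                         # step 6
```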
6 Experimental results
The system is still under development, therefore we do not yet have in-depth statistics of its
behavior. However, we have performed tests on different sets of conjectures from different
sources and we present a selection of them.
All experiments were run on a Linux machine with Intel® Pentium® 4 CPU 3.4GHz with
2GB of RAM. The conjectures were proved using E prover with resource limits set to 80 sec-
onds, 512MB RAM and 100000 processed clauses.
We have developed an independent server component that lies between the actual system
and the automated prover. The component caches all conducted proofs on a hard-disk and if a
client requests the server to perform a proof that has already been processed, the server returns
the cached version. This greatly speeds up the development of the system, particularly if we
rerun the system many times with different settings or with slight modifications on the same set
of problems.
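The caching idea can be sketched as a simple on-disk memo table keyed by the conjecture and its set of assumptions; the class and file layout below are ours, the real component is a separate server.

```python
# Hypothetical sketch of the proof cache: identical (conjecture, assumptions) requests
# are answered from disk instead of re-running the prover.
import hashlib, json, os

class ProofCache:
    def __init__(self, directory, run_prover):
        self.directory = directory
        self.run_prover = run_prover              # callable: (conjecture, assumptions) -> result
        os.makedirs(directory, exist_ok=True)

    def _path(self, conjecture, assumptions):
        key = json.dumps([conjecture, sorted(assumptions)]).encode()
        return os.path.join(self.directory, hashlib.sha256(key).hexdigest() + ".json")

    def prove(self, conjecture, assumptions):
        path = self._path(conjecture, assumptions)
        if os.path.exists(path):                  # cache hit: reuse the stored prover output
            with open(path) as f:
                return json.load(f)
        result = self.run_prover(conjecture, assumptions)
        with open(path, "w") as f:                # cache miss: store the result for next time
            json.dump(result, f)
        return result
```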
For some cases we include a full list of the conjectures and the lemmas, for others we only
summarize the results.
6.1 TPTP problems – set theory
We used several selected TPTP [SS98] problems concerning set theory. These problems orig-
inate from [Qua92]. For presentation in this paper we have converted the machine syntax into
a more human-readable form. The meaning of the symbols that we use is summarized in Ta-
ble 2. The conjectures that were proved are shown in Table 3. The table also shows the results
obtained when running the system with the proofs being measured by the number of processed
clauses. The second column, labeled “level”, shows an inductively defined property of both
the conjectures and the lemmas. The axioms have level 0. If a conjecture or a lemma is proved
just from the axioms, it has level 1. Each lemma or conjecture has its level set to the maximum
level of its assumptions plus 1. This gives us an approximation of how complicated the
conjecture or the lemma is. The third column labeled || · ||i shows the initial cost of the proof
of the conjecture while the fourth column labeled || · || f shows the final cost of the proof of the
conjecture when the system terminates. The fifth column shows the actual formula converted
into a human-readable form. The last column shows which lemmas were finally used to prove
the conjecture.
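The level attribute is easy to compute recursively; a short sketch, assuming each conjecture or lemma records the assumptions of its final proof (anything without such a record is an axiom of level 0).

```python
# Hypothetical sketch of the "level" column: axioms have level 0, everything else has
# level equal to 1 plus the maximum level of its assumptions.
def level(name, assumptions_of, _cache=None):
    if _cache is None:
        _cache = {}
    if name in _cache:
        return _cache[name]
    if name not in assumptions_of:                # an axiom of the underlying theory
        _cache[name] = 0
        return 0
    deps = assumptions_of[name]
    _cache[name] = 1 + max((level(d, assumptions_of, _cache) for d in deps), default=0)
    return _cache[name]

# Example: a lemma proved from two axioms has level 1; a conjecture using that lemma has level 2.
assumptions_of = {"L1": ["ax1", "ax2"], "C1": ["ax1", "L1"]}
print(level("C1", assumptions_of))                # prints 2
```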
Some of the conjectures in the listings may have levels greater than 1 although they don’t
use any lemmas, or the level of a particular conjecture may be higher than one plus the level of
the lemmas used to prove it. The reason is that the user has specified that some other conjectures
should be used as assumptions to prove the conjecture, and these increase the level of the conjecture.
A ⊆ B subclass
{A, B} unordered pair
⟨A, B⟩ ordered pair
{A} singleton
A[1] first member of an ordered pair A
A[2] second member of an ordered pair A
A ∈ B membership
Ā class complement
A × B cross product
∅ null class
A ∩ B intersection
U universal class
member_of(A) choice function (from the axiom of choice)
Table 2: Meaning of the symbols used in the TPTP set theory sample
We can see that often the conjectures that were harder to prove with respect to the proof
measure have a higher level. This means that they were finally proved using several layers of
lemmas.
Several conjectures that were too complicated are omitted for brevity.
The lemmas that the system generated with the processed clauses count proof measure are
in Table 4. The lemmas are clauses, but for the presentation we have converted them into a more
readable form using implication notation. As the lemmas have no initial set of axioms given by
the user, they also have no initial cost, so only their final cost is shown.
They are sorted by their total accumulated improvement with respect to the proof measure;
the more useful lemmas are at the beginning of the table. This is, however, only an informative
ordering, as it depends very much on the order in which the lemmas were tried. For example,
consider two almost identical lemmas l1 and l2. Whichever is chosen second will bring no
improvement, as the assumption sets were already improved by the one chosen first.
The lemmas are more or less complicated formulas that the system found useful for proving
the conjectures. This can give us an interesting comparison, as the input conjectures were
selected by a human, whereas the lemmas were constructed by a machine. Tables 5 and 6 show
the results on the same set of conjectures when different proof measures were selected. Many
of the lemmas appear in all three tables, although in different positions. Such lemmas seem to
be essential for the automated prover when working with this particular theory.
Note that some of the lemmas the system has found are just conjectures reformulated as
clauses. In such a case the system usually proves such conjecture using the lemma in a single
step and then further improves the proof of the lemma.
Table 7 shows the summary of achieved results. For each proof measure the initial and the
final cost of the proofs is shown.
no. level || · ||i || · || f formula lemmas
C1 1 1 1 ∀X : X = X
C2 1 5 2 ∀X : (X ⊆ X)
C3 1 2 2 ∃X : ∀Z : ¬(Z ∈ X)
C4 2 6 2 ∀X : (∅ ⊆ X) L18
C5 2 16 9 (∅ ∈ U) L5
C6 2 4 3 ∀X,Y : ((Y ∈ {X}) ⇒ Y = X) L14
C7 1 3 3 ∀Z : (Z = ∅ ∨ ∃Y : (Y ∈ Z))
C8 1 2 2 ∀X : ({X} ∈ U)
C9 2 11 4 ∀X : ((X ⊆ ∅) ⇒ X = ∅) L18 L11
C10 2 8 2 ∀X,Y : ({X} ∈ ⟨X,Y⟩)
C11 3 19 4 ∀X,Y : ((X ∈ Y) ⇒ ({X} ⊆ Y)) L1
C12 1 9 9 ∀X,Y : ¬(Y ∈ (X ∩ X̄))
C13 1 2 2 ∀X,Y : (⟨X,Y⟩ ∈ U)
C14 1 11 11 ∀X,Y,Z : (((X ⊆ Y) ∧ (Y ⊆ Z)) ⇒ (X ⊆ Z))
C15 2 7 3 ∀X : ((X ∈ U) ⇒ (X ∈ {X})) L12
C16 1 22 21 ∀X,Y : ({X, X} ⊆ {X,Y})
C17 2 9 4 ∀X : ((X ∈ U) ⇒ {X} ≠ ∅) L12
C18 2 21 3 ∀X : (¬(X ∈ U) ⇒ {X} = ∅) L8
C19 1 3 3 ∀X : ({member_of(X)} = X ⇒ (X ∈ U))
C20 2 7 2 ∀X,Y : ({X, {Y}} ∈ ⟨X,Y⟩)
C21 1 10 10 ∀X,Y : (({member_of(X)} = X ∧ (Y ∈ X)) ⇒ member_of(X) = Y)
C22 6 87 25 ∀X,Y : ((X ⊆ {Y}) ⇒ (X = ∅ ∨ {Y} = X)) L16
C23 1 19 10 ∀X,Y : (({X} = {Y} ∧ (Y ∈ U)) ⇒ X = Y)
C24 3 15 10 ∀X,Y : (({X} = {Y} ∧ (X ∈ U)) ⇒ X = Y) L12 L17
C25 2 12 4 ∀X,Y : ((X ∈ U) ⇒ {X,Y} ≠ ∅) L9
C26 2 9 4 ∀X,Y : ((Y ∈ U) ⇒ {X,Y} ≠ ∅) L12
C27 1 3 3 ∀X : (⟨X[1], X[2]⟩ = X ⇒ (X ∈ U))
C28 6 610 7 ∀X,Y,Z : (((X ∈ Z) ∧ (Y ∈ Z)) ⇒ ({X,Y} ⊆ Z)) L1 L2
C30 8 706 213 ∀W,X,Y,Z : ((⟨W,X⟩ = ⟨Y,Z⟩ ∧ (X ∈ U)) ⇒ X = Z) L3 L10 L5 L17 L9
C31 2 20 12 ∀W,X,Y,Z : ((⟨W,X⟩ = ⟨Y,Z⟩ ∧ (W ∈ U)) ⇒ W = Y) L14 L9
C32 2 32 4 ∀X,Y : ((¬(X ∈ U) ∧ ¬(Y ∈ U)) ⇒ {X,Y} = ∅) L8
C33 2 16 11 ∀X,Y,Z : (((Y ∈ U) ∧ (Z ∈ U) ∧ {X,Y} = {X,Z}) ⇒ Y = Z) L12 L14
C34 2 16 11 ∀X,Y,Z : (((X ∈ U) ∧ (Y ∈ U) ∧ {X,Z} = {Y,Z}) ⇒ X = Y) L14 L9
C35 3 21 4 ∀X,Y : ({{X}, {X, ∅}} = ⟨X,Y⟩ ∨ (Y ∈ U)) L10
Table 3: Selected conjectures from the TPTP set theory sample with the proof cost measured by
the number of clauses processed by the prover.
no. level || · || formula lemmas
L1 4 32 (A ∈ C) ∧ (B ∈ C) ⇒ ({A, B} ⊆ C) L7
L2 5 4 (A ∈ C) ⇒ ({A, B} ⊆ C) ∨ (B ∈ C)
L3 7 5 (B ∈ D) ∧ (A ∈ C) ⇒ ({{A, A}, {A, {B, B}}} ∈ (C × D)) L6
L4 3 6 (A ∈ C) ∧ (A ∈ B) ⇒ (A ∈ (B ∩C)) L7
L5 1 8 (A ∈ B) ⇒ (A ∈ U )
L6 6 7 (B ∈ D) ∧ (A ∈ C) ⇒ (⟨A, B⟩ ∈ (C × D)) L7
L7 2 8 (B ∈ C) ⇒ (A ∈ U ) ∨ ({B, A} ⊆ C) L13 L5
L8 1 32 ∅ = {A, B} ∨ (A ∈ U) ∨ (B ∈ U)
L9 1 8 (A ∈ U ) ⇒ (A ∈ {A, B})
L10 2 3 ∅ = {A, A} ∨ (A ∈ U) L8
L11 1 6 (B ⊆ A) ∧ (A ⊆ B) ⇒ A = B
L12 1 7 (A ∈ U ) ⇒ (A ∈ {B, A})
L13 1 31 (A ∈ C) ⇒ ({A, B} ⊆ C) ∨ (B ∈ {A, B})
L14 1 5 (A ∈ {C, B}) ⇒ A = B ∨ A = C
L15 1 2 (A ⊆ U )
L16 5 4 (A ⊆ {B,C}) ∧ (C ∈ A) ∧ (B ∈ A) ⇒ A = {B,C}
L17 2 4 (A ∈ U ) ∧ A = B ⇒ (A ∈ {B,C}) L9
L18 1 6 (∅ ⊆ A)
Table 4: Lemmas found for the TPTP set theory sample with the proof cost measured by the
number of clauses processed by the prover.
no. level || · || formula lemmas
L1 1 37 (A ∈ {C, B}) ⇒ A = B ∨ A = C
L2 2 64 ∅ = {A, B} ∨ (A ∈ U) ∨ (B ∈ U) L1
L3 2 43 (A ∈ C) ∧ (B ∈ C) ⇒ ({A, B} ⊆ C) L1
L4 1 24 (A ∈ U ) ⇒ (A ∈ {B, A})
L5 1 24 (A ∈ U ) ⇒ (A ∈ {A, B})
L6 3 28 ∅ = {A, A} ∨ (A ∈ {A, A}) L2 L4
L7 3 44 (A ∈ U) ⇒ (A ∈ B) ∨ (A ∈ B̄) L2
L8 1 34 (A ∈ (B ∩ C)) ∧ (A ∈ C) ⇒ false
L9 1 23 {A} = {A, A}
L10 1 21 (A ∈ B) ⇒ (A ∈ U)
L11 1 11 (A ∈ ∅) ⇒ false
L12 1 18 ({A, B} ∈ U)
L13 2 75 ⟨A, B⟩ = {{A}, {A, {B}}} L9
L14 1 31 ((A ∩ B) ⊆ B)
L15 2 42 (A ∈ U ) ∨ ({A, A} ⊆ B) L1 L10
Table 5: Lemmas found for the TPTP set theory sample with the proof cost measured by the
proof size.
no. level || · || formula lemmas
L1 2 19 ∅ = {A, B} ∨ (A ∈ U) ∨ (B ∈ U) L2
L2 1 15 (A ∈ {C, B}) ⇒ A = B ∨ A = C
L3 3 16 (A ∈ U) ⇒ (A ∈ B) ∨ (A ∈ B̄) L1
L4 4 20 (A ∈ B) ⇒ ({A, A} ⊆ B) L2
L5 1 8 {A} = {A, A}
L6 1 11 (A ∈ U ) ⇒ (A ∈ {A, B})
L7 1 11 (A ∈ U ) ⇒ (A ∈ {B, A})
L8 1 10 (A ∈ ∅) ⇒ false
L9 1 16 (B ⊆ A) ∧ (A ⊆ B) ⇒ A = B
L10 2 10 ⟨A, B⟩ = {{A}, {A, {B}}} L5
L11 1 9 ({A, B} ∈ U )
L12 1 15 (A ∈ C) ∧ (A ∈ B) ⇒ (A ∈ (B ∩C))
L13 1 16 (A ∈ B) ⇒ (A ∈ U )
Table 6: Lemmas found for the TPTP set theory sample with the proof cost measured by the
proof length.
proof measure initial cost final cost
number of processed clauses 3991 1921 (48%)
proof size 3487 2794 (80%)
proof length 1086 878 (81%)
Table 7: Results for the TPTP set theory sample.
6.2 Mizar problems – Boolean properties of sets
These theorems address basic boolean properties of sets in the Mizar database for mathematics.
The conjectures were converted from the Mizar language by Josef Urban [Urb04, Urb03] into a
form suitable for automated theorem provers.
The system was rather effective at reducing the number of processed clauses required to
prove this set of conjectures. Table 8 shows the summary of achieved results. For each proof
measure the initial and the final cost of the proofs is shown.
proof measure initial cost final cost
number of processed clauses 62706 1852 (3.0%)
proof size 10680 9351 (88%)
proof length 3687 3046 (83%)
Table 8: Results for the Mizar boolean properties of sets.
As the list of the conjectures and the lemmas in this case is rather long, we did not include
it in this paper.
6.3 Meredith’s axiomatization of propositional logic
In this section we give detailed results for the example in section 4. The formulas are presented
using standard logic symbols.
Table 9 shows the conjectures along with the results obtained when running the system with
the proofs being measured by the number of processed clauses and Table 10 shows the lemmas
that were found.
no. level || · ||i || · || f formula lemmas
C1 8 15 2 ⊢ (A → A) L27
C2 11 3682 2 ⊢ (A → ¬¬A)
C3 7 3297 2 ⊢ (¬¬A → A)
C4 2 17 2 ⊢ (A → (B → A))
C5 9 5731 2 ⊢ (¬A → (A → B))
C6 8 41 5 ¬(A → A) ⊢ B L27
C7 5 31 3 A ⊢ (¬A → B)
C8 3 13 3 B ⊢ (¬A → B) L18
C9 11 1673 16 ¬(A → ¬B) ⊢ B L22 L8 L4
C10 7 938 62 ¬(A → ¬B) ⊢ A L17 L9 L16
L30 L18 L4
C11 12 287 20 (P → ¬(Q → ¬R)), P ⊢ R L21 L30 L8 L4
Table 9: Conjectures from Meredith’s axiomatization example with the proof cost measured
by the number of clauses processed by the prover.
Because we code propositional formulas into terms of predicate logic, we can express much
more than just that a propositional formula is a theorem. We can also express meta-theorems
that speak about the provability of different formulas and about the relations between them. For
example, recall how the modus ponens rule was coded as (t(A) ∧ t(i(A, B))) ⇒ t(B). The system
derived many lemmas of a similar nature, thus discovering many admissible rules. This
becomes much more interesting in the case of modal logic, described in the next section, where
the deduction theorem does not hold, hence the admissible rules have much greater importance.
no. level || · || formula lemmas
L1 6 7 C ⊢ (A → (¬¬B → B)) L14 L4
L2 6 24 D ⊢ (A → (B → (¬¬C → C))) L12 L4 L3
L3 3 8 (((D → B) → (E → B)) → (B → C)) ⊢ (A → (B → C)) L23 L4
L4 1 4 A, (A → B) ⊢ B
L5 3 5 A ⊢ ((A → B) → (C → B)) L18 L12
L6 8 6 ((((B → C) → (¬D → ¬A)) → D) → B) ⊢ (A → B) L27 L12
L7 7 4 ⊢ ((((¬A → ¬B) → A) → C) → (B → C)) L12
L8 8 6 C ⊢ (A → ((¬B → ¬A) → B))
L9 5 28 ((C → E) → (¬B → ¬D)) ⊢ (((A → B) → C) → (D → C)) L18 L4 L19
L10 8 7 (((¬C → ¬A) → C) → B) ⊢ (A → B) L7 L4
L11 8 4 ⊢ (((A → B) → ¬(¬B → ¬C)) → (C → ¬(¬B → ¬C))) L7 L12
L12 1 6 ((((B → D) → (¬E → ¬C)) → E) → A) ⊢ ((A → B) → (C → B))
L13 6 13 (((B → C) → D) → (¬C → ¬A)) ⊢ (A → (B → C)) L9 L12 L4
L14 5 4 ⊢ (A → (B → (¬¬C → C))) L2
L15 10 4 B ⊢ (A → ¬¬A)
L16 2 7 D, C ⊢ (A → (B → C)) L18
L17 5 7 ((C → C) → B) ⊢ (A → B) L31
L18 2 6 B ⊢ (A → B)
L19 4 7 C, A ⊢ (¬A → B)
L20 6 6 ⊢ (((A → (B → B)) → C) → (D → C)) L17 L31 L12
L21 11 4 B ⊢ (A → ¬¬A) L15
L22 6 14 ((C → D) → B), (¬D → ¬A) ⊢ (A → B) L9 L18 L4
L23 2 4 ⊢ ((((A → B) → (C → B)) → (B → D)) → (E → (B → D))) L12
L24 6 4 ⊢ (((A → ¬¬B) → C) → (B → C)) L12
L25 3 4 ⊢ (((A → (¬B → C)) → D) → (B → D)) L23 L12
L26 3 11 (D → C), D ⊢ (A → (B → C)) L18 L4
L27 7 4 ⊢ (A → A) L32
L28 1 6 C ⊢ (A → (B → A))
L29 7 3 C ⊢ (A → (B → B))
L30 6 5 (¬B → ¬D) ⊢ (((A → B) → C) → (D → C)) L9 L18
L31 4 4 ⊢ (((A → A) → B) → (C → B)) L25 L12
L32 6 4 ⊢ (A → (B → (C → C))) L17 L31
L33 9 2 ⊢ (((A → B) → ¬¬C) → (C → ¬¬C)) L9 L13 L11
Table 10: Lemmas found for Meredith’s axiomatization with the proof cost measured by the
number of clauses processed by the prover.
proof measure initial cost final cost
number of processed clauses 15725 510 (3.2%)
proof size 1299 1024 (79%)
proof length 314 277 (88%)
Table 11: Results for Meredith’s axiomatization of propositional logic.
Table 11 shows the summary of achieved results. Again, for each proof measure the initial
and the final cost of the proofs is shown.
6.4 S5 modal logic
In S5 modal logic the meta-theorems mentioned above become particularly useful, since the
deduction theorem does not hold in S5. From this point of view S5 is an interesting example.
This set of conjectures is similar to the previous example. We have used [Hal05] to construct
an axiomatization of S5 modal logic with the three Hilbert axioms for propositional logic,
the axioms K, T and 5, and the modus ponens and necessitation rules. We have used the same
formula coding with an additional unary function symbol l(. . .) for the modal operator □.
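For illustration, the modal axioms named above could be coded as follows. The exact Hilbert-style propositional axioms and the precise formulation of axiom 5 used in the experiment are not listed in the paper, so the strings below are our assumptions.

```python
# Hypothetical sketch: one common coding of the S5 axioms and rules, with l(...) for the box.
# The diamond is expressed as "not box not", as in conjecture C10 below.
s5_axioms = {
    "K": "t(i(l(i(A, B)), i(l(A), l(B))))",        # box(A -> B) -> (box A -> box B)
    "T": "t(i(l(A), A))",                          # box A -> A
    "5": "t(i(n(l(n(A))), l(n(l(n(A))))))",        # diamond A -> box diamond A
}
s5_rules = {
    "MP":  "(t(A) & t(i(A, B))) => t(B)",          # modus ponens, coded as in section 4
    "NEC": "t(A) => t(l(A))",                      # necessitation: from A infer box A
}
```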
[Figure 1 (plot): the thick line shows the total proof cost (left axis) and the thin line shows the number of accepted lemmas (right axis), plotted against the number of proved (not necessarily accepted) lemmas on the x axis.]
Figure 1: Run of the system on the S5 axiomatization with the proof cost measured by the
number of clauses processed by the prover.
For this example we also show a graph that illustrates the performance of the system; see
Figure 1. The x axis shows time points distinguished by the total number of lemmas (both accepted
and unaccepted) the system has used at least in one proof. The thick line shows how the total
cost of the proofs evolved and corresponds to the tick marks on the left. The thin line shows the
number of lemmas that were marked as accepted and corresponds to the tick marks on the right.
As we can see, the cost of the proofs was reduced to about one third with the first 10 lemmas. The
cost of the proofs then gradually decreased, and the lemmas that were accepted often brought
only a slight gain. There were further significant improvements at points 40, 81 and 97,
when interesting lemmas were discovered and sudden advancements were made.
Most of the lemmas that were discovered say that a particular proposition is a theorem
of S5. But the system also discovered several lemmas that describe admissible rules of S5. For
example, the lemma L40 in Table 15 states that from □B and B → A we can derive □A.
Table 16 shows the summary of achieved results. For each proof measure the initial and the
final cost of the proofs is shown.
no. level || · ||i || · || f formula lemmas
C1 6 11 2 ⊢ (A → A) L19
C2 1 2 2 ⊢ (□P → P)
C3 9 1086 2 ⊢ (A → ¬¬A)
C4 2 5 3 ⊢ A ∨ ¬□A L43
C5 1 3 3 ⊢ ¬A ∨ □A
C6 9 1089 2 ⊢ (¬¬A → A)
C7 7 3586 2 ⊢ ((¬A → A) → A)
C8 7 44 2 ⊢ (¬A → (A → B)) L25
C9 8 680 15 ⊢ □(A → ¬□¬A) L37 L2 L36
L33
C10 5 15332 72 ⊢ (P → □¬□¬P) L17 L2 L1
C11 6 43 5 ¬(A → A) ⊢ B L19
C12 8 58 3 A ⊢ (¬A → B)
C13 2 5 3 B ⊢ (¬A → B) L10
C14 6 90 10 ¬(A → ¬B) ⊢ B
C15 7 1293 6 ¬(A → ¬B) ⊢ A L20 L23
C16 8 30413 2 ⊢ ((A → B) → (¬B → ¬A))
C17 9 23254 3 ⊢ (A → (¬B → ¬(A → B))) L19
C18 10 9580 4 A, B ⊢ ¬(A → ¬B) L14
C19 1 3 3 ⊢ □(¬□¬P → □¬□¬P)
C20 7 5471 171 ¬(B → ¬A) ⊢ ¬(A → ¬B) L28 L2 L13
L22 L1 L42 L9
L24
C21 1 8 6 □(A → B), □(B → A) ⊢ □(□A → □B)
C22 10 23271 26 ¬(A → ¬¬(B → ¬C)) ⊢ ¬(¬(A → ¬B) → ¬C) L20 L23 L14
Table 12: Conjectures in the S5 axiomatization, prover processed clauses count measure
no. level || · || formula lemmas
L1 2 59 (A → C), (A → (C → B)) ⊢ (A → B) L4
L2 3 40 (A → C), (C → B) ⊢ (A → B) L1
L3 3 40 C, (A → (C → B)) ⊢ (A → B) L1
L4 1 39 □□B, (B → A) ⊢ □A
L5 5 25 ⊢ (¬¬A → A) L3
L6 6 44 (¬B → A) ⊢ (¬A → B)
L7 4 59 (A → (¬C → ¬B)) ⊢ (A → (B → C)) L2
L8 1 33 (¬A → ¬B), B ⊢ A
Table 13: Lemmas found for the S5 axiomatization with the proof cost measured by the proof
size.
no. level || · || formula lemmas
L1 1 13 (A → C), (A → (C → B)) ⊢ (A → B)
L2 2 12 (A → C), (C → B) ⊢ (A → B)
L3 2 13 C, (A → (C → B)) ⊢ (A → B) L1
L4 4 10 (¬B → A) ⊢ (¬A → B) L7
L5 3 8 ⊢ (¬¬A → A)
L6 1 13 □□B, (B → A) ⊢ □A
L7 1 10 (¬B → ¬A) ⊢ (A → B)
Table 14: Lemmas found for the S5 axiomatization with the proof cost measured by the proof
length.
no. level || · || formula lemmas
L1 3 8 (A → C), (A → (C → B)) ⊢ (A → B) L32
L2 4 11 (A → C), (C → B) ⊢ (A → B)
L3 1 9 (B → (C → D)) ⊢ (A → ((B → C) → (B → D)))
L4 6 8 C, (C → B) ⊢ (A → B) L24
L5 2 11 □□B, (B → A) ⊢ □□A L43
L6 8 5 A ⊢ (¬¬(A → B) → B) L13
L7 7 247 ⊢ (¬A → ((B → A) → ¬B)) L8 L25 L1 L15
L8 5 30 ((C → A) → ((C → B) → D)) ⊢ ((A → B) → ((C → A) → D)) L3 L31 L1
L9 5 11 (A → (B → D)), (B → (D → C)) ⊢ (A → (B → C)) L2 L32
L10 1 6 B ⊢ (A → B)
L11 3 21 ((C → A) → B) ⊢ (A → B) L10 L32 L15
L12 6 10 D, (D → C) ⊢ (A → (B → C)) L10
L13 4 8 (A → (C → B)), C ⊢ (A → B) L10 L1
L14 9 6 B, A ⊢ ¬(A → ¬B) L20 L6
L15 1 4 B, (B → A) ⊢ A
L16 1 5 A, B ⊢ □□A
L17 4 5 (B → (A → C)) ⊢ (A → (B → C)) L11 L32
L18 6 5 ((A → B) → A) ⊢ ((A → B) → B) L19 L1
L19 5 4 ⊢ (A → A)
L20 2 7 (¬A → ¬B), B ⊢ A L37 L15
L21 6 7 ((C → C) → B) ⊢ (A → B) L19
L22 5 9 (A → (C → B)), ((C → B) → C) ⊢ (A → B) L2 L1
L23 6 11 ¬(A → C) ⊢ (¬A → B) L29 L24
L24 5 10 (A → C), ¬C ⊢ (A → B) L28 L2
L25 6 4 ⊢ (¬A → (A → B)) L29
L26 6 32 (C → (A → D)) ⊢ (A → (B → (C → D))) L10 L9
L27 2 7 □B, (B → A) ⊢ A L43
L28 1 4 ¬A ⊢ (A → B)
L29 5 6 (A → (¬C → ¬B)) ⊢ (A → (B → C)) L2
L30 6 5 (¬A → (B → ¬C)) ⊢ ((¬A → B) → (C → A)) L29 L32
L31 4 4 ((A → (B → C)) → (((A → B) → (A → C)) → D)) ⊢ ((A → (B → C)) → D) L1
L32 2 7 (A → (B → C)) ⊢ ((A → B) → (A → C)) L15
L33 7 4 ⊢ (¬¬A → (B → A)) L29 L25
L34 8 4 ⊢ (A → (¬¬B → B)) L17 L33
L35 1 9 (¬C → ¬B) ⊢ (A → (B → C))
L36 4 4 (A → ((C → A) → B)) ⊢ (A → B) L1
L37 1 6 (¬B → ¬A) ⊢ (A → B)
L38 8 6 (A → (¬B → (C → B))) ⊢ (A → (¬B → ¬C)) L7 L9
L39 7 6 (¬A → A) ⊢ (¬A → B) L25 L1
L40 2 9 □B, (B → A) ⊢ □A L43
L41 8 6 ⊢ (A → (B → ¬¬B)) L17 L29 L33
L42 6 11 (A → D), (D → C) ⊢ (A → (B → C)) L2 L24
L43 1 5 □A ⊢ A
Table 15: Lemmas found for the S5 axiomatization with the proof cost measured by the number
of clauses processed by the prover.
proof measure initial cost final cost
number of processed clauses 115327 2091 (1.8%)
proof size 632 379 (60%)
proof length 642 389 (61%)
Table 16: Results for S5 modal logic.
The system again performed very well in the case when the measure was the number of
processed clauses. This time, the total cost of the proofs was reduced to less than 2%.
7 Future work
As the system is particularly efficient in speeding up the prover, we believe that it could be
modified to search for lemmas that would make it possible to prove conjectures that the prover
alone wasn’t able to prove. This will require a change of strategy, because currently the system
looks primarily for lemmas that improve already existing proofs of the conjectures and therefore
are not general enough to prove some new unknown conjecture.
We would also like to investigate the nature of the lemmas that help to improve particular
proof measures in order to develop a better strategy for their evaluation.
Finally, we plan to perform in-depth testing of the system on various sets of conjectures
from different sources.
8 Conclusion
Given a related set of conjectures, it is possible to automatically construct lemmas that can
significantly reduce the cost of the proofs of the conjectures. The results are summarized in
Table 17 for convenience.
processed clauses proof size proof length
set of conjectures || · ||i || · || f || · ||i || · || f || · ||i || · || f
TPTP set theory 3991 1921 (48%) 3487 2794 (80%) 1086 878 (81%)
Mizar set properties 62706 1852 (3.0%) 10680 9351 (88%) 3687 3046 (83%)
Meredith’s axiomatization 15725 510 (3.2%) 1299 1024 (79%) 314 277 (88%)
S5 modal logic 115327 2091 (1.8%) 632 379 (60%) 642 389 (61%)
Table 17: Summary of the results of the system on the presented sets of conjectures.
The system that we have developed performs well on different sets of conjectures, particu-
larly if the cost of the proofs of the conjectures is measured in the number of clauses processed
by the prover. The system can also improve the size and/or the length of the proofs, although it
is not as effective in these cases.
References
[AS92] Owen L. Astrachan and Mark E. Stickel. Caching and lemmaizing in model elim-
ination theorem provers. In Deepak Kapur, editor, CADE, volume 607 of Lecture
Notes in Computer Science, pages 224–238. Springer, 1992.
[Col02a] Simon Colton. Automated Theory Formation in Pure Mathematics. Distinguished
Dissertations. Springer, 2002.
[Col02b] Simon Colton. The HR program for theorem generation. In Andrei Voronkov,
editor, CADE, volume 2392 of Lecture Notes in Computer Science, pages 285–289.
Springer, 2002.
[CP80] P. T. Cox and T. Pietrzykowski. A complete, nonredundant algorithm for reversed
skolemization, volume 87 of Lecture Notes in Computer Science. Springer, May
1980.
[CP93] Ritu Chadha and David Plaisted. Finding logical consequences using unskolemiza-
tion, volume 689 of Lecture Notes in Computer Science. Springer, May 1993.
[FFF99] Marc Fuchs, Dirk Fuchs, and Matthias Fuchs. Generating lemmas for tableau-
based proof search using genetic programming. In Wolfgang Banzhaf, Jason Daida,
Agoston E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela, and Robert E.
Smith, editors, Proceedings of the Genetic and Evolutionary Computation Con-
ference, volume 2, pages 1027–1032, Orlando, Florida, USA, July 1999. Morgan
Kaufmann.
[Hal05] John Halleck. Logic systems, 2005. http://www.cc.utah.edu/˜nahaj/logic/structures/.
[McC94] W. W. McCune. OTTER 3.0 reference manual and guide. Technical Report ANL-
94/6, Argonne National Laboratory, Argonne, Illinois, 1994.
[MVF+02] William McCune, Robert Veroff, Branden Fitelson, Kenneth Harris, Andrew Feist,
and Larry Wos. Short single axioms for boolean algebra. J. Autom. Reasoning,
29(1):1–16, 2002.
[Qua92] Art Quaife. Automated Development of Fundamental Mathematical Theories.
Kluwer Academic Publishers, 1992.
[Sch02] S. Schulz. E – A brainiac theorem prover. Journal of AI Communications, 15(2-
3):111–126, 2002.
[SGC03] G. Sutcliffe, Y. Gao, and S. Colton. A Grand Challenge of Theorem Discovery. In
J. Gow, T. Walsh, S. Colton, and V. Sorge, editors, Proceedings of the Workshop on
Challenges and Novel Applications for Automated Reasoning, 19th International
Conference on Automated Reasoning, pages 1–11, 2003.
[SS98] G. Sutcliffe and C. B. Suttner. The TPTP Problem Library: CNF Release v1.2.1.
Journal of Automated Reasoning, 21(2):177–203, 1998.
[Urb03] Josef Urban. Translating Mizar for first order theorem provers. In MKM, volume
2594 of Lecture Notes in Computer Science, pages 203–215. Springer, 2003.
[Urb04] Josef Urban. MPTP - motivation, implementation, first experiments. Journal of
Automated Reasoning, 33(3-4):319–339, 2004.
[WP03] Larry Wos and Gail W. Pieper. Automated Reasoning and the Discovery of Missing
and Elegant Proofs. Rinton Press, 2003.