=Paper=
{{Paper
|id=Vol-192/paper-3
|storemode=property
|title=Search for faster and shorter proofs using machine generated lemmas
|pdfUrl=https://ceur-ws.org/Vol-192/paper03.pdf
|volume=Vol-192
}}
==Search for faster and shorter proofs using machine generated lemmas==
<pdf width="1500px">https://ceur-ws.org/Vol-192/paper03.pdf</pdf>
<pre>
                         Search for faster and shorter proofs
                          using machine generated lemmas
                                            Petr Pudlák
                                    Charles University in Prague
                                pudlak@artax.karlin.mff.cuni.cz


                                                   Abstract
               When we have a set of conjectures formulated in a common language and proved from
           a common set of axioms using an automated theorem prover, it is often possible to automat-
           ically construct lemmas that can be used to prove the conjectures in a shorter time and/or
           with shorter proofs.
               We have implemented a system that repeatedly tries to improve the set of assumptions
           for proofs of given conjectures using lemmas that it extracts from the proofs constructed
           by an automated theorem prover. In many cases it can significantly reduce the total time or
           the overall sum of the lengths of the proofs of the conjectures.
               We present several examples of such sets of conjectures and show the improvements
           gained by the system.


     1    Motivation
     Imagine a professor of mathematics who gives the same lectures every year. She works in a
     standard theory and she has a fixed set of theorems that she wants to present and prove every
     year during the course. And because she wants to save her and her students’ time, she would like
     to have as efficient proofs as possible. She would like to find such lemmas that would shorten
     the total time she has to spend for presenting the proofs.
         Our aim will be to search for lemmas in such environment and try to use them to make the
     proofs of conjectures more efficient.
         Finding useful lemmas has of course much more serious applications, namely reducing the
     amount of resources necessary to prove a particular set of conjectures or to prove conjectures
     that could not be proved without such lemmas.
         Currently the system only restructures and compacts the proofs and doesn’t handle conjec-
     tures that the prover was not able to prove at the beginning. It can only improve existing proofs.
     In future we hope to be able to also search for proofs of the conjectures that couldn’t be proved
     from the initial set of assumptions.


     2    Previous research
     The idea of an automated discovery of lemmas or theorems is not a new one. There were many
     different approaches to solve this task.


34                                                                       Empirically Successful Computerized Reasoning
          Owen L. Astrachan and Mark E. Stickel [AS92] used the idea of reusing lemmas to speed
     up a model elimination theorem prover.
          Art Quaife [Qua92] used Otter [McC94] to prove many fundamental mathematical theo-
     rems. He included the theorems he had already proved as assumptions for the more complicated
     ones. The sequence in which the theorems were proved was determined by Quaife, based on his
     mathematical knowledge.
          Marc Fuchs, Dirk Fuchs and Matthias Fuchs sought for lemmas using genetic programming
     to improve tableau-based proof search [FFF99].
          The HR system [Col02a], named after mathematicians Hardy and Ramanujan, uses a model
     generator to construct models based on a set of axioms, attempts to formulate conjectures and
     then prove them using an automated theorem prover. A brief description can be also found in
     [Col02b].
          Larry Wos and Gail W. Pieper describe the technique of lemma adjunction in [WP03] and
     also discuss many different approaches for evaluating lemmas.
          Article [SGC03] summarizes many different criteria for constructing and identifying quality
     lemmas. Humans generally use the inductive approach – from many similar problems they try
     to induce a more general conjecture. This approach requires good knowledge of the particu-
     lar field of mathematic and also good mathematical intuition. Another approach described is
     generative, when more sophisticated techniques (for example syntactic manipulation) are used
     to construct new conjectures. An example of such a system is the HR [Col02b] program. The
     manipulative approach tries to construct interesting conjectures from already existing theorems.
     Finally, the deductive approach tries to automatically construct many logical consequences from
     a set of axioms using an automated theorem prover and then filter them and pick those that are
     interesting for the researcher.
          The authors of [SGC03] also categorize different possible filters for interestingness. The
     filters include non-obviousness, novelty, suprisingness, intensity and usefulness.
          Our approach is somewhat different from those mentioned above. It lies in between the
     manipulative and the deductive approach. We observe proofs of all the given problems and
     identify lemmas that are common to many of the proofs. The filter we develop and use for
     selecting good lemmas falls into the usefulness category - our measure is, how much each lemma
     could contribute to the proofs of other conjectures. Unlike other measures, this one can easily
     be evaluated by comparing different proofs conducted by the prover.
          The lemmas that we produce are therefore interesting from the point of view of a machine.
     Hence, this can give us an interesting comparison between human and machine opinions on the
     usefulness of a lemma.
          The ideas in the article [SGC03] and personal communication with many other researchers
     inspired our work which we present in this paper together with the empirical results we have
     obtained.


     3     Overview of the system
     In sequel we assume that a consistent theory is given with some (possibly infinite) set of axioms.
     All the formulas we work with are formulated in the language of the theory and the conjectures
     that are to be proved and the lemmas that are constructed are proved using axioms selected from
     the axioms of the theory.


Empirically Successful Computerized Reasoning                                                             35
         The systems starts by proving the conjectures one by one. The proofs of the conjectures that
     were successfully proved are then analyzed by the system. Formulas that appear multiple times
     in the proofs are then used as additional assumptions when looking for less costly proofs of the
     conjectures. The system tries to optimize the set of such formulas. We will call such formulas
     lemmas.

     3.1   Basic notions
     Let us first define some basic notions we will use throughout the text.
     Notation 1 (Axiom, conjecture, lemma, proof) By an axiom we understand either an axiom
     of the underlying theory or a well known theorem of the theory that we use as an assumption.
         A conjecture is a formula given on the input that is to be proved by the system. The system
     tries to find the most efficient proof of the conjecture using different sets of assumptions. Each
     such a set can contain some of the axioms, other conjectures or lemmas.
         A lemma is a formula constructed by the system that is proved in a similar fashion as the
     conjectures are, and is used to improve the proofs of the conjectures or the proofs of other
     lemmas.
         A proof is an output of a successful run of the prover. As we use a single theorem prover
     with the same settings on every run, the proof only depends on the conjecture being proved and
     on the set of assumptions being used.
         The aim of the system is to find such lemmas and such sets of assumptions for the conjec-
     tures and the lemmas that either the total time required to prove the conjectures and the lemmas
     or the overall size or length of the proofs is minimized.

     3.2   System input
     Initially, the system is given a set C of conjectures and for each conjecture C from C the system
     is given an initial set of assumptions AC . The set AC consists of a subset of axioms of the
     underlying theory and (possibly) of some other conjectures from C.
          The idea is that the proofs of conjectures in C need not use the set of all the axioms of the
     theory (which might be infinite) and that some other conjectures from C might be useful. The
     user may have his own idea, which conjectures to add as assumptions to AC .
          In the process of computation the system tries to optimize the set of lemmas and for each
     lemma the set of its assumptions.

     3.3   System output
     At the end the system outputs the conjectures given at the input along with the lemmas that
     participate in the fastest/shortest set of proofs. For each conjecture and each lemma it outputs
     the optimal set of assumptions it has found, in the sense described later.


     4     An example
     In our example we use a commonly used encoding to encode an axiomatization of propositional
     logic into terms, thus forcing the prover to use only the specified axioms and rules for inferences.
     Then we use the system to prove some basic propositional theorems.


36                                                                     Empirically Successful Computerized Reasoning
         Note that the automated theorem provers were already used to look for interesting axiomat-
     ics of propositional logic, for example [MVF+ 02]. However, our aim is not to look for a new or
     otherwise interesting axiomatic, we use the system to optimize the proofs of several theorems.
         We will denote the code of a formula by an over-line. For example for a propositional
     formula ϕ we denote the term that codes it by ϕ. Negation is coded by a unary function n and
     implication is coded by a binary function i. The fact that ϕ is a theorem ⊢ ϕ is coded by a unary
     predicate t. The following table summarizes the codes:
                                                ¬ϕ    n(ϕ)
                                                ϕ → ψ i(ϕ, ψ)
                                                ⊢ϕ    t(ϕ)
     We use “→” just for implication in propositional logic we are coding. We use “⇒” for im-
     plication in predicate logic in which the conjectures are presented and in which the prover
     actually works. Thus the statement “if ϕ is provable then ψ → χ is provable” would be coded
     as t(A) ⇒ t(i(B,C)).
          In order to increase the difficulty of the task we’ve used Meredith’s single axiom for propo-
     sitional logic. It has only a single axiom schema and a single rule.
          The axiom schema is defined for any formulas ϕ, ψ, χ, ξ and η, but the theorem prover
     computes with their codes, which are represented by the variables A, B, C, D and E in the
     language of the theorem prover. As these variables are universally quantified, the prover can
     replace each variable by any coded formula. The same applies to the coded modus ponens rule.
          The coded representations of the schema and the rule are:
              Meredith: ⊢ ((((ϕ → ψ) → (¬χ → ¬ξ)) → χ) → η) → ((η → ϕ) → (ξ → ϕ))
                        t(i(i(i(i(i(A, B), i(n(C), n(D))),C), E), i(i(E, A), i(D, A))))
                  MP: ϕ, ϕ → ψ ⊢ ψ
                        (t(A) ∧ t(i(A, B))) ⇒ t(B)
     This set of axioms was used as the initial set of assumptions for all the conjectures. No other
     dependencies between the conjectures were specified. The system was let to discover all the
     lemmas by itself from the proofs of the conjectures.
         Table 1 shows the conjectures whose proofs the system was improving. The conjectures
     were proved using E prover[Sch02]. The cost of the proofs was measured in the number of
     processed clauses. This measure closely corresponds to the time the prover spends on a problem,
     but it is not dependent on the hardware the prover runs on. In the second column the conjectures
     are presented in their coded form. The third column shows the number of processed clauses the
     prover spent on each of the conjectures if they were proved only using the original assumptions.
     The fourth column shows the cost of the proofs after the system had finished the improvements
     using lemmas generated from the proofs. Note that to obtain the total cost of the final proofs
     we also have to include the cost of the proofs of the generated lemmas. The full listing of the
     generated lemmas is included in Table 10. As we can see, the total cost of proofs needed to
     prove the conjectures in this case was reduced to 3%.


     5     Description of the system
     Let us first discuss how the lemmas are generated. The modification of the set of assumptions
     of the conjectures by using the lemmas will be described in the next part.


Empirically Successful Computerized Reasoning                                                             37
       conjecture                  coding into terms                                        original cost      final cost
       ¬(ϕ → ¬ψ) ⊢ ϕ               t(n(i(A, n(B)))) ⇒ t(A)                                           938              62
       ¬(ϕ → ¬ψ) ⊢ ψ               t(n(i(A, n(B)))) ⇒ t(B)                                          1673              17
       ϕ ⊢ ¬ϕ → ψ                  t(A) ⇒ t(i(n(A), B))                                               31               4
       ψ ⊢ ¬ϕ → ψ                  t(B) ⇒ t(i(n(A), B))                                               13               3
       ¬(ϕ → ϕ) ⊢ ψ                t(n(i(A, A))) ⇒ t(B)                                               41               8
       ⊢ ϕ → (ψ → ϕ)               t(i(A, i(B, A)))                                                   17               4
       ϕ → ¬(ψ → ¬χ), ϕ ⊢ χ (t(i(A, n(i(B, n(C))))) ∧ t(A)) ⇒ t(C)                                   287              20
       ⊢ϕ→ϕ                        t(i(A, A))                                                         15               2
       ⊢ ¬ϕ → (ϕ → ψ)              t(i(n(A), i(A, B))))                                             5731               4
       ⊢ ¬¬ϕ → ϕ                   t(i(n(n(A)), A))                                                 3297               7
       ⊢ ϕ → ¬¬ϕ                   t(i(A, n(n(A)))                                                  3682               4
       total cost of the conjectures:                                                             15725              135
       cost for proving the lemmas:                                                                                  375
       total cost:                                                                                  15725            510

                         Table 1: Formulas proved from the Meredith’s axiomatization.


     5.1    Generation and evaluation of lemmas
     Notation 2 (Subsumption) By c ⊑ d we denote that the clause c that subsumes the clause d.

     Let a set of proofs1 P be given. We collect all the clauses that appear in the proofs and that were
     derived only from the assumptions into a single set
             [
       L=          {c | c is a clause in P derived only from assumptions and without Skolem symbols}
             P∈P

         We do not include clauses that were derived from the negated conjecture, because they are
     not true formulas in our theory. Also we remove those clauses that contain Skolem symbols
     created by the prover, because these symbols have different meaning in different runs of the
     prover. (In future we might implement the reverse skolemization algorithm [CP80, CP93] to
     deal with such formulas.)
         Thus, the clauses in L don’t contain any skolem symbols, but they have no special form,
     they can contain arbitrary number of positive and/or negative literals.
         Note that although it would be an obvious thing to do, we don’t use the conjectures as lem-
     mas. The reason is that the conjectures are not necessarily clauses, therefore we can’t process
     them the same way as the clauses produced by the prover. As it turns out, many of the con-
     jectures are then anyway discovered as lemmas by the system. However, this issue deserves a
     better solution in the future.
         From the set L we construct a minimal L′ set with respect to subsumption such that L′ has
     the following properties:

        1. L′ ⊆ L, therefore L′ contains only true formulas.
        1 Recall that for us a proof is a successful run of the prover that proves a particular conjecture from a given set of

     assumptions.


38                                                                                   Empirically Successful Computerized Reasoning
         2. for every d ∈ L there is c ∈ L′ such that c ⊑ d.
         3. for any pair c ∈ L′ , d ∈ L′ we know that c doesn’t subsume d: c 6⊑ d.
     Therefore, L′ contains just the most general variants of the lemmas appearing in L.
         The clauses from L′ are then used as lemmas to modify the set of assumptions of the con-
     jectures.
         Now let us have a clause c that is a lemma, c = L1 , . . . , Lk , ¬Lk+1 , . . . , ¬Ln where Li are
     atomic formulas. Let |Li | be the number of function symbols appearing in Li . Let P by a proof
     whose set of assumptions we want to improve by c. We would like to have an estimate that
     would tell us, if it is likely that c will contribute to the proof P. A natural idea suggests itself
     that the more often an atomic formula Li occurs in the proof P the more likely the lemma will
     contribute and also that the longer Li is (measured in the number of symbols) the more likely it
     will shorten/speed up the proof. Therefore our estimation formula is defined by
                                                 n
                         weight(c, P) = ∑ |Li | · (the number of occurrences of Li in P)               (1)
                                                i=1

     There are many other possible estimation methods. One may, for example, take into account
     the number of formulas in the proof which the lemma subsumes, look for similar terms in the
     lemma and in the proof, etc.

     5.2    Evaluation of the proofs
     In order to evaluate the applicability of lemmas, we need to have a criterion for the cost of a
     proof:
     Notation 3 (Measure of a proof) A proof measure is a function that maps proofs into non-
     negative real numbers.
     We use the following proof measures:
     the number of processed clauses reported by the prover; this measure is closely related to the
           time the prover spends while searching for the proof of the conjecture, but is independent
           of the hardware of the computer the prover runs on;
     the length of the proof is the number of formulas appearing in the proof that is constructed by
           the prover;
     the size of the proof is the total number of occurrences of function symbols (not predicate
            symbols) appearing in the proof that is constructed by the prover.
     The length and the size of a proof are also independent on the hardware of the computer being
     used.
     Notation 4 If we are proving a conjecture C from assumptions A1 , . . . , An , we denote the mea-
     sure of the proof by
                                          ||A1 , . . . , An C||
     If the prover is not able to conduct the proof we set

                                                      ||A1 , . . . , An C|| = +∞


Empirically Successful Computerized Reasoning                                                                 39
     Remark 1 (The proof size/length) If we could prove all the given conjectures together, the
     size/length of the resulting hypothetical proof would be of course less than the sum of the
     sizes/lengths of the individual proofs of the conjectures. But in most cases the prover is not
     able to prove all the conjectures together, therefore the system proves the conjectures one by
     one. The system then tries to compact the proofs of the conjectures to make them closer to the
     size of the hypothetical proof.

          The number of generated lemmas is usually very large, so only some of them will be in-
     cluded in the output. Such lemmas are marked as accepted. Each lemma is initially unaccepted.
     When the system figures out that the lemma is worth including in the output, it marks it as
     accepted.
          For each conjecture or lemma C the system maintains a list SC of sets of assumptions that
     were used to produce different proofs of the conjecture.
          Each such a set S ∈ SC is marked as accepted iff all the lemmas it contains are accepted. The
     initial set of assumptions of each conjecture contains no lemmas, hence it is always accepted.
     Now, we may define:

     Notation 5 (Best accepted set of assumptions) The best accepted set of assumptions

                                                SCbest ∈ SC

     of a conjecture C is an accepted set of assumptions that produces the best proof of C with respect
     to the proof measure:

            ∀S : ((S ∈ SC ) ∧ (all the lemmas in S are accepted)) ⇒ ||S C|| ≥ ||SCbest C||              (2)

     If there are several sets of assumptions that match the criteria for the best accepted set (they
     produce proofs of the same measure), we arbitrarily choose one among them.
          Let us call the proof of C produced from SCbest the best accepted proof of C.

     Remark 2 (System output) The output of the system consists of all the input conjectures and
     all the accepted lemmas at the time the system finishes execution. For each of these conjectures
     or lemmas the best accepted set of assumptions is presented.

     5.3   Modifying the set of assumptions to get better ones
     The outline of the work of the system is as follows:

        1. The system first tries to prove all the given conjectures one by one. Let C be the set of
           those conjectures that were successfully proved. Let L be the set of constructed lemmas
           and let La ⊆ L be the set of lemmas that are accepted. Initially, these sets are empty.

        2. The system takes the best accepted proofs of all the conjectures and accepted lemmas.
           From those proofs the system constructs new lemmas as described in section 5.1. It sets

                                   L = L ∪ {the newly constructed lemmas}


40                                                                    Empirically Successful Computerized Reasoning
         3. The system produces pairs consisting of a best assumption set of a conjecture and of a
            lemma that will be used to improve the set of assumptions of the best proof of the con-
            jecture. It takes all the new lemmas together with the lemmas it already has constructed
            before and combines them with all the best accepted sets of assumptions of all the con-
            jectures and accepted lemmas. This way it produces every possible pair of

                                                L × {SCbest |C ∈ C ∪ La }

             However some pairs of a lemma l and the best set of assumptions SCbest of a conjecture C
             have to be excluded, namely

                 • if already l ∈ SCbest , or
                 • if the lemma l is the conjecture C itself, or
                 • if the conjecture C is directly or indirectly used to prove l; this means that either C
                   is one of the assumptions in the best accepted set of l, or it is one of the assumptions
                   in the best set of some conjecture that is an assumption of l, and so on.

             Otherwise it could happen for example that there would be two equivalent lemmas, one
             proving another with a one-step proof.

         4. These pairs are sorted according to the estimate described in section 5.1.

         5. The system subsequently takes these pairs (l, SCbest ) starting with the one with the highest
            weight:

              (a) It tries to conduct a new proof of C using SCbest ∪ {l}. The gain of this single im-
                  provement is
                                           g = ||SCbest C|| − ||SCbest , l C||

              (b) If g is positive, it means that the lemma l has brought an improvement. The lemma l
                  is then checked, if its total gain to all the conjectures whose set of assumptions were
                  improved by l is larger than the cost of the proof of l. If so, the system sets

                                                       La = La ∪ {l}

                  which means that l is marked as accepted.
                  If l is accepted, we want to incorporate the lemmas that originate from the proof of
                  l as well as the lemmas that originate from the proofs of the conjectures that use l
                  as an assumption. Hence, in such a case the process is restarted from the beginning
                  and the system goes again to point 2.
              (c) Until there are any unprocessed pairs (l, Sbest ) the system takes the next pair and
                                                                   C
                   goes again to 5a.

         6. When all the pairs are exhausted and no improvement was made the system outputs the
            conjectures C, the accepted lemmas La and their best accepted sets of assumptions SCbest ,
            C ∈ C ∪ La , as described above, and halts.


Empirically Successful Computerized Reasoning                                                                 41
         It may happen that as the result of adding l, some of the assumptions in SCbest are not needed
     any more and do not appear in the proof. In such a case the system omits them and tries to prove
     the conjecture only using this reduced set of assumptions. If the proof attempt succeeds, it
     is evaluated the same way as described above. This practice helps to reduce the number of
     assumptions appearing in the proofs. Without it, the prover would soon become overwhelmed
     by the number of assumptions and no further improvement would be possible. In the current
     version of the system this check is not iterated, so the proof conducted from the reduced set of
     assumptions is not checked again for redundant assumptions.


     6     Experimental results
     The system is still under development, therefore we don’t have an in-depth statistics of its
     behavior. However, we have performed tests on different sets of conjectures from different
     sources and we present a selection of them.
         All experiments were run on a Linux machine with Intel® Pentium® 4 CPU 3.4GHz with
     2GB of RAM. The conjectures were proved using E prover with resource limits set to 80 sec-
     onds, 512MB RAM and 100000 processed clauses.
         We have developed an independent server component that lies between the actual system
     and the automated prover. The component caches all conducted proofs on a hard-disk and if a
     client requests the server to perform a proof that has already been processed, the server returns
     the cached version. This greatly speeds up the development of the system, particularly if we
     rerun the system many times with different settings or with slight modifications on the same set
     of problems.
         For some cases we include a full list of the conjectures and the lemmas, for others we only
     summarize the results.

     6.1   TPTP problems – set theory
     We used several selected TPTP [SS98] problems concerning set theory. These problems orig-
     inate from [Qua92]. For presentation in this paper we have converted the machine syntax into
     a more human-readable form. The meaning of the symbols that we use is summarized in Ta-
     ble 2. The conjectures that were proved are shown in Table 3. The table also shows the results
     obtained when running the system with the proofs being measured by the number of processed
     clauses. The second column labeled “level” is an inductively defined property defined for both
     the conjectures and the lemmas. The axioms have level 0. If a conjecture or a lemma is proved
     just from the axioms, it has level 1. Each lemma or conjecture has its level set to the maximum
     level of its assumptions plus 1. This can give us an approximation on how complicated the
     conjecture or the lemma is. The third column labeled || · ||i shows the initial cost of the proof
     of the conjecture while the fourth column labeled || · || f shows the final cost of the proof of the
     conjecture when the system terminates. The fifth column shows the actual formula converted
     into a human-readable form. The last column shows which lemmas were finally used to prove
     the conjecture.
         Some of the conjectures in the listings may have levels greater than 1 although they don’t
     use any lemmas or the level of a particular conjecture is higher than one plus the level of the
     lemmas that are used to prove it. The reason is that the user has specified that some other


42                                                                     Empirically Successful Computerized Reasoning
                          A⊆B                   subclass
                          {A, B}                unordered pair
                          hA, Bi                ordered pair
                          {A}                   singleton
                          A[1]                  first member of an ordered pair A
                          A[2]                  second member of an ordered pair A
                          A∈B                   membership
                          A                     class complement
                          A×B                   cross product
                          0/                    null class
                          A∩B                   intersection
                          U                     universal class
                          member of(A)          choice function (from the axiom of choice)

                     Table 2: Meaning of the symbols used in the TPTP set theory sample


     conjectures should be used as assumptions to prove the conjecture and they increase the level of
     the conjecture.
         We can see that often the conjectures that were harder to prove with respect to the proof
     measure have a higher level. This means that they were finally proved using several layers of
     lemmas.
         Several conjectures that were too complicated are omitted for brevity.
         The lemmas that the system generated with the processed clauses count proof measure are
     in Table 4. The lemmas are clauses, but for the presentation we have converted them into a more
     readable form using implication notation. As the lemmas have no initial set of axioms given by
     the user, they also have no initial cost, so only their final cost is shown.
         They are sorted by their total accumulated improvement with respect to the proof measure,
     the more useful lemmas are at the beginning of the table. This is however only an informative
     ordering, as it depends very much on the order in which the lemmas were tried. For example,
     consider some two almost same lemmas l1 and l2 . Whichever is chosen second will bring no
     improvement as the assumption sets were already improved by the one chosen first.
         The lemmas are more or less complicated formulas that the system found useful for proving
     the conjectures. This can give us an interesting comparison, as the input conjectures were
     selected by a human, whereas the lemmas were constructed by a machine. Tables 5 and 6 show
     the results on the same set of conjectures when different proof measures were selected. Many
     of the lemmas appear in all three tables, although in different positions. Such lemmas seem to
     be essential for the automated prover when working with this particular theory.
         Note that some of the lemmas the system has found are just conjectures reformulated as
     clauses. In such a case the system usually proves such conjecture using the lemma in a single
     step and then further improves the proof of the lemma.
         Table 7 shows the summary of achieved results. For each proof measure the initial and the
     final cost of the proofs is shown.


Empirically Successful Computerized Reasoning                                                           43
        no.   level    || · ||i   || · || f    formula                                                             lemmas
        C1        1          1           1     ∀X : X = X
        C2        1          5           2     ∀X : (X ⊆ X)
        C3        1          2           2     ∃X : ∀Z : ¬(Z ∈ X)
        C4        2          6           2     ∀X : (0/ ⊆ X)                                                       L18
        C5        2        16            9     (0/ ∈ U )                                                           L5
        C6        2          4           3     ∀X,Y : ((Y ∈ {X}) ⇒ Y = X)                                          L14
        C7        1          3           3     ∀Z : (Z = 0/ ∨ ∃Y : (Y ∈ Z))
        C8        1          2           2     ∀X : ({X} ∈ U )
        C9        2        11            4     ∀X : ((X ⊆ 0) / ⇒ X = 0) /                                          L18 L11
       C10       2           8           2     ∀X,Y : ({X} ∈ hX,Y i)
       C11       3         19            4     ∀X,Y : ((X ∈ Y ) ⇒ ({X} ⊆ Y ))                                      L1
       C12       1           9           9     ∀X,Y : ¬(Y ∈ (X ∩ X))
       C13        1          2           2     ∀X,Y : (hX,Y i ∈ U )
       C14        1        11          11      ∀X,Y, Z : (((X ⊆ Y ) ∧ (Y ⊆ Z)) ⇒ (X ⊆ Z))
       C15        2          7           3     ∀X : ((X ∈ U ) ⇒ (X ∈ {X}))                                         L12
       C16        1        22          21      ∀X,Y : ({X, X} ⊆ {X,Y })
       C17       2           9           4     ∀X : ((X ∈ U ) ⇒ {X} 6= 0)  /                                       L12
       C18       2         21            3     ∀X : (¬(X ∈ U ) ⇒ {X} = 0)    /                                     L8
       C19       1           3           3     ∀X : ({member o f (X)} = X ⇒ (X ∈ U ))
       C20        2          7           2     ∀X,Y : ({X, {Y }} ∈ hX,Y i)
       C21        1        10          10      ∀X,Y : (({member o f (X)} = X ∧ (Y ∈ X)) ⇒ member o f (X) = Y )
       C22        6        87          25      ∀X,Y : ((X ⊆ {Y }) ⇒ (X = 0/ ∨ {Y } = X))                           L16
       C23        1        19          10      ∀X,Y : (({X} = {Y } ∧ (Y ∈ U )) ⇒ X = Y )
       C24        3        15          10      ∀X,Y : (({X} = {Y } ∧ (X ∈ U )) ⇒ X = Y )                           L12 L17
       C25       2         12            4     ∀X,Y : ((X ∈ U ) ⇒ {X,Y } 6= 0) /                                   L9
       C26       2           9           4     ∀X,Y : ((Y ∈ U ) ⇒ {X,Y } 6= 0) /                                   L12
       C27       1           3           3     ∀X : (hX[1] , X[2] i = X ⇒ (X ∈ U ))
       C28       6       610             7     ∀X,Y, Z : (((X ∈ Z) ∧ (Y ∈ Z)) ⇒ ({X,Y } ⊆ Z))                      L1 L2
       C30       8       706         213       ∀W, X,Y, Z : ((hW, Xi = hY, Zi ∧ (X ∈ U )) ⇒ X = Z)                 L3 L10 L5 L17
                                                                                                                   L9
       C31       2        20          12       ∀W, X,Y, Z : ((hW, Xi = hY, Zi ∧ (W ∈ U )) ⇒ W = Y )                L14 L9
       C32       2        32           4       ∀X,Y : ((¬(X ∈ U ) ∧ ¬(Y ∈ U )) ⇒ {X,Y } = 0)  /                    L8
       C33       2        16          11       ∀X,Y, Z : (((Y ∈ U ) ∧ (Z ∈ U ) ∧ {X,Y } = {X, Z}) ⇒ Y = Z)         L12 L14
       C34       2        16          11       ∀X,Y, Z : (((X ∈ U ) ∧ (Y ∈ U ) ∧ {X, Z} = {Y, Z}) ⇒ X = Y )        L14 L9
       C35       3        21           4                          / = hX,Y i ∨ (Y ∈ U ))
                                               ∀X,Y : ({{X}, {X, 0}}                                               L10

     Table 3: Selected conjectures from the TPTP set theory sample with the proof cost measured by
     the number of clauses processed by the prover.

                 no.      level      || · ||   formula                                                    lemmas
                 L1           4         32     (A ∈ C) ∧ (B ∈ C) ⇒ ({A, B} ⊆ C)                           L7
                 L2          5            4    (A ∈ C) ⇒ ({A, B} ⊆ C) ∨ (B ∈ C)
                 L3           7           5    (B ∈ D) ∧ (A ∈ C) ⇒ ({{A, A}, {A, {B, B}}} ∈ (C × D))      L6
                 L4           3           6    (A ∈ C) ∧ (A ∈ B) ⇒ (A ∈ (B ∩C))                           L7
                 L5           1           8    (A ∈ B) ⇒ (A ∈ U )
                 L6           6           7    (B ∈ D) ∧ (A ∈ C) ⇒ (hA, Bi ∈ (C × D))                     L7
                 L7           2           8    (B ∈ C) ⇒ (A ∈ U ) ∨ ({B, A} ⊆ C)                          L13 L5
                 L8           1         32     0/ = {A, B} ∨ (A ∈ U ) ∨ (B ∈ U )
                 L9           1           8    (A ∈ U ) ⇒ (A ∈ {A, B})
                L10           2           3    0/ = {A, A} ∨ (A ∈ U )                                     L8
                L11           1           6    (B ⊆ A) ∧ (A ⊆ B) ⇒ A = B
                L12           1           7    (A ∈ U ) ⇒ (A ∈ {B, A})
                L13           1         31     (A ∈ C) ⇒ ({A, B} ⊆ C) ∨ (B ∈ {A, B})
                L14           1           5    (A ∈ {C, B}) ⇒ A = B ∨ A = C
                L15           1           2    (A ⊆ U )
                L16           5           4    (A ⊆ {B,C}) ∧ (C ∈ A) ∧ (B ∈ A) ⇒ A = {B,C}
                L17           2           4    (A ∈ U ) ∧ A = B ⇒ (A ∈ {B,C})                             L9
                L18           1           6    (0/ ⊆ A)

     Table 4: Lemmas found for the TPTP set theory sample with the proof cost measured by the
     number of clauses processed by the prover.


44                                                                                           Empirically Successful Computerized Reasoning
                             no.   level   || · ||   formula                              lemmas
                             L1       1       37     (A ∈ {C, B}) ⇒ A = B ∨ A = C
                             L2        2      64     0/ = {A, B} ∨ (A ∈ U ) ∨ (B ∈ U )    L1
                             L3       2       43     (A ∈ C) ∧ (B ∈ C) ⇒ ({A, B} ⊆ C)     L1
                             L4       1       24     (A ∈ U ) ⇒ (A ∈ {B, A})
                             L5       1       24     (A ∈ U ) ⇒ (A ∈ {A, B})
                             L6       3       28     0/ = {A, A} ∨ (A ∈ {A, A})           L2 L4
                             L7       3       44     (A ∈ U ) ⇒ (A ∈ B) ∨ (A ∈ B)         L2
                             L8       1       34     (A ∈ (B ∩C)) ∧ (A ∈ C) ⇒ false
                             L9        1      23     {A} = {A, A}
                            L10        1      21     (A ∈ B) ⇒ (A ∈ U )
                            L11        1      11           / ⇒ false
                                                     (A ∈ 0)
                            L12        1      18     ({A, B} ∈ U )
                            L13       2       75     hA, Bi = {{A}, {A, {B}}}             L9
                            L14        1      31     ((A ∩ B) ⊆ B)
                            L15        2      42     (A ∈ U ) ∨ ({A, A} ⊆ B)              L1 L10

     Table 5: Lemmas found for the TPTP set theory sample with the proof cost measured by the
     proof size.


                             no.   level   || · ||   formula                              lemmas
                             L1        2      19     0/ = {A, B} ∨ (A ∈ U ) ∨ (B ∈ U )    L2
                             L2        1      15     (A ∈ {C, B}) ⇒ A = B ∨ A = C
                             L3        3      16     (A ∈ U ) ⇒ (A ∈ B) ∨ (A ∈ B)         L1
                             L4       4       20     (A ∈ B) ⇒ ({A, A} ⊆ B)               L2
                             L5       1         8    {A} = {A, A}
                             L6       1       11     (A ∈ U ) ⇒ (A ∈ {A, B})
                             L7       1       11     (A ∈ U ) ⇒ (A ∈ {B, A})
                             L8       1       10           / ⇒ false
                                                     (A ∈ 0)
                             L9        1      16     (B ⊆ A) ∧ (A ⊆ B) ⇒ A = B
                            L10        2      10     hA, Bi = {{A}, {A, {B}}}             L5
                            L11        1        9    ({A, B} ∈ U )
                            L12       1       15     (A ∈ C) ∧ (A ∈ B) ⇒ (A ∈ (B ∩C))
                            L13       1       16     (A ∈ B) ⇒ (A ∈ U )

     Table 6: Lemmas found for the TPTP set theory sample with the proof cost measured by the
     proof length.


                            proof measure                             initial cost        final cost
                            number of processed clauses                      3991        1921 (48%)
                            proof size                                       3487        2794 (80%)
                            proof length                                     1086         878 (81%)

                                   Table 7: Results for the TPTP set theory sample.


Empirically Successful Computerized Reasoning                                                          45
     6.2   Mizar problems – Boolean properties of sets
     These theorems address basic boolean properties of sets in the Mizar database for mathematics.
     The conjectures were converted from the Mizar language by Josef Urban [Urb04, Urb03] into a
     form suitable for automated theorem provers.
        The system was rather effective for reducing the number of processed clauses required to
     prove these set of conjectures. Table 8 shows the summary of achieved results. For each proof
     measure the initial and the final cost of the proofs is shown.

                        proof measure                               initial cost        final cost
                        number of processed clauses                      62706        1852 (3.0%)
                        proof size                                       10680        9351 (88%)
                        proof length                                       3687       3046 (83%)

                        Table 8: Results for the Mizar boolean properties of sets.


          As the list of the conjectures and the lemmas in this case is rather long, we did not include
     it in this paper.

     6.3   Meredith’s axiomatization of propositional logic
     In this section we give detailed results for the example in section 4. The formulas are presented
     using standard logic symbols.
         Table 9 shows the conjectures along with the results obtained when running the system with
     the proofs being measured by the number of processed clauses and Table 10 shows the lemmas
     that were found.
                        no.   level   || · ||i   || · || f   formula                    lemmas
                        C1        8       15            2    ⊢ (A → A)                  L27
                        C2      11    3682              2    ⊢ (A → ¬¬A)
                        C3        7   3297              2    ⊢ (¬¬A → A)
                        C4        2       17            2    ⊢ (A → (B → A))
                        C5        9   5731              2    ⊢ (¬A → (A → B))
                        C6        8       41            5    ¬(A → A) ⊢ B               L27
                        C7        5       31            3    A ⊢ (¬A → B)
                        C8        3       13            3    B ⊢ (¬A → B)               L18
                        C9      11    1673            16     ¬(A → ¬B) ⊢ B              L22 L8 L4
                       C10        7     938           62     ¬(A → ¬B) ⊢ A              L17 L9 L16
                                                                                        L30 L18 L4
                       C11      12      287          20      (P → ¬(Q → ¬R)), P ⊢ R     L21 L30 L8 L4

     Table 9: Conjectures from the Meredith’s axiomatization example with the proof cost measured
     by the number of clauses processed by the prover.


         Because we code propositional formulas into terms of predicate logic, we can express much
     more than just that a propositional formula is a theorem. We can also express meta-theorems
     that speak about provability of different formulas and what are the relations between them. For
     example, recall how that modus ponens rule was coded as (t(A)∧t(i(A, B))) ⇒ t(B). The system
     derived many lemmas of similar nature, thus discovering many admissible rules. This fact
     becomes much more interesting in the case of modal logic, described in the next section, where
     the deduction theorem does not hold, hence the admissible rules have much greater importance.


46                                                                                    Empirically Successful Computerized Reasoning
              no.     level   || · ||   formula                                                   lemmas
              L1          6        7    C ⊢ (A → (¬¬B → B))                                       L14 L4
              L2          6      24     D ⊢ (A → (B → (¬¬C → C)))                                 L12 L4 L3
              L3          3        8    (((D → B) → (E → B)) → (B → C)) ⊢ (A → (B → C))           L23 L4
              L4          1        4    A, (A → B) ⊢ B
              L5          3        5    A ⊢ ((A → B) → (C → B))                                   L18 L12
              L6          8        6    ((((B → C) → (¬D → ¬A)) → D) → B) ⊢ (A → B)               L27 L12
              L7          7        4    ⊢ ((((¬A → ¬B) → A) → C) → (B → C))                       L12
              L8          8        6    C ⊢ (A → ((¬B → ¬A) → B))
              L9          5      28     ((C → E) → (¬B → ¬D)) ⊢ (((A → B) → C) → (D → C))         L18 L4 L19
             L10          8        7    (((¬C → ¬A) → C) → B) ⊢ (A → B)                           L7 L4
             L11          8        4    ⊢ (((A → B) → ¬(¬B → ¬C)) → (C → ¬(¬B → ¬C)))             L7 L12
             L12          1        6    ((((B → D) → (¬E → ¬C)) → E) → A) ⊢ ((A → B) → (C → B))
             L13          6      13     (((B → C) → D) → (¬C → ¬A)) ⊢ (A → (B → C))               L9 L12 L4
             L14          5        4    ⊢ (A → (B → (¬¬C → C)))                                   L2
             L15        10         4    B ⊢ (A → ¬¬A)
             L16          2        7    D, C ⊢ (A → (B → C))                                      L18
             L17          5        7    ((C → C) → B) ⊢ (A → B)                                   L31
             L18          2        6    B ⊢ (A → B)
             L19          4        7    C, A ⊢ (¬A → B)
             L20          6        6    ⊢ (((A → (B → B)) → C) → (D → C))                         L17 L31 L12
             L21        11         4    B ⊢ (A → ¬¬A)                                             L15
             L22          6      14     ((C → D) → B), (¬D → ¬A) ⊢ (A → B)                        L9 L18 L4
             L23          2        4    ⊢ ((((A → B) → (C → B)) → (B → D)) → (E → (B → D)))       L12
             L24          6        4    ⊢ (((A → ¬¬B) → C) → (B → C))                             L12
             L25          3        4    ⊢ (((A → (¬B → C)) → D) → (B → D))                        L23 L12
             L26          3      11     (D → C), D ⊢ (A → (B → C))                                L18 L4
             L27          7        4    ⊢ (A → A)                                                 L32
             L28          1        6    C ⊢ (A → (B → A))
             L29          7        3    C ⊢ (A → (B → B))
             L30          6        5    (¬B → ¬D) ⊢ (((A → B) → C) → (D → C))                     L9 L18
             L31          4        4    ⊢ (((A → A) → B) → (C → B))                               L25 L12
             L32          6        4    ⊢ (A → (B → (C → C)))                                     L17 L31
             L33          9        2    ⊢ (((A → B) → ¬¬C) → (C → ¬¬C))                           L9 L13 L11

     Table 10: Lemmas found for the Meredith’s axiomatization with the proof cost measured by the
     number of clauses processed by the prover.


                              proof measure                         initial cost     final cost
                              number of processed clauses                15725      510 (3.2%)
                              proof size                                  1299     1024 (79%)
                              proof length                                  314     277 (88%)

                    Table 11: Results for the Meredith’s axiomatization of propositional logic.


Empirically Successful Computerized Reasoning                                                                   47
        Table 11 shows the summary of achieved results. Again, for each proof measure the initial
     and the final cost of the proofs is shown.

     6.4   S5 modal logic
     S5 modal logic uses meta-theorems (mentioned above), since the theorem of deduction doesn’t
     hold in S5. From this point of view S5 is an interesting example.
         This set of conjectures is similar to the previous example. We have used [Hal05] to construct
     an axiomatization for S5 modal logic with the three Hilbert’s axioms for propositional logic,
     axioms K, T and 5 and modus ponens and necessitation rule. We have used the same formula
     coding with an additional unary function symbol l(. . .) for the modal operator 2.

             120000                                                                           45
                                                                     proof cost
                                                       no. of accepted lemmas                 40
             100000
                                                                                              35
               80000                                                                          30
                                                                                              25
               60000
                                                                                              20
               40000                                                                          15
                                                                                              10
               20000
                                                                                              5
                    0                                                                        0
                        0     20      40       60       80       100      120             140
                            Number of proved (not necessarily accepted) lemmas


     Figure 1: Run of the system on the S5 axiomatization with with the proof cost measured by the
     number of clauses processed by the prover.

         For this example we also show a graph that illustrates performance of the system, see Fig-
     ure 1. The x axis shows time points distinguished by the total number of lemmas (both accepted
     and unaccepted) the system has used at least in one proof. The thick line shows how the total
     cost of the proofs evolved and corresponds to the tick marks on the left. The thin line shows the
     number of lemmas that were marked as accepted and corresponds to the tick marks on the right.
     As we can see, the cost of the proofs was reduced to about 1/3 with the first 10 lemmas. The
     cost of the proofs then gradually decreased and the lemmas that were accepted often brought
     only a slight gain. There were two more significant improvements at the points 40, 81 and 97,
     when interesting lemmas were discovered and sudden advancements were made.
         Most of the the lemmas that were discovered say that a particular proposition is a theorem
     of S5. But the system also discovered several lemmas that describe admissible rules of S5. For
     example the lemma L40 in Table 15 states that from 2B and B → A we can derive 2A.
         Table 16 shows the summary of achieved results. For each proof measure the initial and the
     final cost of the proofs is shown.


48                                                                   Empirically Successful Computerized Reasoning
                  no.    level    || · ||i   || · || f    formula                                    lemmas
                  C1         6        11            2     ⊢ (A → A)                                  L19
                  C2        1           2           2     ⊢ (2P → P)
                  C3        9     1086              2     ⊢ (A → ¬¬A)
                  C4        2           5           3     ⊢ A ∨ ¬2A                                  L43
                  C5         1          3           3     ⊢ ¬A ∨ 2A
                  C6         9    1089              2     ⊢ (¬¬A → A)
                  C7         7    3586              2     ⊢ ((¬A → A) → A)
                  C8         7        44            2     ⊢ (¬A → (A → B))                           L25
                  C9         8      680           15      ⊢ 2(A → ¬2¬A)                              L37 L2 L36
                                                                                                     L33
                 C10        5     15332          72       ⊢ (P → 2¬2¬P)                              L17 L2 L1
                 C11        6        43           5       ¬(A → A) ⊢ B                               L19
                 C12        8        58           3       A ⊢ (¬A → B)
                 C13        2         5           3       B ⊢ (¬A → B)                               L10
                 C14        6        90          10       ¬(A → ¬B) ⊢ B
                 C15        7      1293           6       ¬(A → ¬B) ⊢ A                              L20 L23
                 C16        8     30413           2       ⊢ ((A → B) → (¬B → ¬A))
                 C17        9     23254           3       ⊢ (A → (¬B → ¬(A → B)))                    L19
                 C18       10      9580           4       A, B ⊢ ¬(A → ¬B)                           L14
                 C19        1         3           3       ⊢ 2(¬2¬P → 2¬2¬P)
                 C20        7      5471         171       ¬(B → ¬A) ⊢ ¬(A → ¬B)                      L28 L2 L13
                                                                                                     L22 L1 L42 L9
                                                                                                     L24
                 C21        1         8            6      2(A → B), 2(B → A) ⊢ 2(2A → 2B)
                 C22       10     23271           26      ¬(A → ¬¬(B → ¬C)) ⊢ ¬(¬(A → ¬B) → ¬C)      L20 L23 L14

         Table 12: Conjectures in the S5 axiomatization, prover processed clauses count measure


                           no.    level      || · ||     formula                            lemmas
                           L1        2          59       (A → C), (A → (C → B)) ⊢ (A → B)   L4
                           L2         3         40       (A → C), (C → B) ⊢ (A → B)         L1
                           L3        3          40       C, (A → (C → B)) ⊢ (A → B)         L1
                           L4        1          39       22B, (B → A) ⊢ 2A
                           L5        5          25       ⊢ (¬¬A → A)                        L3
                           L6        6          44       (¬B → A) ⊢ (¬A → B)
                           L7        4          59       (A → (¬C → ¬B)) ⊢ (A → (B → C))    L2
                           L8        1          33       (¬A → ¬B), B ⊢ A

     Table 13: Lemmas found for the S5 axiomatization with the proof cost measured by the proof
     size.


                            no.    level     || · ||     formula                            lemmas
                            L1         1        13       (A → C), (A → (C → B)) ⊢ (A → B)
                            L2         2        12       (A → C), (C → B) ⊢ (A → B)
                            L3         2        13       C, (A → (C → B)) ⊢ (A → B)         L1
                            L4         4        10       (¬B → A) ⊢ (¬A → B)                L7
                            L5         3          8      ⊢ (¬¬A → A)
                            L6         1        13       22B, (B → A) ⊢ 2A
                            L7         1        10       (¬B → ¬A) ⊢ (A → B)

     Table 14: Lemmas found for the S5 axiomatization with the proof cost measured by the proof
     length.


Empirically Successful Computerized Reasoning                                                                        49
        no.   level   || · ||    formula                                                              lemmas
        L1        3        8     (A → C), (A → (C → B)) ⊢ (A → B)                                     L32
        L2        4      11      (A → C), (C → B) ⊢ (A → B)
        L3        1        9     (B → (C → D)) ⊢ (A → ((B → C) → (B → D)))
        L4       6         8     C, (C → B) ⊢ (A → B)                                                 L24
        L5       2       11      22B, (B → A) ⊢ 22A                                                   L43
        L6       8         5     A ⊢ (¬¬(A → B) → B)                                                  L13
        L7       7     247       ⊢ (¬A → ((B → A) → ¬B))                                              L8 L25 L1 L15
        L8       5       30      ((C → A) → ((C → B) → D)) ⊢ ((A → B) → ((C → A) → D))                L3 L31 L1
        L9       5       11      (A → (B → D)), (B → (D → C)) ⊢ (A → (B → C))                         L2 L32
       L10       1         6     B ⊢ (A → B)
       L11       3       21      ((C → A) → B) ⊢ (A → B)                                              L10 L32 L15
       L12       6       10      D, (D → C) ⊢ (A → (B → C))                                           L10
       L13       4         8     (A → (C → B)), C ⊢ (A → B)                                           L10 L1
       L14       9         6     B, A ⊢ ¬(A → ¬B)                                                     L20 L6
       L15       1         4     B, (B → A) ⊢ A
       L16       1         5     A, B ⊢ 22A
       L17       4         5     (B → (A → C)) ⊢ (A → (B → C))                                        L11 L32
       L18       6         5     ((A → B) → A) ⊢ ((A → B) → B)                                        L19 L1
       L19       5         4     ⊢ (A → A)
       L20       2         7     (¬A → ¬B), B ⊢ A                                                     L37 L15
       L21       6         7     ((C → C) → B) ⊢ (A → B)                                              L19
       L22        5        9     (A → (C → B)), ((C → B) → C) ⊢ (A → B)                               L2 L1
       L23        6      11      ¬(A → C) ⊢ (¬A → B)                                                  L29 L24
       L24        5      10      (A → C), ¬C ⊢ (A → B)                                                L28 L2
       L25        6        4     ⊢ (¬A → (A → B))                                                     L29
       L26        6      32      (C → (A → D)) ⊢ (A → (B → (C → D)))                                  L10 L9
       L27        2        7     2B, (B → A) ⊢ A                                                      L43
       L28        1        4     ¬A ⊢ (A → B)
       L29        5        6     (A → (¬C → ¬B)) ⊢ (A → (B → C))                                      L2
       L30        6        5     (¬A → (B → ¬C)) ⊢ ((¬A → B) → (C → A))                               L29 L32
       L31        4        4     ((A → (B → C)) → (((A → B) → (A → C)) → D)) ⊢ ((A → (B → C)) → D)    L1
       L32        2        7     (A → (B → C)) ⊢ ((A → B) → (A → C))                                  L15
       L33        7        4     ⊢ (¬¬A → (B → A))                                                    L29 L25
       L34        8        4     ⊢ (A → (¬¬B → B))                                                    L17 L33
       L35        1        9     (¬C → ¬B) ⊢ (A → (B → C))
       L36        4        4     (A → ((C → A) → B)) ⊢ (A → B)                                        L1
       L37        1        6     (¬B → ¬A) ⊢ (A → B)
       L38        8        6     (A → (¬B → (C → B))) ⊢ (A → (¬B → ¬C))                               L7 L9
       L39        7        6     (¬A → A) ⊢ (¬A → B)                                                  L25 L1
       L40        2        9     2B, (B → A) ⊢ 2A                                                     L43
       L41        8        6     ⊢ (A → (B → ¬¬B))                                                    L17 L29 L33
       L42        6      11      (A → D), (D → C) ⊢ (A → (B → C))                                     L2 L24
       L43        1        5     2A ⊢ A

     Table 15: Lemmas found for the S5 axiomatization with the proof cost measured by the number
     of clauses processed by the prover.


                                proof measure                     initial cost     final cost
                                number of processed clauses          115327      2091 (1.8%)
                                proof size                                632     379 (60%)
                                proof length                              642     389 (61%)

                                          Table 16: Results for S5 modal logic.


50                                                                               Empirically Successful Computerized Reasoning
        The system again performed very well in the case when the measure was the number of
     processed clauses. This time, the total cost of the proofs was reduced to less than 2%.


     7     Future work
     As the system is particularly efficient in speeding up the prover, we believe that it could be
     modified to search for lemmas that would make it possible to prove conjectures that the prover
     alone wasn’t able to prove. This will require a change of strategy, because currently the system
     looks primarily for lemmas that improve already existing proofs of the conjectures and therefore
     are not general enough to prove some new unknown conjecture.
         We would also like to investigate the nature of the lemmas that help to improve particular
     proof measures in order to develop a better strategy for their evaluation.
         Finally, we plan to perform a in-depth testing of the system on various sets of conjectures
     from different sources.


     8     Conclusion
     Given a related set of conjectures, it is possible to automatically construct lemmas that can
     significantly reduce the cost of the proofs of the conjectures. The results are summarized in
     Table 17 for convenience.
                                            processed clauses                proof size               proof length
         set of conjectures               || · ||i     || · || f    || · ||i        || · || f   || · ||i      || · || f
         TPTP set theory                    3991 1921 (48%)          3487 2794 (80%)            1086       878 (81%)
         Mizar set properties             62706 1852 (3.0%)        10680 9351 (88%)             3687 3046 (83%)
         Meredith’s axiomatization        15725     510 (3.2%)       1299 1024 (79%)              314      277 (88%)
         S5 modal logic                  115327 2091 (1.8%)            632       379 (60%)        642      389 (61%)

            Table 17: Summary of the results of the system on the presented sets of conjectures.


         The system that we have developed performs well on different sets of conjectures, particu-
     larly if the cost of the proofs of the conjectures is measured in the number of clauses processed
     by the prover. The system can also improve the size and/or the length of the proofs, although it
     is not as effective in these cases.


     References
     [AS92]         Owen L. Astrachan and Mark E. Stickel. Caching and lemmaizing in model elim-
                    ination theorem provers. In Deepak Kapur, editor, CADE, volume 607 of Lecture
                    Notes in Computer Science, pages 224–238. Springer, 1992.

     [Col02a]       Simon Colton. Automated Theory Formation in Pure Mathematics. Distinguished
                    Dissertations. Springer, 2002.


Empirically Successful Computerized Reasoning                                                                             51
     [Col02b]   Simon Colton. The HR program for theorem generation. In Andrei Voronkov,
                editor, CADE, volume 2392 of Lecture Notes in Computer Science, pages 285–289.
                Springer, 2002.

     [CP80]     P. T. Cox and T. Pietrzykowski. A complete, nonredundant algorithm for reversed
                skolemization, volume 87 of Lecture Notes in Computer Science. Springer, May
                1980.

     [CP93]     Ritu Chadha and David Plaisted. Finding logical consequences using unskolemiza-
                tion, volume 689 of Lecture Notes in Computer Science. Springer, May 1993.

     [FFF99]    Marc Fuchs, Dirk Fuchs, and Matthias Fuchs. Generating lemmas for tableau-
                based proof search using genetic programming. In Wolfgang Banzhaf, Jason Daida,
                Agoston E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela, and Robert E.
                Smith, editors, Proceedings of the Genetic and Evolutionary Computation Con-
                ference, volume 2, pages 1027–1032, Orlando, Florida, USA, July 1999. Morgan
                Kaufmann.

     [Hal05]    John Halleck. Logic systems, 2005. http://www.cc.utah.edu/˜nahaj/logic/structures/.

     [McC94]    W. W. McCune. OTTER 3.0 reference manual and guide. Technical Report ANL-
                94/6, Argonne National Laboratory, Argonne, Illinois, 1994.

     [MVF+ 02] William McCune, Robert Veroff, Branden Fitelson, Kenneth Harris, Andrew Feist,
               and Larry Wos. Short single axioms for boolean algebra. J. Autom. Reasoning,
               29(1):1–16, 2002.

     [Qua92]    Art Quaife. Automated Development of Fundamental Mathematical Theories.
                Kluwer Academic Publishers, 1992.

     [Sch02]    S. Schulz. E – A brainiac theorem prover. Journal of AI Communications, 15(2-
                3):111–126, 2002.

     [SGC03]    G. Sutcliffe, Y. Gao, and S. Colton. A Grand Challenge of Theorem Discovery. In
                J. Gow, T. Walsh, S. Colton, and V. Sorge, editors, Proceedings of the Workshop on
                Challenges and Novel Applications for Automated Reasoning, 19th International
                Conference on Automated Reasoning, pages 1–11, 2003.

     [SS98]     G. Sutcliffe and C. B. Suttner. The TPTP Problem Library: CNF Release v1.2.1.
                Journal of Automated Reasoning, 21(2):177–203, 1998.

     [Urb03]    Josef Urban. Translating Mizar for first order theorem provers. In MKM, volume
                2594 of Lecture Notes in Computer Science, pages 203–215. Springer, 2003.

     [Urb04]    Josef Urban. MPTP - motivation, implementation, first experiments. Journal of
                Automated Reasoning, 33(3-4):319–339, 2004.

     [WP03]     Larry Wos and Gail W. Pieper. Automated Reasoning and the Discovery of Missing
                and Elegant Proofs. Rinton Press, 2003.


52                                                                Empirically Successful Computerized Reasoning

</pre>