=Paper=
{{Paper
|id=Vol-1785/W23
|storemode=property
|title=Initial Experiments with Statistical Conjecturing over Large Formal Corpora
|pdfUrl=https://ceur-ws.org/Vol-1785/W23.pdf
|volume=Vol-1785
|authors=Thibault Gauthier,Cezary Kaliszyk,Josef Urban
|dblpUrl=https://dblp.org/rec/conf/cikm/GauthierKU16
}}
==Initial Experiments with Statistical Conjecturing over Large Formal Corpora==
Thibault Gauthier (1), Cezary Kaliszyk (1), and Josef Urban (2)

(1) University of Innsbruck, Austria, {thibault.gauthier,cezary.kaliszyk}@uibk.ac.at
(2) Czech Technical University in Prague, josef.urban@gmail.com

Abstract. A critical part of mathematicians' work is the process of conjecture-making. This involves observing patterns in and between mathematical objects, and transforming such patterns into conjectures that describe or better explain the behavior of the objects. Computer scientists have long tried to reproduce this process automatically, but most methods have typically been restricted to specific domains or based on brute-force enumeration. In this work we propose and implement methods for generating conjectures using statistical analogies extracted from large formal libraries, and provide their initial evaluation.

1 Introduction

In the past decade, there has been considerable progress in proving conjectures over large formal corpora such as Flyspeck [7], Isabelle/HOL [12], the Mizar Mathematical Library (MML) [1] and others. This has been achieved by combining high-level learning or heuristic fact-selection mechanisms with a number of methods and strategies for guiding the strongest (typically superposition-based) automated theorem provers (ATPs). While this approach has not reached its limits [2] and its results are already very useful, it still seems quite far from the way humans do mathematics. In particular, even with very precise premise (fact) selection, today's ATPs have trouble finding many of the longer Mizar and Flyspeck proofs of the toplevel lemmas in their libraries. In fact, only a few such lemmas would be called a "theorem" by a mathematician. Often the "theorem" is just the final proposition of a formal article, and the toplevel lemmas preceding it would be classified as simple technical steps that are sometimes not even mentioned in informal proofs.

An important inductive method that mathematicians use for proving hard problems is conjecturing, i.e., proposing plausible lemmas that could be useful for proving a hard problem. (A famous example is the Taniyama-Shimura conjecture, whose proof by Wiles completed the proof of Fermat's Last Theorem.) There are likely a number of complicated inductive-deductive feedback loops involved in this process that will need to be explored; however, it seems that a major inductive method is analogy. At a very high level, this could be stated as: "Problems and theories with similar properties are likely to have similar proof ideas and lemmas." Analogies can be very abstract and advanced: two analogous objects may be related through a morphism, or they may be two instances of the same algebraic structure. The organization of such structures was recently addressed by the MMT framework [13] in order to represent different logics in a single consistent library.

In this work we start experimenting with the analogies provided by statistical concept matching [4]. This method was recently developed to align formal libraries of different systems and to transfer lemmas between them [5]. Here we use statistical concept matching to find the most similar sets of concepts inside one large library. The discovered sets of matching concepts can then be used to translate a hard conjecture C into a "related" conjecture C′, whose (possible) proof might provide further guidance for proving C.

The remaining components used in this first experiment are standard large-theory reasoning components: fast machine learners that learn from thousands of proofs and propose the most plausible lemmas for proving the related conjectures, and first-order ATPs, in this case Vampire 4.0 [10].
2 Matching concepts

In order to apply analogies, we first need to discover them by finding similar concepts. For our initial evaluation, similarity matching is limited to concepts represented by constants and ground subterms; later this can be extended to more complex term structures. We describe here a general concept matching algorithm inspired by and improved upon [4], and discuss how this algorithm can be adapted to find analogies within a single library.

2.1 Matching concepts between two libraries

Given two theorem libraries, the first step is to normalize all statements, as this increases the likelihood of finding similar theorem shapes. If two theorems have the same shape (we will call such an abstract shape a property), then the constants appearing at the same positions in the two theorems are more likely to be similar. We say that such pairs of constants are derived from the theorem pair.

Example 1. Consider the theorems T1 and T2, their respective normalizations, and the properties extracted from their statements:

  T1: ∀x : num. x = x + 0
  T2: ∀x : real. x = x × 1
  P1: λ num, +, 0. ∀x : num. x = x + 0
  P2: λ real, ×, 1. ∀x : real. x = x × 1

The properties P1 and P2 are α-equivalent, therefore the theorems T1 and T2 form a matching pair of theorems, and the following three matching pairs of constants are derived: num ↔ real, + ↔ ×, 0 ↔ 1.

We further observe that the matchings (+, ×) and (0, 1) are related through the theorem pair (T1, T2). The strength of this relation, called correlation, is given by the number and accuracy of the theorem pairs these matchings are jointly derived from. We call the graph representing the correlations between different matchings the dependency graph. The similarity score of each matching (i.e., pair of concepts) is initialized with a fixed starting value and computed by a dynamic process that iterates over the dependency graph until a fixpoint is reached. The principal advantage of this method is that the algorithm is not greedy, allowing a concept to match multiple other concepts in the other library. This is a desirable property, since we want to be able to create multiple analogues of one theorem. Matchings are then combined into substitutions, which are in turn applied to the existing theorem statements, yielding plausible conjectures. A minimal code sketch of the matching step is given at the end of this section.

Very few adjustments are needed to adapt this method to a single library. We create a duplicate of the library and match it with its copy. Since we are not interested in identity substitutions, we prevent theorems from matching with their own copies. However, we keep self-matches between constants in the dependency graph, since good analogies can often be found by using partial substitutions.
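The following Python fragment is a minimal sketch of the matching step, under simplifying assumptions: statements are nested tuples whose leaves are symbol names, normalization is elided, and a property is computed by abstracting symbols in order of first occurrence, so that α-equivalence of properties reduces to equality of the abstracted shapes. The function names and toy statements are illustrative only, not our actual implementation.

  # Sketch of property extraction and theorem matching (cf. Example 1).
  def property_of(term):
      """Abstract a statement over its symbols: each symbol is replaced by
      a placeholder numbered by order of first occurrence; the ordered list
      of abstracted symbols is returned together with the shape."""
      order = {}  # symbol -> placeholder index

      def walk(t):
          if isinstance(t, tuple):
              return tuple(walk(s) for s in t)
          if t not in order:
              order[t] = len(order)
          return order[t]

      shape = walk(term)
      return shape, sorted(order, key=order.get)

  def derive_matchings(thm1, thm2):
      """If the two properties are alpha-equivalent (equal shapes after
      abstraction), pair up the symbols at corresponding positions."""
      shape1, syms1 = property_of(thm1)
      shape2, syms2 = property_of(thm2)
      return list(zip(syms1, syms2)) if shape1 == shape2 else []

  # Example 1, with binders represented as ordinary symbols:
  T1 = ('forall', 'num',  ('=', 'x', ('+', 'x', '0')))
  T2 = ('forall', 'real', ('=', 'x', ('*', 'x', '1')))
  print(derive_matchings(T1, T2))
  # [('forall', 'forall'), ('num', 'real'), ('=', '='),
  #  ('x', 'x'), ('+', '*'), ('0', '1')]

Note that self-matches such as (=, =) appear alongside the genuine pairs num ↔ real, + ↔ ×, and 0 ↔ 1; as described above, such self-matches are kept in the dependency graph.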
3 Context-dependent substitutions

Constant matchings by themselves create special one-element substitutions that transport the properties of one constant to another. In general, substitutions are created by translating more than one constant. Suppose we know that addition is similar to union and multiplication to intersection. We can then hope to transport the distributivity of multiplication over addition to the distributivity of intersection over union. The correlations between the matchings are therefore crucial for building the substitutions.

We now present two methods for creating substitutions from a theorem. Both are based on the correlations between the concept pairs and on the similarity score of each concept pair. We want to construct substitutions that are most relevant in the local context, i.e., for the symbols present in the theorem we want to translate (the target theorem).

The first method starts by reducing the dependency graph to the subgraph of all concept pairs whose first element is contained in the target theorem. We select a starting node in this subgraph which is not an identity matching, and recursively add nodes (possibly identities) that are connected by the strongest edges of the subgraph, under the constraint that no two selected nodes have the same first element. The algorithm stops when no new node can be added. The final set of nodes obtained in this way forms a partial substitution. We run this algorithm either for all starting nodes, or only for those with similarity scores above a certain threshold. This produces a set of substitutions, which are effectively the most relevant substitutions for the target theorem. This process seems to produce many substitutions; in practice, however, many of them are identical, which limits their total number.

The second method is a brute-force approach. We first find the set Matched(T) of concepts that match at least one concept in the set Concepts(T) of concepts present in the target theorem T. We would like to create all possible substitutions between Concepts(T) and Matched(T) and rank them, but this would often blow up the generation phase. To limit the number of possible substitutions, we remove possible matches by iterating the following process until the number of substitutions is below a chosen threshold (1000): we select the constant C in T with the most remaining matchings, remove its worst match, recompute the number of remaining matches for all constants in T, and check whether the number of substitutions is already below the threshold. If so, the process terminates; otherwise we continue the iteration.

Next, we select the 200 substitutions with the highest combined score. The combined score of a substitution S is computed by multiplying its average correlation and average similarity:

  CombinedScore(S) = AverageCorrelation(S) × AverageSimilarity(S)

  AverageCorrelation(S) = (1 / |S|²) × Σ_{M∈S, M′∈S} Correlation(M, M′)

  AverageSimilarity(S) = (1 / |S|) × Σ_{M∈S} SimilarityScore(M)

On top of these combined scores, the diversity of the substitutions can be maximized by the following reranking. The process iteratively chooses a not yet selected substitution with the best diversity score and adds it to the set 𝒯 of selected substitutions. The scores are then updated to penalize substitutions that have more matchings in common with the already selected ones:

  CommonMatchings(S, T) = |S ∩ T|

  DiversityScore(S) = CombinedScore(S) / (1 + Σ_{T∈𝒯} CommonMatchings(S, T))³

Sketches of the first method and of this scoring are given in code at the end of this section. The selected substitutions are eventually applied to the initial theorem to create new conjectures. We hope that if such conjectures can be proved, they will improve AI/ATP methods by enriching the theory. In Section 4 we consider two scenarios where the combination of conjecturing by analogy and premise selection could be useful.
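As an illustration, the first method can be sketched as follows, assuming the reduced dependency graph is given as a map from pairs of matchings to their correlation weight. The data layout and names are our illustrative choices, not the actual implementation.

  # Sketch of the first substitution-generation method: greedily grow a
  # partial substitution along the strongest correlation edges.
  def grow_substitution(start, edges, theorem_symbols):
      """start: a non-identity matching (source, target).
      edges: dict mapping frozenset({m1, m2}) -> correlation weight, where
      m1, m2 are matchings. theorem_symbols: symbols of the target theorem;
      only matchings whose source occurs there are considered."""
      chosen = {start}
      used_sources = {start[0]}   # no two nodes share a first element
      while True:
          best, best_w = None, 0.0
          for edge, w in edges.items():
              m1, m2 = tuple(edge)
              for inside, outside in ((m1, m2), (m2, m1)):
                  if (inside in chosen and outside not in chosen
                          and outside[0] in theorem_symbols
                          and outside[0] not in used_sources
                          and w > best_w):
                      best, best_w = outside, w
          if best is None:
              return dict(chosen)  # the final partial substitution
          chosen.add(best)
          used_sources.add(best[0])

  # Toy run: transport distributivity of * over + to that of
  # intersection over union.
  edges = {frozenset({('+', 'union'), ('*', 'inter')}): 0.9,
           frozenset({('+', 'union'), ('0', 'empty')}): 0.4}
  print(grow_substitution(('+', 'union'), edges, {'+', '*', '0'}))
  # {'+': 'union', '*': 'inter', '0': 'empty'}  (key order may vary)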
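The combined score and the diversity reranking translate directly into code. In this sketch, correlation and similarity are passed in as functions standing for the values computed on the dependency graph; the candidates would be the 200 substitutions with the highest combined score.

  # Direct transcription of CombinedScore and the greedy diversity selection.
  from itertools import product

  def combined_score(S, correlation, similarity):
      """S: a substitution, given as a set of matchings."""
      avg_corr = sum(correlation(m1, m2)
                     for m1, m2 in product(S, S)) / len(S) ** 2
      avg_sim = sum(similarity(m) for m in S) / len(S)
      return avg_corr * avg_sim

  def select_diverse(candidates, k, correlation, similarity):
      """Repeatedly pick the substitution with the best diversity score,
      penalizing common matchings with those already selected."""
      selected = []
      remaining = [set(S) for S in candidates]
      while remaining and len(selected) < k:
          def diversity(S):
              common = sum(len(S & T) for T in selected)  # sum of |S ∩ T|
              return (combined_score(S, correlation, similarity)
                      / (1 + common) ** 3)
          best = max(remaining, key=diversity)
          remaining.remove(best)
          selected.append(best)
      return selected

With k = 20, select_diverse corresponds to the per-goal selection used in Section 4.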
4 Scenarios

We consider two scenarios for the use of conjecturing: without and with a user-given goal.

In the first scenario, no goal is set by the user, and the system should decide what the next step in the development of the whole library is. In this context, our algorithm produces the most likely conjectures by applying the first substitution-generation method to all theorems. We then evaluate the conjectures by trying to prove them. We estimate the difficulty of a proved conjecture by the number of lemmas used in its proof, and its relevance by how much adding it as a new theorem to the library helps to automatically prove formalized statements appearing after it in the library.

In the second scenario, a goal is given and the system should find a way to prove it. The typical AI/ATP method is to search through all lemmas (in an evaluation, the lemmas that occur before the goal) and select the most relevant ones using machine learning. If, however, an important lemma was not proven, or was proven only after the goal, premise selection cannot produce it. We therefore propose a way to produce some of these relevant missing lemmas. The method is depicted in Fig. 1 and sketched in code below. We first select the 20 best-scoring substitutions for the goal, which creates 20 new goals, and try to prove these using the AI/ATP system. If some of them are successfully proved, we obtain small sets of lemmas that were sufficient to prove the translated goals. These lemmas are then translated back to the original concepts. We run our AI/ATP system again on them and remove those it cannot prove. The final step is to try to prove the user goal from the remaining additional lemmas. This strategy simulates the following thought process a human could have: "Can I prove this conjecture for a similar concept?", "If so, what were the necessary lemmas?", "If some of these lemmas hold for my original concept, they will likely help me in my original proof."

Fig. 1. Creating additional relevant lemmas for a conjecture through analogies.
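Schematically, this second scenario can be expressed as the following Python sketch. Here prove(goal, library) stands for the whole AI/ATP pipeline (premise selection followed by the ATP) and returns the set of lemmas used in a found proof, or None; apply_subst and invert handle σ and σ⁻¹. All of these names are placeholders for illustration.

  # Sketch of the Fig. 1 loop: prove analogous goals, then reflect the
  # lemmas of their proofs back into the terminology of the original goal.
  def lemmas_via_analogy(goal, substitutions, library, prove,
                         apply_subst, invert):
      new_lemmas = []
      for sigma in substitutions[:20]:           # the 20 best-scoring ones
          translated = apply_subst(sigma, goal)  # an analogous goal
          used = prove(translated, library)      # lemmas sufficient for it
          if used is None:
              continue
          inv = invert(sigma)                    # sigma^-1
          for lemma in used:
              back = apply_subst(inv, lemma)     # back-translated conjecture
              if prove(back, library) is not None:
                  new_lemmas.append(back)        # keep only the provable ones
      return new_lemmas

  # The final step tries to prove `goal` again from `library`
  # extended with `new_lemmas`.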
5 Experiments

5.1 Untargeted Conjecture Generation

In the first scenario, we apply the 20 "most plausible" substitutions to all MML theorems, and attempt to prove a small part (73535) of the resulting conjectures, each time using the whole MML. (The experimental version of our conjecturer used here is available at http://147.32.69.25/~mptp/conj/conjecturing_standalone.tar.gz.) We want to see how complicated and interesting the conjectures can be. After premise selection, Vampire can prove 10% (7346) of them in 10 s, which is reduced to 4464 after pseudo-minimization [8] and removal of tautologies and simple consequences of single lemmas.

An example of a reasonably interesting conjecture (and analogy) with a new nontrivial proof is the convexity of empty subsets of real linear spaces, induced from a similar claim about them being "circled" (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/rltopsp1.html#CC1):

  registration
    let X be non empty RLSStruct;
    cluster empty -> circled for Element of bool the carrier of X;

Here "circled" is defined (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/rltopsp1.html#V3) as

  definition
    let X be non empty RLSStruct;
    let A be Subset of X;
    attr A is circled means :Def6: :: RLTOPSP1:def 6
      for r being Real st abs r <= 1 holds r * A c= A;

and "convex" (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/convex1.html#V1) as

  definition
    let V be non empty RLSStruct;
    let M be Subset of V;
    attr M is convex means :Def2: :: CONVEX1:def 2
      for u, v being VECTOR of V
      for r being Real st 0 < r & r < 1 & u in M & v in M holds
        (r * u) + ((1 - r) * v) in M;

For example, the following properties of circled (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/rltopsp1.html#FC3) and convex (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/convex1.html#T2) subsets are quite similar, leading the conjecturer to conjecture further properties like the one stated above:

  registration
    let X be RealLinearSpace;
    let A, B be circled Subset of X;
    cluster A + B -> circled;

  theorem :: CONVEX1:2
    for V being non empty Abelian add-associative vector-distributive
        scalar-distributive scalar-associative scalar-unital RLSStruct
    for M, N being Subset of V st M is convex & N is convex holds
      M + N is convex;

5.2 Targeted Conjecture Generation

In the second experiment, we used as our targets the set of 22069 ATP-hard Mizar toplevel lemmas (theorems). These are the theorems that could not be proved in any way (neither with human-advised premises, nor with learned premise selection) in our recent extensive experiments with state-of-the-art AI/ATP methods over the MML [9]. Those experiments were very thorough: they used high ATP time limits, many ATPs and their targeted strategies, a number of learning methods and their combinations, and several iterations of the learning/proving loop, approaching in total a month of server time. Proving these problems with a low time limit and a single AI/ATP method is thus quite unlikely.

To each such hard theorem T we apply the 20 best substitutions (according to the diversity score). These substitutions are additionally constrained to target only the part of the MML that existed before T was stated. This gives rise to 441242 conjectures, which we attempt to prove, again using only the part of the MML that precedes T. Because of resource limits, we use only one AI/ATP method: k-NN selection of the 128 best premises, followed by Vampire with an 8 s time limit. This takes about 14 hours on a 64-CPU server, proving 9747 (i.e., 2.2%) of the conjectures. We do two rounds of pseudo-minimization and remove tautologies and simple consequences of single lemmas. This leaves 3414 proved conjectures, originating from 1650 hard theorems; each such conjecture C is a translation of some hard theorem T under some plausible substitution σ (C = σ(T)).

We translate the MML lemmas L_i^C needed for the proof of C "back" into the "terminology of T" by applying the reverse substitution σ⁻¹ (a toy version of this step is sketched below). This results in 26770 back-translated conjectures. For each of them we hope that (i) it will be provable from the MML preceding T, and (ii) it will be useful for proving T, since its image under σ was useful for proving σ(T). We again use only 8 s to select those that satisfy (i), yielding, after minimization and removal of trivial ones, 2170 proved back-translated lemmas, originating from 500 hard theorems. For each of these 500 theorems T we do standard premise selection (using slices of size 128 and 64) over the preceding MML, and add these lemmas (obviously only those that "belong" to T) to the premises of T. Then we run Vampire for 30 s on the standard and the lemma-augmented problems.
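For concreteness, the σ/σ⁻¹ plumbing that the sketch in Section 4 left abstract can be realized as follows for the toy term representation used earlier. Real substitutions need not be injective, so this inversion is valid only for one-to-one matchings; the example is illustrative, not our implementation.

  # Toy sigma / sigma^-1 for the back-translation step.
  def invert(sigma):
      """Reverse a one-to-one symbol substitution."""
      return {v: k for k, v in sigma.items()}

  def apply_subst(sigma, term):
      """Rename symbols throughout a nested-tuple term."""
      if isinstance(term, tuple):
          return tuple(apply_subst(sigma, t) for t in term)
      return sigma.get(term, term)

  sigma = {'(+)': '(-)'}  # cf. the addition/subtraction analogy below
  lemma = ('c=', ('(+)', 'X', 'Y'), 'Z')
  assert apply_subst(invert(sigma), apply_subst(sigma, lemma)) == lemma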
While there is no difference when using 128 lemmas, with 64 lemmas we obtain, in addition to the 6 proofs found by both the standard and the augmented method, an interesting new proof of the theorem MATHMORP:25 (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/mathmorp.html#T25), which Vampire cannot find with the standard method even with a time limit of 3600 s:

  theorem :: MATHMORP:25
    for T being non empty right_complementable Abelian add-associative
        right_zeroed RLSStruct
    for X, Y, Z being Subset of T holds
      X (+) (Y (-) Z) c= (X (+) Y) (-) Z

To find this proof, our concept matcher first used the statistical analogy between addition (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/rusub_4.html#K6) and subtraction (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/mathmorp.html#K3) in additive structures (addMagma). By doing that, it inadvertently constructed as a conjecture the theorem MATHMORP:26 (http://mizar.cs.ualberta.ca/~mptp/7.13.01_4.181.1147/html/mathmorp.html#T26), which actually immediately succeeds MATHMORP:25 in the MML. This alone is remarkable, because this theorem was not known at the point when MATHMORP:25 was being proved. Using premise selection and Vampire, MATHMORP:26 was proved in 4 s, and a particular back-translated lemma from its proof turned out to be provable and crucial for proving MATHMORP:25 automatically. This lemma is actually "trivial" for Mizar, since it follows by climbing Mizar's extensive type hierarchy [6] from an existing typing of the "(-)" function. However, as mentioned above, we were not able to obtain this proof without this lemma, even with much higher time limits.

6 Conclusion and Future Work

We have investigated the application of a concept matching algorithm to formulating conjectures through analogies, and have described a way to combine it with premise selection methods and ATPs. This was designed to create potential intermediate lemmas that help an ATP to find complex proofs.

While these are just first experiments, it seems that statistical concept matching can occasionally already come up with plausible conjectures, without resorting to brute-force term enumeration methods, which are rather impossible in large libraries. So far we do not even use any of the manually invented heuristic methods such as those pioneered by Lenat [11] and Fajtlowicz [3], relying instead on a data-driven approach. Such heuristics and other methods could be combined with the statistical ones.

We can likely improve the matching algorithm by allowing concepts to be represented by more complex term structures [14]. This may help to connect concepts from more distant domains. In the same direction, we could also relax our notion of properties to allow matching with errors. A more generic solution would be to try different shapes of theorems using substitution trees or genetic programming, but this would need an efficient implementation. We can also modify how the matching algorithm and the AI/ATP system are combined: a simple approach is to enhance the premise selection algorithm of the AI/ATP system with the discovered similarities between concepts. In our experiments we also observe that the number of generated conjectures grows with the number of possible substitutions; a heuristic semantic evaluation could complement the substitution scores to estimate the likelihood of a conjecture being true.
Acknowledgments

This work has been supported by the Austrian Science Fund (FWF) grant P26201 and by the European Research Council (ERC) grant AI4REASON.

References

1. Grzegorz Bancerek, Czeslaw Bylinski, Adam Grabowski, Artur Kornilowicz, Roman Matuszewski, Adam Naumowicz, Karol Pak, and Josef Urban. Mizar: State-of-the-art and beyond. In Manfred Kerber, Jacques Carette, Cezary Kaliszyk, Florian Rabe, and Volker Sorge, editors, Intelligent Computer Mathematics - International Conference, CICM 2015, volume 9150 of LNCS, pages 261-279. Springer, 2015.
2. Jasmin C. Blanchette, Cezary Kaliszyk, Lawrence C. Paulson, and Josef Urban. Hammering towards QED. Journal of Formalized Reasoning, 9(1):101-148, 2016.
3. Siemion Fajtlowicz. On conjectures of Graffiti. Discrete Mathematics, 72(1-3):113-118, 1988.
4. Thibault Gauthier and Cezary Kaliszyk. Matching concepts across HOL libraries. In Stephen Watt, James Davenport, Alan Sexton, Petr Sojka, and Josef Urban, editors, Proc. of the 7th Conference on Intelligent Computer Mathematics (CICM'14), volume 8543 of LNCS, pages 267-281. Springer, 2014.
5. Thibault Gauthier and Cezary Kaliszyk. Sharing HOL4 and HOL Light proof knowledge. In Martin Davis, Ansgar Fehnker, Annabelle McIver, and Andrei Voronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning - 20th International Conference, LPAR-20 2015, volume 9450 of LNCS, pages 372-386. Springer, 2015.
6. Adam Grabowski, Artur Kornilowicz, and Adam Naumowicz. Mizar in a nutshell. Journal of Formalized Reasoning, 3(2):153-245, 2010.
7. Thomas Hales. Dense Sphere Packings: A Blueprint for Formal Proofs, volume 400 of London Mathematical Society Lecture Note Series. Cambridge University Press, 2012.
8. Cezary Kaliszyk and Josef Urban. Learning-assisted automated reasoning with Flyspeck. Journal of Automated Reasoning, 53(2):173-213, 2014.
9. Cezary Kaliszyk and Josef Urban. MizAR 40 for Mizar 40. Journal of Automated Reasoning, 55(3):245-256, 2015.
10. Laura Kovács and Andrei Voronkov. First-order theorem proving and Vampire. In Natasha Sharygina and Helmut Veith, editors, Proc. of the 25th International Conference on Computer Aided Verification (CAV), volume 8044 of LNCS, pages 1-35. Springer, 2013.
11. Douglas Lenat. An Artificial Intelligence Approach to Discovery in Mathematics. PhD thesis, Stanford University, Stanford, USA, 1976.
12. Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic, volume 2283 of LNCS. Springer, 2002.
13. Florian Rabe. The MMT API: A generic MKM system. In Jacques Carette, David Aspinall, Christoph Lange, Petr Sojka, and Wolfgang Windsteiger, editors, Proc. of the 6th Conference on Intelligent Computer Mathematics (CICM'13), volume 7961 of LNCS, pages 339-343. Springer, 2013.
14. Jiří Vyskočil, David Stanovský, and Josef Urban. Automated proof compression by invention of new definitions. In Edmund M. Clarke and Andrei Voronkov, editors, LPAR (Dakar), volume 6355 of LNCS, pages 447-462. Springer, 2010.