<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Learning for Automated Theorem Proving</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stanisław J. Purgał</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Innsbruck</institution>
          ,
          <addr-line>Innsbruck, Tirol</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The field of Artificial Intelligence has seen great advances with the use of Deep Neural Networks. However, the problem of creating a system capable of abstract reasoning remains unsolved. One way to force AI to perform abstract reasoning is to directly learn an abstract task, such as Automated Theorem Proving (ATP). In ATP, the goal is to construct a formal proof of a mathematical statement. This requires finding a correct sequence of inferences in an exponentially large space of possibilities. This search can be guided by a Deep Neural Network. Towards this end, I have worked on creating a neural architecture that works well with mathematical formulas and on providing a way to generate training data for such neural networks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Research questions</title>
      <p>In applying deep neural networks to theorem proving I see two main challenges:
1. providing a neural network architecture that works well with formal mathematical
descriptions;
2. providing a way to train such a network, if possible without relying on human input.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Neural models for formulas</title>
      <p>One natural idea for processing mathematical concepts with neural networks is to represent
them as graphs. However, unlike image processing, where convolutions and pooling are a
well-performing standard, no comparable standard architecture exists for Graph Neural Networks (GNNs).</p>
      <p>
        A commonly used pattern is message passing, where every layer combines information from
neighbouring nodes to compute the next layer’s node embeddings. This method, however, has
several drawbacks. For one, information can only travel one edge per layer, so combining
information from far-away nodes requires many layers. This problem is made worse by the
fact that GNNs tend to perform badly with a large number of layers. Another problem is that
such networks can distinguish graphs only up to the Weisfeiler–Lehman isomorphism test [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ],
meaning that if the test deems two graphs isomorphic, the network will process them as
if they really were exactly the same, even if they are not.
      </p>
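      <p>As a concrete illustration of this limitation, the following minimal sketch (graphs and helper names are illustrative) runs the 1-WL colour refinement on two non-isomorphic graphs – two disjoint triangles versus a single hexagon – and shows that the test, and hence any plain message-passing GNN, cannot tell them apart.</p>
      <preformat>
```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-WL colour refinement: each node's new colour hashes its own
    colour together with the multiset of its neighbours' colours."""
    colors = {v: 0 for v in adj}  # uniform initial colouring
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())  # graph-level colour multiset

# Two disjoint triangles (6 nodes) ...
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# ... versus a single 6-cycle: not isomorphic, but every node is
# degree-2 with degree-2 neighbours, so 1-WL cannot separate them.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

assert wl_colors(two_triangles) == wl_colors(hexagon)
```
      </preformat>
      <p>Since every node in both graphs sees an identical local picture at every refinement round, the two colour multisets coincide and a message-passing network would embed both graphs identically.</p>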
      <p>
        My work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] seeks to overcome these problems by adding two modifications to the standard
message-passing scheme.
      </p>
      <p>• expanding window – in every layer the network aggregates information from nodes
exponentially far away (in the n-th layer it uses nodes within distance 2<sup>n</sup>);
• random identifiers – as part of the initial (0th-layer) embedding, nodes are given a random
sequence of 1s and 0s. This allows the network to tell the difference between being
connected to the same node and being connected to a different node with the same neighbourhood.</p>
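      <p>The two modifications can be sketched as follows. This is a toy version with a fixed elementwise-mean aggregation; the actual architecture in [5] uses learned aggregation functions, and all names below are illustrative.</p>
      <preformat>
```python
import random

def initial_embeddings(adj, id_bits=8, seed=0):
    # Random-identifier trick: each node's 0th-layer embedding is a
    # random 0/1 sequence, so two distinct but look-alike neighbours
    # get different representations.
    rng = random.Random(seed)
    return {v: [rng.randint(0, 1) for _ in range(id_bits)] for v in adj}

def within_distance(adj, k):
    """All nodes reachable from each node in at most k edges."""
    reach = {v: {v} for v in adj}
    for _ in range(k):
        reach = {v: r.union(*(set(adj[u]) for u in r)) for v, r in reach.items()}
    return reach

def expanding_window_layer(adj, emb, layer_index):
    # Expanding window: layer n aggregates (here: elementwise mean)
    # over nodes within distance 2**n, so information travels
    # exponentially fast across the graph.
    window = within_distance(adj, 2 ** layer_index)
    return {
        v: [sum(emb[u][i] for u in nbrs) / len(nbrs) for i in range(len(emb[v]))]
        for v, nbrs in window.items()
    }

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a 4-node path graph
emb = initial_embeddings(path)
emb = expanding_window_layer(path, emb, layer_index=1)  # window radius 2
```
      </preformat>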
      <p>
        A similar idea was also explored in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>While this approach does solve the aforementioned problems, it does not seem to produce
any significant improvement in final performance.</p>
      <p>
        I also developed a formula embedding [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that allows extracting subformulae with
linear transformations (and thus easily by neural layers). This is done by training an autoencoder.
I test these embeddings by training a simple feed-forward network to perform premise
selection classification on the learned embeddings. This achieves 70% accuracy, while
the state of the art on this dataset is 81% (when training directly on the dataset).
      </p>
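      <p>A minimal sketch of why subformula extraction by linear maps is possible at all: if a formula embedding were simply the concatenation of its children’s embeddings, each child could be recovered exactly by a fixed projection matrix. The embeddings of [7] are instead learned by the autoencoder so that such extraction maps exist approximately; the concatenation case below, with illustrative names and dimensions, is only the simplest instance where they exist exactly.</p>
      <preformat>
```python
import numpy as np

DIM = 4  # per-symbol embedding size (illustrative)

rng = np.random.default_rng(0)
atom = {name: rng.normal(size=DIM) for name in ["a", "b"]}

def embed_app(left, right):
    # Toy composition: the embedding of f(a, b) is just the
    # concatenation of the child embeddings.
    return np.concatenate([left, right])

# Subformula extraction as a linear map: a fixed projection matrix
# picks out each child from the parent embedding.
P_left = np.hstack([np.eye(DIM), np.zeros((DIM, DIM))])
P_right = np.hstack([np.zeros((DIM, DIM)), np.eye(DIM)])

e = embed_app(atom["a"], atom["b"])
assert np.allclose(P_left @ e, atom["a"])
assert np.allclose(P_right @ e, atom["b"])
```
      </preformat>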
    </sec>
    <sec id="sec-4">
      <title>4. Self-play</title>
      <p>To actually make use of a neural network we need to be able to provide it with data. One
obvious approach is to use human-generated data and do imitation learning. However, if we
want to be able to solve difficult problems, merely imitating humans might not be enough.</p>
      <p>It would be better to learn from the ground truth, that is, straight from mathematical results.
Some attempts at this have been made in the meantime, like [8, 9], where an algorithm tries to
prove theorems from a given dataset and learns from successful attempts – this, however, requires
a robust dataset of theorems and imposes a clear limit on what can be learned. A different
attempt [10] generates a dataset of synthetic theorems and learns from it, thus removing the
need for a human-provided dataset; however, this allows only a shallow exploration of
possible theorems.</p>
      <p>
        My first attempt at this problem uses an algorithm that has already proven able to effectively
explore and learn an extensive domain – the board game of Go. The AlphaZero [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] algorithm learns
by playing the game against itself, then uses the generated games to improve; the improved
network then generates better games, and so on. This allows deep exploration towards interesting
and valuable areas of the space of possible games, continuously using already-learned knowledge
to learn more.
      </p>
      <p>In my work, I train a neural network to play a game where one player generates a provable
theorem and the other proves it. Here, the improvement of one drives the improvement of the other in
a continuous learning process.</p>
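      <p>Schematically, one round of this game can be sketched as follows. This is a toy, self-contained stand-in: “theorems” are just numbers assembled from construction steps (and hence provable by construction), the prover is a blind budgeted search, and the learning updates are trivial placeholders for the real neural training.</p>
      <preformat>
```python
import random

class Adversary:
    def __init__(self):
        self.difficulty = 1
    def generate(self, rng):
        # A "theorem" built from explicit steps, so it is provable
        # by construction.
        steps = [rng.choice([1, 2, 3]) for _ in range(self.difficulty)]
        return sum(steps)
    def learn(self, theorem, reward):
        # Toy adaptation: when the prover succeeds (adversary reward
        # is -1), move toward harder theorems; otherwise ease off.
        self.difficulty = max(1, self.difficulty + (1 if reward == -1 else -1))

class Prover:
    def search(self, theorem, budget, rng):
        # Blind search: try to reassemble the target within the budget.
        for _ in range(budget):
            length = rng.randint(1, 6)
            guess = sum(rng.choice([1, 2, 3]) for _ in range(length))
            if guess == theorem:
                return True
        return False
    def learn(self, theorem, reward):
        pass  # a real prover would update its guidance network here

rng = random.Random(0)
adversary, prover = Adversary(), Prover()
for _ in range(10):
    theorem = adversary.generate(rng)
    proved = prover.search(theorem, budget=16, rng=rng)
    # Zero-sum payoff: the prover is rewarded for success,
    # the adversary for the prover's failure.
    prover.learn(theorem, 1 if proved else -1)
    adversary.learn(theorem, -1 if proved else 1)
```
      </preformat>
      <p>The zero-sum reward is the crux: the adversary is paid exactly when the prover fails, which is what pushes it towards hard theorems – and, as discussed below, towards the degenerate strategy of hard but uninteresting ones.</p>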
      <p>The main theoretical problem is the goal of the training, which is finding theorems that
are provable but hard to prove. This can result in the adversary (the player constructing
theorems) learning some way to construct uninteresting but hard theorems, and then forever
winning the game. This would cause the prover to forget everything it has learned in a vain attempt
to prove theorems which it can never learn to prove because of computational limits.</p>
      <p>The interesting question then becomes: how smart can a prover get before this happens? In
my experiments, however, this does not seem to happen, as neither the prover nor the adversary
seems to get smart enough, instead producing simple but obfuscated theorems. As of yet I cannot
answer whether this is due to my lackluster implementation, limited computational resources
(compared to the AlphaZero experiments), or some problem inherent to the algorithm.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Future research</title>
      <p>
        Beyond the two research results discussed above [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ], my future research involves working
on the theoretical problem mentioned above: somehow changing the objective away from finding
hard-to-prove theorems. One such objective would be finding theorems useful in proving other
theorems.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rigazio</surname>
          </string-name>
          ,
          <article-title>Towards deep neural network architectures robust to adversarial examples</article-title>
          ,
          <year>2014</year>
          . arXiv:1412.5068.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Silver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Maddison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Van Den Driessche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schrittwieser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Antonoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Panneershelvam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanctot</surname>
          </string-name>
          , et al.,
          <article-title>Mastering the game of go with deep neural networks and tree search</article-title>
          ,
          <source>Nature</source>
          <volume>529</volume>
          (
          <year>2016</year>
          )
          <fpage>484</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Weisfeiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Lehman</surname>
          </string-name>
          ,
          <article-title>A reduction of a graph to a canonical form and an algebra arising during this reduction</article-title>
          ,
          <source>Nauchno-Technicheskaya Informatsia</source>
          <volume>2</volume>
          (
          <year>1968</year>
          )
          <fpage>12</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jegelka</surname>
          </string-name>
          ,
          <article-title>How powerful are graph neural networks?</article-title>
          , arXiv preprint arXiv:1810.00826 (2018).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Purgał</surname>
          </string-name>
          ,
          <article-title>Improving expressivity of graph neural networks</article-title>
          , CoRR abs/2004.05994 (2020). URL: https://arxiv.org/abs/2004.05994. arXiv:2004.05994.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Dasoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Scaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Virmaux</surname>
          </string-name>
          ,
          <article-title>Coloring graph neural networks for node disambiguation</article-title>
          , CoRR abs/1912.06058 (2019). URL: http://arxiv.org/abs/1912.06058. arXiv:1912.06058.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Purgał</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Parsert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kaliszyk</surname>
          </string-name>
          ,
          <article-title>A study of continuous vector representations for theorem proving</article-title>
          ,
          <source>Journal of Logic and Computation</source>
          (
          <year>2021</year>
          ). doi:10.1093/logcom/exab006.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] C. Kaliszyk, J. Urban, H. Michalewski, M. Olšák, Reinforcement learning of theorem proving, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 2018, pp. 8836–8847. URL: http://papers.nips.cc/paper/8098-reinforcement-learning-of-theorem-proving.pdf.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] K. Bansal, C. Szegedy, M. N. Rabe, S. M. Loos, V. Toman, Learning to reason in large theories without imitation, 2019. arXiv:1905.10501.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] V. Firoiu, E. Aygun, A. Anand, Z. Ahmed, X. Glorot, L. Orseau, L. Zhang, D. Precup, S. Mourad, Training a first-order theorem prover from synthetic data, 2021. arXiv:2103.03798.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>