Structure in Theorem Proving: Analyzing and
Improving the Isabelle Archive of Formal Proofs
Fabian Huch
Technische Universität München, Boltzmannstraße 3, 85748 Garching, Germany


                                      Abstract
                                      The Isabelle Archive of Formal Proofs has grown to a significant size in the past years. It makes up for
                                      an impressive body of research, which enables a number of statistical approaches to various aspects in
                                      theorem proving, and has not yet been utilized exhaustively. However, the growing size also poses some
                                      challenges to address: Material becomes increasingly harder to find, reusability and ease of understanding
                                      become more important. This thesis abstract summarizes my research plans on those topics and briefly
                                      touches on preliminary results, which indicate that the node in-degree of the dependency graph of the
                                      archive follows a scale-free distribution.

                                      Keywords
                                      theorem proving, software engineering, complex networks, proof mining


1. Introduction
Isabelle is an interactive theorem prover [1] with a large collection of formalized material, the
Archive of Formal Proofs (AFP). At the time of writing, the AFP consists of nearly three million
lines of code, and more than 167 000 lemmas have been proven in its close to 600 different
entries [2].
   The entities defined by those theories and their relationships can be described as a dependency
graph (sometimes also referred to as General Dependency Network [3]). For software systems,
dependency graphs have been found to exhibit certain structural properties. For instance, their
                                                                                              −𝛾
in-degree (𝑘𝑖𝑛 ) distribution is often scale-free, i.e. it follows some power law Pr(𝑘𝑖𝑛 ) ∝ 𝑘𝑖𝑛 [4],
in contrast to random graphs where each possible edge is present with a fixed probability
(Erdős–Rény model). Graphs with such topological properties are called complex networks,
which have been studied in the context of real-world graphs from many different areas [5]. In
contrast to research focused on object-oriented characteristics (e.g., classes and inheritance),
structural analysis on dependency graphs of software systems can be transferred to the field of
formal theories more directly.
   Another important application of this dependency network is proof automation. Data mining
methods to extract patterns from graphs have been extensively studied, for instance frequent
itemset mining to find subsets of nodes that frequently occur together [6, 7]. The AFP provides

CICM 2021: 14th Conference on Intelligent Computer Mathematics, July 26–31, 2021, Timisoara, Romania
Envelope-Open huch@in.tum.de (F. Huch)
GLOBE https://www21.in.tum.de/home/~huch (F. Huch)
Orcid 0000-0002-9418-1580 (F. Huch)
                                    © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
 CEUR
 Workshop
 Proceedings
               http://ceur-ws.org
               ISSN 1613-0073
                                    CEUR Workshop Proceedings (CEUR-WS.org)
enough data to enable pattern mining for proofs as well as evaluate the effectiveness of derived
automation.


2. Related Work
Regarding the AFP, empirical analysis has already been done by Blanchette et al. in [8]. They
measured the AFP size and authorship distribution and how those evolved over time, computed
the session graph, proof depths, and compared proof size to definition size and statement com-
plexity (by number of clauses). Moreover, they benchmarked how many theorems sledgehammer
could solve on a representative test suite. This research provides a good baseline of empirical
data; I plan to follow up on it by analyzing the underlying GDN of AFP theories, which can
possibly yield some deeper results.
   Very recently, usability of the AFP has also gotten some attention. Two different search
engines for theory contents were introduced: The FindFacts search by Huch and Krauss in [9],
and the concept-oriented SErAPIS engine by Stathopoulos et al. in [10]. Additionally, MacKenzie
et al. evaluated usability of the AFP webpage in [11] and built the prototype for a re-design.
   Complex networks for software systems were first introduced by Myers in [4]. In [12],
Zimmermann and Nagappan measured the correlation between software defects and a com-
prehensive list of network metrics; Šubelj and Bajec later surveyed quality indicators, and
performed package prediction by clustering on the network [13]. While there is no concern
about defects in theorem proving due to the nature of the field, my aim is to transfer some of
those findings to formal theories in order to detect structural problems that impair reusability
and readability.
   For proof automation, the idea of learning from existing material has been around for some
time. In Isabelle, Duncan derived tactics from proofs script sequences using Markov models and
genetic programming with the goal of full automatization in [14], which was successful for less
complex lemmas. More recently, Nawaz et al. used high utility itemset mining on a syntactic
level to discover patterns in proofs scripts of PVS [15]. In contrast, I aim to extract patterns
from the more fine-grained theorem dependency graph. This also makes it possible to utilize
structured Isar proofs in the pattern extraction.


3. Research Topics
In the following, my research plans are separated into concrete questions and topics that involve
engineering tasks.

3.1. Research Questions
RQ1: Do theory dependency networks follow the topological patterns typically found in
     complex networks?

RQ2: Which metrics are relevant quality indicators for formal theories? Can they expose
     structural problems?
RQ3: Is it possible to detect elements that need to be refactored? Can theory- and session-
     structure be predicted to generate refactoring recommendations?

RQ4: How can visualization be helpful to better understand the structure of formalizations?

RQ5: Which patterns can we learn from theorem networks? Can they be used to improve
     automation?

3.2. Further Topics
There are several aspects I want to improve on in the AFP. By introducing digital object
identifiers, entries can be much better referenced. Identifiers could potentially even be assigned
to individual concepts of an entry, though creating such a mapping poses lots of challenges.
Moreover, introducing more fields for entries, such as subject classification and links to (print-
)publications, can help organizing the AFP in the future. Finally, the AFP website needs to be
improved for better usability – there is already a prototype (as discussed in section 2), which
still needs to be integrated.
   Currently, the AFP does not allow proofs by s o r r y (which are admitted with the help of an
oracle [16]). However, it is often desirable to have what is sometimes referred to as Formal
Abstract [17]: a formalization of results from literature without rigorous proofs. To that end, I
want do add a proof by reference mechanism, which weakly checks references and allows such
proofs to be added to the AFP in a separate, less trusted, session group.
   In Isabelle, I plan to integrate the FindFacts search into the prover IDE (and add support for
type-classes and locales), as well as tooling for clone detection and advanced IDE functionality
such as structural search and replace.


4. Preliminary Results
The dependency graph of Isabelle and the AFP (release 2021) consists of approximately 2.1 ⋅ 106
nodes and 2.5 ⋅ 108 directed edges (when considering all types of entities and relationships).
Figure 1 shows the in-degree distribution of that graph. For 𝑘𝑖𝑛 > 2, it is clearly linear on the
log-log scale; this indicates a power-law distribution, which is typical for scale-free networks
[5]. The individual AFP components show similar characteristics; this illustrates why complex
network science is relevant to the topic.


References
 [1] L. C. Paulson, Isabelle: The next seven hundred theorem provers, in: E. Lusk, R. Overbeek
     (Eds.), 9th International Conference on Automated Deduction, Springer Berlin Heidelberg,
     Berlin, Heidelberg, 1988, pp. 772–773.
 [2] Archive of Formal Proofs, Statistics, 2021. URL: https://www.isa-afp.org/statistics.html.
 [3] M. Savic, M. Ivanovic, Graph clustering evaluation metrics as software metrics, in:
     Z. Budimac, T. G. Grbac (Eds.), Proceedings of the 3rd Workshop on Software Quality
     Analysis, Monitoring, Improvement and Applications (SQAMIA 2014), Lovran, Croatia,
                                   10−1

                                   10−2

                                   10−3

                        Pr(𝑘𝑖𝑛 )
                                   10−4

                                   10−5

                                   10−6

                                          100   101   102   103    104   105    106
                                                             𝑘𝑖𝑛
Figure 1: In-Degree distribution the dependency graph from Isabelle and AFP theories (release 2021),
on a log-log scale.


     September 19-22, 2014, volume 1266 of CEUR Workshop Proceedings, CEUR-WS.org, 2014,
     pp. 81–89.
 [4] C. R. Myers, Software systems as complex networks: Structure, function, and evolvability
     of software collaboration graphs, Phys. Rev. E 68 (2003) 046116. doi:1 0 . 1 1 0 3 / P h y s R e v E . 6 8 .
     046116.
 [5] R. V. Solé, R. Ferrer-Cancho, J. M. Montoya, S. Valverde, Selection, tinkering, and emergence
     in complex networks, Complexity 8 (2002) 20–33. doi:1 0 . 1 0 0 2 / c p l x . 1 0 0 5 5 .
 [6] C. C. Aggarwal, H. Wang (Eds.), Managing and Mining Graph Data, volume 40 of Advances
     in Database Systems, Springer US, 2010. doi:1 0 . 1 0 0 7 / 9 7 8 - 1 - 4 4 1 9 - 6 0 4 5 - 0 .
 [7] D. J. Cook, L. B. Holder, Mining Graph Data, John Wiley & Sons, 2006. doi:1 0 . 1 0 0 2 /
     9780470073049.
 [8] J. C. Blanchette, M. Haslbeck, D. Matichuk, T. Nipkow, Mining the Archive of Formal
     Proofs, in: M. Kerber, J. Carette, C. Kaliszyk, F. Rabe, V. Sorge (Eds.), Intelligent Computer
     Mathematics, Springer International Publishing, Cham, 2015, pp. 3–17.
 [9] F. Huch, A. Krauss, Findfacts: A scalable theorem search, in: Isabelle Workshop, 2020.
[10] Y. Stathopoulos, A. Koutsoukou-Argyraki, L. Paulson, Serapis: A concept-oriented search
     engine for the isabelle libraries based on natural language, in: Isabelle Workshop, 2020.
[11] C. MacKenzie, J. Fleuriot, J. Vaughan, An evaluation of the archive of formal proofs, arXiv
     preprint arXiv:2104.01052 (2021).
[12] T. Zimmermann, N. Nagappan, Predicting defects using network analysis on dependency
     graphs, in: Proceedings of the 30th International Conference on Software Engineering,
     ICSE ’08, Association for Computing Machinery, New York, NY, USA, 2008, p. 531–540.
     doi:1 0 . 1 1 4 5 / 1 3 6 8 0 8 8 . 1 3 6 8 1 6 1 .
[13] L. Šubelj, M. Bajec, Software systems through complex networks science: Review, analysis
     and applications, in: Proceedings of the First International Workshop on Software Mining,
     SoftwareMining ’12, Association for Computing Machinery, New York, NY, USA, 2012, p.
     9–16. doi:1 0 . 1 1 4 5 / 2 3 8 4 4 1 6 . 2 3 8 4 4 1 8 .
[14] H. Duncan, The Use of Data-Mining for the Automatic Formation of Tactics, Ph.D. thesis,
     University of Edinburgh, 2007.
[15] M. S. Nawaz, P. Fournier-Viger, J. Zhang, Proof Learning in PVS with Utility Pattern
     Mining, IEEE Access 8 (2020) 119806–119818. doi:1 0 . 1 1 0 9 / A C C E S S . 2 0 2 0 . 3 0 0 4 1 9 9 .
[16] M. Wenzel, et al., The Isabelle/Isar reference manual, Technische Universität München,
     2021.
[17] T. Hales, Big conjectures, in: Computer-aided mathematical proof, 2017.