=Paper=
{{Paper
|id=Vol-3613/AReCCa2023_paper5
|storemode=property
|title=Structure-Generating First-Order Theorem Proving
|pdfUrl=https://ceur-ws.org/Vol-3613/AReCCa2023_paper5.pdf
|volume=Vol-3613
|authors=Christoph Wernhard
|dblpUrl=https://dblp.org/rec/conf/arecca/Wernhard23
}}
==Structure-Generating First-Order Theorem Proving==
Structure-Generating First-Order Theorem Proving

Christoph Wernhard, University of Potsdam, Germany

Abstract. Provers based on the connection method can become much more powerful than currently believed. We substantiate this thesis with certain generalizations of known techniques. In particular, we generalize proof structure enumeration interwoven with unification – the proceeding of goal-driven connection and clausal tableaux provers – to an interplay of goal- and axiom-driven processing. It permits lemma re-use and heuristic restrictions known from saturating provers. Proof structure terms, proof objects that allow various ways of building proofs to be specified and implemented, are central to our approach. Meredith’s condensed detachment represents the prototypical base case of such proof structure terms. Hence, we focus on condensed detachment problems, first-order Horn problems of a specific form. Experiments show that for this problem class the approach keeps up with state-of-the-art first-order provers, leads to remarkably short proofs, solves an ATP challenge problem, and is useful in machine learning for ATP. A general aim is to make ATP more accessible to systematic investigations in the space between calculi and implementation aspects.

1. Introduction

Our thesis is that provers based on the connection method (CM) can become much more powerful than currently believed. We substantiate this with certain generalizations of known techniques. Current realizations of the CM operate by enumerating proof structures, interwoven with unification of atomic formulas that are associated with nodes in the structure. The scope of variables in these formulas includes the whole structure; in technical terms, they are rigid [1]. Failure of unification on a partially built-up structure triggers backtracking. All those structures that would extend this partial structure are abandoned at once.
As the proof structures are often represented in a tree form as clausal tableaux [2], we call provers that proceed in this way CM-CT provers, for Connection Method/Clausal Tableaux. Examples are the Prolog Technology Theorem Prover [3], SETHEO [4], leanCoP [5, 6] and CMProver [7, 8, 9]. Aside from the CM and clausal tableaux, model elimination [10] is also a conceptual model that covers such provers. The underlying principle of enumerating proof structures in combination with unification is already present in [11, 12], as an improvement compared to enumerating formula instantiations. In [13, Sect. 3] resolution theorem proving is contrasted as a formula enumeration process with tableau enumeration. With techniques based on proof structure enumeration, each proof structure usually appears at most once, while, e.g., in proof search by resolution the same subproof may appear several times.

AReCCa 2023: Automated Reasoning with Connection Calculi, 18 September 2023, Prague, Czech Republic. info@christophwernhard.com (C. Wernhard). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Typically, CM-CT provers perform the structure enumeration wrapped in iterative deepening upon the maximal value of some structure size measure. A basic motivation there is completeness while avoiding the storage requirements of breadth-first search [3]. In addition, alternate depth measures are explored as heuristic variations, e.g., [4, Sect. 6.2]. CM-CT provers proceed goal-driven through an initially instantiated goal formula at the top of the proof structure, e.g., with a ground clause obtained from Skolemizing universal variables in the original goal formula.
Structure-generating first-order theorem proving as discussed here generalizes these principles. One major influence is the consideration of proof structure terms in Carew A. Meredith’s condensed detachment (CD) [14]. These are full binary trees,1 called D-terms [15], referring to their convenient presentation as terms with the binary function symbol D. With a D-term one can associate a most general formula that it proves, provided the constants, or leaf nodes, are associated with given axioms. See, e.g., [16] for original examples. D-terms allow many useful operations on proof structures to be specified in immediate and straightforward ways: for example, notions of proof size, or proof composition, where proofs of lemmas are simply inserted into an overall proof term. Such a composition of proofs is, for example, necessary to obtain an overall proof in the case where precomputed and preselected lemmas are supplied as additional axioms to the theorem prover [17]. Decomposition of proofs is also straightforward: each subterm of a D-term is itself a D-term that represents the proof of a lemma. CD plays a core role in witness theory [18] and is known [19] as the first technical approach to the Curry-Howard correspondence or “formulas as types” [20, 21]. It provides the basis of Metamath2 [22, 23], a rather successful computer-processable language for verifying, archiving, and presenting mathematical proofs. The problems covered by CD form a subclass of first-order Horn problems. Although many historic successes in ATP were achieved involving CD problems [24, 25, 26, 27, 28, 29, 30, 31, 32], in ATP CD was mostly considered just a restricted form of hyperresolution. Emphasis on its proof structures became prevalent in ATP only later, based on observing parallels to the CM [33, 15].
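Operations like proof size measures are indeed immediate on the tree view of D-terms. The following Python sketch is purely illustrative (CD Tools itself is implemented in Prolog; the encoding of D-terms as nested tuples with integer axiom labels is our own assumption):

```python
# Illustrative sketch only: D-terms as nested tuples ('D', d1, d2),
# with ints as primitive D-terms (axiom labels).

def tree_size(d):
    """Number of inner nodes of the D-term (applications of D)."""
    if isinstance(d, int):
        return 0
    _, d1, d2 = d
    return 1 + tree_size(d1) + tree_size(d2)

def height(d):
    """Number of edges of the longest downward path from the root to a leaf."""
    if isinstance(d, int):
        return 0
    _, d1, d2 = d
    return 1 + max(height(d1), height(d2))

def compacted_size(d):
    """Number of distinct compound subterms, i.e. the number of
    inner nodes of the minimal DAG representation."""
    seen = set()
    def walk(t):
        if isinstance(t, int):
            return
        seen.add(t)
        walk(t[1])
        walk(t[2])
    walk(d)
    return len(seen)
```

For instance, the D-term D(D(1, D(1, 1)), D(D(1, D(1, 1)), D(D(1, 1), 1))) has tree size 8, height 4 and compacted size 5; factoring of repeated subterms is what makes the compacted size smaller than the tree size.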
Thus our investigation focuses on CD problems, for which these D-terms are readily available, and we call our prototypical system to investigate the approach SGCD – Structure-Generating theorem proving for Condensed Detachment. Our starting point is enumerating proof structures, interwoven with unification, as performed by the CM-CT provers. SGCD computes proofs in increasing levels, characterized for example by some size measure of the D-terms, e.g., number of nodes or height, similar to the iterative deepening in CM-CT provers. The simple form of the D-terms, just full binary trees, allows precise upper bounds on the search space growth for increasing levels to be found in The On-Line Encyclopedia of Integer Sequences (OEIS) [34]. Also measures that were so far hardly considered for CM-CT provers can be taken into account, for example compacted size, that is, the number of nodes of the minimal DAG representation of the proof tree. Or we can characterize sets of structures level by level in some inductive way. With one specific such characterization, the so-called PSP-level (PSP for Proof-SubProof), SGCD finds a strikingly short proof of a historic problem by Łukasiewicz [15] and a proof of a problem that was solved by Meredith but was so far very hard for ATP [17, 31].

1 A binary tree is full if each node has 2 or 0 children.
2 https://us.metamath.org/.

Another issue to address is the purely goal-driven operation of the CM-CT provers. Their depth-first search causes subgoals to be re-proven again and again. Moreover, information about the goal in the form of the initial goal instantiation is often propagated only to a few subgoals that are very close to the overall goal. From our enumeration view, instantiating the top node with a given goal is just an optional possibility. We can also leave it with variables and collect their bindings in each enumerated structure.
Under such a binding, the goal is a lemma formula that can be proven by the structure in a given level from given axioms. Alternate proof structures yield alternate bindings, alternate lemmas. We call this mode axiom-driven. SETHEO was used for bottom-up preprocessing in this way [35]. Hypertableaux [36] may be viewed as proving by maintaining such derived lemmas in a specific way. SGCD can be invoked in a loop where goal-driven phases alternate with axiom-driven phases, for increasing levels. The parameters that control this loop are highly configurable. Unit lemmas obtained by the axiom-driven phases are cached and available in later phases, both axiom- and goal-driven. They can be deleted on the basis of heuristics that are well known from saturating theorem proving, for example deleting lemmas whose term height exceeds a threshold. Such heuristics are usually not available in CM-CT provers, because the atom instances maintained by them are more deeply instantiated, always affected by the overall proof structure rather than just the substructure that proves the lemma. In a typical configuration SGCD starts with an empty cache and proceeds in a main loop over a “current” level initialized with 0. As the level we may, e.g., take tree size, the number of inner nodes of the proof tree. In each iteration of the loop SGCD first tries to prove the goal in the goal-driven way with iterative deepening upon a level that starts with the current level, is incremented in each deepening step by one, and ends with, say, the current level plus 2. This is much like a CM-CT prover with iterative deepening, except that: (1) iterative deepening starts at the current level and is only tried for two more levels; (2) in addition to the original axioms also cached lemmas are used to solve subgoals. If no proof is found in the goal-driven phase, SGCD invokes the axiom-driven mode to generate lemmas in the current level, which are then put into the cache.
For generating these lemmas it solves subgoals by lemmas in smaller levels that were cached in previous iterations of the main loop. For a generated lemma, not just its proof but also its level is recorded. The level is then available to control the search when the lemma is used to solve a subgoal in either goal- or axiom-driven mode. If, for example, the search is for an overall proof with tree size 10 where the fragment under construction already has tree size 8, then only lemmas with a tree size of at most 2 may be attached to the fragment. After placing the lemmas generated by the axiom-driven phase in the cache, the current level is incremented by one and SGCD moves to the next iteration of its main loop. In the recent terminology of [37], our approach combines the following features: Cs (separation of proof structural information from the formula), Cm (mixed manner of operation, which combines starting from the goal theorem (Cg) and starting from the axioms), and a form of Cc (cut) through unit lemmas.

In the following Sect. 2 we provide more background on condensed detachment. Then, in Sect. 3 we explicate our approach by presenting the system SGCD (Structure-Generating theorem proving for Condensed Detachment), covering the range from general concepts to implementation aspects and experimental results. In Sect. 4 we discuss some particular open issues and speculative ideas concerning the approach. Section 5 concludes the paper. Detailed descriptions of experiments are provided in [38] and [17]. The system is available as free software from http://cs.christophwernhard.com/cdtools/, where also additional data and presentations for the experiments can be found, including HTML-formatted comprehensive result tables and full logs.

2. Background: Condensed Detachment for ATP

CD was developed in the mid-1950s by Carew A. Meredith as an evolution of the method of substitution and detachment practiced by Łukasiewicz [14, 39, 40, 41].
Reasoning steps are by detachment, or modus ponens, under implicit substitution by most general unifiers. Its basic application field is the investigation of axiomatizations of propositional logics from a first-order meta level. Some of the most advanced formal proofs ever developed by humans were such applications of CD by Meredith and some other researchers. From a first-order ATP perspective, a CD problem consists of proper axioms (briefly axioms), that is, positive unit clauses, a goal theorem, that is, a single negative ground unit clause (representing a universally quantified atomic goal theorem after Skolemization), and the following ternary Horn clause that models detachment.

Det =def P(i(x, y)) ∧ P(x) → P(y).

The premises of Det are called major and minor premise, respectively. All atoms in the problem have the same predicate P, which is unary and stands for something like provable. The formulas of the investigated propositional logic are expressed as terms, where the binary function symbol i stands for implies. CD may be considered as an inference rule. With this view, the most straightforward way to describe a CD inference step from an ATP perspective is to consider it as a positive hyperresolution step such that from Det and two positive unit clauses, used as major and minor premise, a third positive unit clause is inferred. A CD proof is a proof of a CD problem constructed with the CD inference rule, in contrast to, for example, a proof involving binary resolution steps involving Det that yield non-unit resolvents. Prover9 [42] actually by default chooses positive hyperresolution as inference rule for CD problems and thus produces CD proofs for these. The structure of CD proofs can be represented in a very simple and convenient way as full binary trees or terms, which can be directly taken into account as objects in proving.
In ATP we find this aspect in the connection method (CM) [43, 44, 45], where the proof structure as a whole is in the focus, in contrast to extending a set of formulas by inferred formulas. This view of CD has been made precise and elaborated in [33, 15]. As already mentioned, we call the structure representations of CD proofs D-terms. A D-term is a term built from D as binary function symbol and from axiom labels as constant symbols, the primitive D-terms. In other words, it is a full binary tree where the leaf nodes are labels of axioms. The following line shows four example D-terms for axiom labels 1, 2.

1, 2, D(1, 1), D(D(2, 1), D(1, D(2, 1))).

The mapping between the representation of CD proofs in terms of CD inference steps and D-terms is straightforward: the use of an axiom corresponds to a primitive D-term. A CD inference step corresponds to a D-term D(d1, d2) where d1 is the D-term that proves the unit clause for the major premise and d2 is the D-term that proves the unit clause for the minor premise. A CD proof in full is represented by a D-term (which just describes structure) taken together with an axiom assignment, that is, a mapping of primitive D-terms to axioms. It proves from the axioms all instances of a most general formula that is associated through unification, constrained by the axioms for the leaf nodes and the requirements imposed by Det for the inner nodes.3 Following [15], we call this formula the most general theorem (MGT) of the D-term with respect to the axiom assignment. The MGT is unique, modulo renaming of variables. Of course, for a given axiom assignment not all D-terms necessarily have an MGT. If the unification constraints imposed by the axioms and Det cannot be satisfied, we say the D-term has no MGT. And, of course, also depending on the axiom assignment, it is quite possible that two different D-terms have the same MGT or that the MGT of one is subsumed by the MGT of the other.
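The inductive characterization of the MGT suggests a small algorithm: compute MGTs of the two subterms with variables renamed apart, then unify the major premise's MGT with i(m, z), where m is the minor premise's MGT and z a fresh variable; z under the resulting unifier is the MGT. The following Python sketch is our own illustration, not CD Tools code; it drops the P wrapper, represents i-terms as nested tuples, variables as strings, and uses a textbook unification with occurs check:

```python
import itertools

counter = itertools.count()

def is_var(t):
    return isinstance(t, str)

def fresh(t, ren):
    """Rename the variables of t apart, using fresh names."""
    if is_var(t):
        if t not in ren:
            ren[t] = f"v{next(counter)}"
        return ren[t]
    return (t[0],) + tuple(fresh(a, ren) for a in t[1:])

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s to unify a and b, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return not is_var(t) and any(occurs(v, a, s) for a in t[1:])

def resolve(t, s):
    """Apply substitution s exhaustively to t."""
    t = walk(t, s)
    if is_var(t):
        return t
    return (t[0],) + tuple(resolve(a, s) for a in t[1:])

def mgt(d, axioms):
    """MGT (the argument of P) of D-term d under the axiom assignment, or None."""
    def go(d, s):
        if not (isinstance(d, tuple) and d[0] == 'D'):
            return fresh(axioms[d], {}), s     # fresh variant of the axiom
        r1 = go(d[1], s)
        if r1 is None:
            return None
        m1, s = r1
        r2 = go(d[2], s)
        if r2 is None:
            return None
        m2, s = r2
        z = f"v{next(counter)}"
        s = unify(m1, ('i', m2, z), s)         # major premise = i(minor, conclusion)
        return None if s is None else (z, s)
    r = go(d, {})
    return None if r is None else resolve(r[0], r[1])
```

With the Mingle axiom i(x, i(x, x)) assigned to label 1, `mgt(('D', 1, 1), {1: mingle})` yields a term of the form i(T, T) with T a variant of the axiom, matching the MGT of D(1, 1) derived in Example 1 below (up to renaming of variables).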
A CD problem also includes a goal theorem, a ground atom. A D-term is a proof of the problem if its MGT with respect to the axioms subsumes the goal theorem.

Example 1. Consider the axiom assignment α =def {1 ↦ P(i(x, i(x, x)))} declaring the primitive D-term 1 as label of the axiom known as Mingle [32]. The MGT of the primitive D-term 1 is then just this axiom. The MGT of the D-term D(1, 1) is P(y)σ where σ is the most general unifier of the set of pairs {{P(i(x, y)), P(i(x′, i(x′, x′)))}, {P(x), P(i(x′′, i(x′′, x′′)))}}. The most general unifier of these pairs is σ = {x ↦ i(x′′, i(x′′, x′′)), x′ ↦ i(x′′, i(x′′, x′′)), y ↦ i(i(x′′, i(x′′, x′′)), i(x′′, i(x′′, x′′)))}. Hence, after renaming variables (recall that the MGT is just characterized modulo renaming of variables) the MGT of D(1, 1) is P(i(i(x, i(x, x)), i(x, i(x, x)))). A D-term proves as goal theorems all ground instances of its MGT, which is defined with respect to an axiom assignment. So here D(1, 1) proves P(i(i(a, i(a, a)), i(a, i(a, a)))), and also, for example, P(i(i(i(a, b), i(i(a, b), i(a, b))), i(i(a, b), i(i(a, b), i(a, b))))).

D-terms, full binary trees, facilitate characterizing and investigating structural properties of proofs. While, for a variety of reasons, it is far from obvious how to measure the size of proofs by ATP systems in general, for D-terms there are three immediate size measures:

• The tree size of a D-term is the number of its inner nodes.
• The height of a D-term is the number of edges of the longest downward path from the root to a leaf.
• The compacted size of a D-term is the number of distinct compound subterms, or, in other words, the number of inner nodes of its minimal DAG.4

In the literature, compacted size is also called length, height is called level, and tree size is called CDcount [29].

3 There is actually some fine-print here with respect to the usual notion of most general unifier [33, 15, 20].
4 It is well known that a tree or set of trees is represented by a unique (modulo isomorphism) minimal DAG, which is maximally factored (it has no multiple occurrences of the same subtree) or, equivalently, is minimal with respect to the number of nodes.

Example 2. The D-term D(D(1, D(1, 1)), D(D(1, D(1, 1)), D(D(1, 1), 1))) has tree size 8, height 4 and compacted size 5. If we replace the second occurrence of 1 from the left with 2 we obtain D(D(1, D(2, 1)), D(D(1, D(1, 1)), D(D(1, 1), 1))). Tree size and height remain unaltered, but the compacted size is then 7.

CD proofs suggest and allow a specific form of lemmas, which we call unit/subtree lemmas, reflecting two views: as formulas, they are positive unit clauses, which can be re-used in different CD inference steps. In the structural view, they are subterms (or subtrees5) of the overall D-term. If they occur multiple times there, they are factored in the minimal DAG of the overall D-term. Both views are linked in that the formula of a lemma is the MGT of its D-term. The most adequate size measure for D-terms that takes the compression achieved by unit/subtree lemmas into account is compacted size. From the perspective of proof structure compression methods, unit/subtree lemmas have the property that the compression target is unique, simply because each tree is represented by a unique minimal DAG. A textual representation of a D-term that respects its compacted size, the DAG compression by unit/subtree lemmas, is provided by a list of factor equations. There, distinct subproofs with multiple incoming edges in the DAG receive numeric labels by which they are referenced. Meredith uses such a form with Polish notation for D-terms and their MGTs, e.g., in [16], which is discussed in [15].

Example 3. Consider the D-term D(D(1, D(1, 1)), D(D(1, D(1, 1)), D(D(1, 1), 1))) from Example 2. As a list of factor equations it can be represented as follows.
2 = D(1, 1)
3 = D(1, 2)
4 = D(3, D(3, D(2, 1)))

Here we assume that labels 2–4 are not used for axioms and thus are free for use as reference labels. Label 4, which is not referenced, is associated with the root or the overall D-term. Compared to general first-order ATP problems, CD problems are restricted: viewed as a CNF to be refuted, the problem has positive unit clauses, a single negative ground clause, and a single ternary Horn clause. Also equality is not explicitly considered. Nevertheless, core characteristics of general first-order ATP problems are present: first-order variables, a binary function symbol, and cyclic predicate dependency.6

5 We use subtree with the meaning common in computer science and matching the notion of subterm: a subtree of a tree T is a tree consisting of a node in T and all of its descendants in T.
6 In the sense of the dependency graph of a logic program. It has an edge from predicate p to predicate q if there is a clause with a body atom whose predicate is p and with a head whose predicate is q.

3. SGCD – Structure-Generating Theorem Proving for Condensed Detachment

We explicate our approach by means of describing the SGCD system, which is implemented in SWI-Prolog [46]. There we enter the space between calculi and “system descriptions”. In a sense, the calculus (e.g., condensed detachment as some inference system) just tells us in an inductive way how proof structures are built up and relate to proven formulas or MGTs. Although the inductive specification suggests a construction, for theorem proving in practice there are many ways in which the proof structures can actually be built. We think this is not merely the domain of ad-hoc low-level heuristics, but a potentially large field that should be systematically approached as well. The programming language Prolog is inherently rather close to proof structure enumeration with incorporated unification. In fact, it expresses arbitrary computations according to this principle.
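The pure structure enumeration underlying such proof search, with the interwoven unification omitted for brevity, can be sketched as follows. This is an illustration in Python rather than Prolog; the function name and tuple encoding are ours. For a single axiom the counts per tree size are the Catalan numbers, matching the tree-size row of Table 1:

```python
def dterms_of_tree_size(n, axiom_labels=(1,)):
    """All D-terms with exactly n inner nodes over the given axiom labels."""
    if n == 0:
        return list(axiom_labels)           # primitive D-terms
    out = []
    for k in range(n):                      # k inner nodes go to the left subterm
        for d1 in dterms_of_tree_size(k, axiom_labels):
            for d2 in dterms_of_tree_size(n - 1 - k, axiom_labels):
                out.append(('D', d1, d2))
    return out
```

In SGCD the corresponding enumeration is interwoven with unification, so fragments whose unification constraints fail are pruned together with all their extensions; the counts here are thus only the upper bounds discussed in Sect. 3.2.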
A predicate (or nondeterministic procedure) enumerates alternate bindings of output variables, each corresponding to a different (implicitly represented) proof of the “predicate”, viewed as a goal unit clause with free output variables. Thus, Prolog is for SGCD not just suitable as an implementation language, but also to abstractly describe the involved methods.

3.1. The Core Enumeration Predicate, its Modes and the Cache

We assume CD problems with D-terms as proof structures. Further we assume that the set of D-terms is grouped into levels, possibly but not necessarily disjoint, where each strict subterm of a given D-term is in some strictly lower level than the given D-term. The set of the sets of all D-terms with a specific tree size is an example of such a level grouping. We then say that the level is characterized by the tree size. At the core of SGCD is a ternary predicate that, for a given level, enumerates all pairs of a D-term in the level and a formula such that the formula is the MGT of the D-term, with respect to some assumed axioms that do not explicitly appear as parameter.

enum_dterm_mgt_pairs(+Level, ?D-term, ?Formula)

Depending on the mode in which enum_dterm_mgt_pairs is invoked, that is, which of the D-term and Formula arguments are inputs or outputs, it realizes different functionalities. We assume that, as an input argument, Formula is ground, representing a universally quantified theorem after Skolemization, whereas, as an output argument, it is just the MGT of D-term, which may involve variables.

• If D-term and Formula are both input arguments, then the predicate verifies that D-term is a proof of Formula. The predicate then either succeeds (enumerates the empty binding as sole value) or fails.
• If only D-term is an input and Formula an output, it computes the MGT of D-term. The predicate then either enumerates a single value for Formula or, in case D-term has no MGT, fails.
• If only Formula is an input and D-term is an output, it acts as a goal-driven prover, enumerating proofs of Formula.
• If both D-term and Formula are outputs, it enumerates pairs of a proof and its MGT, that is, it generates lemmas with proofs in an axiom-driven way.

Invoked goal-driven, i.e., with Formula as input and D-term as output, for increasing values of Level it is in essence a CM-CT prover with iterative deepening, specialized to CD problems. To process the lemmas enumerated by axiom-driven invocations, i.e., with both D-term and Formula as outputs, SGCD maintains a cache of ⟨Level, D-term, Formula⟩ triples, solutions of enum_dterm_mgt_pairs(Level, D-term, Formula). If lemmas are computed and cached for increasing levels, this cache can be used within the implementation of enum_dterm_mgt_pairs to solve subproblems from the cache instead of recomputing them, if they are for a lower level than that of the top call. This is possible not just within axiom-driven top calls, but also within goal-driven invocations. A part of the proof search is then replaced by retrieval from the cache.7 Moreover, the cache can be subjected to heuristic restrictions based on formulas, the MGTs, as known from resolution-based (or, more generally, saturating) provers. There, for example, lemmas with formulas that are subsumed by an already present lemma or whose term height exceeds some threshold are discarded. Such heuristic restrictions are not possible with conventional CM-CT provers, because these do not have MGTs at hand, but rather just formulas that, due to rigid variables, are more deeply instantiated, depending on the context in which they occur in the proof (this is discussed in more depth as IPT vs. MGT in [15]). With such heuristic restrictions, the replacement of search by lemmas realized with SGCD’s cache then effects an actual modification, a heuristically motivated restriction, of the explored search space.

3.2.
Level Characterizations

SGCD can be used with alternate versions of enum_dterm_mgt_pairs for different level characterizations, including by tree size, height, maximal tree size, and maximal height. The number of distinct D-terms for increasing values of some size measure gives an upper bound of the number of trees to consider in proof search by enumerating D-terms level-by-level. This number is just an upper bound, because it does not take into account that D-term enumeration is interwoven with unification. Through the unification, which is constrained by given axioms and possibly a given goal, fragments of D-terms for which unifiability fails are immediately discarded, and D-terms extending them are skipped in the enumeration. Heuristic restrictions may further limit the number of enumerated structures. The number of distinct D-terms for increasing values of some size measure also indicates a measure-specific size value up to which it is easily possible to compute for given axioms all proofs, together with the lemma formulas proven by them. If we assume a single axiom, such that we can identify D-terms with full binary trees without any additional labeling, the sequences of the numbers of distinct D-terms for increasing tree size, height, or compacted size are well known and can be found in the OEIS, with identifiers A000108, A001699, and A254789, respectively, as shown in Table 1. Aside from these “conventional” criteria, our use of proof structure terms also permits inductive structure specifications as level characterizations. Particularly promising there is the PSP-level (indicating Proof-SubProof), specified as follows.

7 This realizes a replacing form of lemma application [47, 17].

Table 1: The numbers of distinct D-terms for a single axiom (or full binary trees) of given size n for different size measures.
n                              0  1  2   3    4        5                6
Tree size (oeis:A000108)       1  1  2   5   14       42              132
Height (oeis:A001699)          1  1  3  21  651  457,653  210,065,930,571
Compacted size (oeis:A254789)  1  1  3  15  111    1,119           14,487

Table 2: The numbers of distinct D-terms for a single axiom (or full binary trees) in PSP-level n and of compacted size n.

n                              0  1  2   3    4      5       6        7          8
|PSPLevel(n)| (oeis:A001147)   1  1  3  15  105    945  10,395  135,135  2,027,025
Compacted size (oeis:A254789)  1  1  3  15  111  1,119  14,487  230,943  4,395,855

(i) Structures in PSP-level 0 are the axiom identifiers.
(ii) Structures in PSP-level n + 1 are all structures D(d1, d2) and D(d2, d1) where d1 is a structure in PSP-level n and d2 is a (not necessarily strict) subterm of d1 or an axiom identifier.

The PSP-level was motivated by an analysis [15] of Meredith’s variant [16] of Łukasiewicz’s completeness proof for his shortest single axiom for implicational logic [48], where it was observed that most steps in Meredith’s proof can be ascribed to the relationship between levels that underlies the PSP-level. As an inferencing technique it is “global” in the sense that it takes whole proof structures and combines them to a new proof structure. For this reason it may so far have escaped the traditional inference systems, which are more locally oriented, e.g., by extending a branch based on properties of the leaf node or adding the resolvent of two clauses. PSP-levels are disjoint. All D-terms in PSP-level n have compacted size n. However, the cardinality of D-terms in PSP-level n grows more slowly than that of D-terms of compacted size n, according to OEIS sequence A001147 in contrast to A254789. Table 2 compares the initial values of both sequences. It follows that the enumeration of D-terms according to the PSP-level is “incomplete”, that is, there are D-terms that are not a member of any PSP-level. Experimental successes involving enumeration by PSP-level include relatively short proofs for problems where exhaustive search for a shortest proof seems infeasible [38, Sect.
4.5], a short proof for Łukasiewicz’s shortest axiom, LCL038-1 [38, Sect. 4.6][15], and an automatically found proof of a challenge problem, the “Meredith single axiom” LCL073-1 [17].

3.3. Combining Goal- and Axiom-Driven Reasoning

SGCD combines goal- and axiom-driven invocation of enum_dterm_mgt_pairs by proceeding in two nested loops, as informally outlined in the introduction and shown in Fig. 1. The goal formula, in Skolemized ground form, is given as input parameter g. The parameters maxLevel and preAddMaxLevel are specified with the configuration. The core predicate enum_dterm_mgt_pairs is for some particular assumed level characterization, for example by tree size. It enumerates argument bindings nondeterministically. If it succeeds once in the inner loop, the exception proof_found(d) returns the bound D-term d.

Cache := ∅;
for l := 0 to maxLevel do begin
  for m := l to l + preAddMaxLevel do begin
    enum_dterm_mgt_pairs(m, d, g);
    throw proof_found(d)
  end;
  News := {⟨l, d, f⟩ | enum_dterm_mgt_pairs(l, d, f)};
  if News = ∅ then throw exhausted;
  Cache := merge_news_into_cache(News, Cache)
end

Figure 1: The nested loops of the SGCD theorem proving method.

The procedure merge_news_into_cache(News, Cache) merges the newly generated ⟨Level, D-term, Formula⟩ triples News with the cache Cache, optionally taking into account configured heuristics, for example, sorting the union of the triples according to increasing term height of the lemma formulas and truncating the sorted list after, say, 1,000 entries. If maxLevel is configured as 0, the method proceeds purely in goal-driven mode, with the inner loop performing iterative deepening upon the level m from 0 up to preAddMaxLevel. The similarity to CM-CT provers can even be shown empirically by comparing the sets of solved TPTP problems (see Sect. 3.5.3). If combined with axiom-driven lemma generation, generally successful configurations of preAddMaxLevel typically have rather low values, about 0–3.
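The control structure of Fig. 1 can be mirrored in a few lines of Python. This is a schematic sketch under our own naming, not the actual Prolog interface; enum_pairs and the toy stand-in below are assumptions for illustration:

```python
class ProofFound(Exception):
    def __init__(self, dterm):
        self.dterm = dterm

def sgcd(goal, enum_pairs, max_level, pre_add_max_level,
         merge=lambda news, cache: cache + news):
    """Schematic analogue of Fig. 1: goal-driven phases alternate with
    axiom-driven lemma generation over increasing levels.

    enum_pairs(level, cache, goal) yields (dterm, formula) pairs; with a
    ground goal it acts goal-driven, with goal=None axiom-driven."""
    cache = []
    try:
        for l in range(max_level + 1):
            # goal-driven phase: deepen from l to l + pre_add_max_level
            for m in range(l, l + pre_add_max_level + 1):
                for d, _ in enum_pairs(m, cache, goal):
                    raise ProofFound(d)
            # axiom-driven phase: generate lemmas of level l
            news = [(l, d, f) for d, f in enum_pairs(l, cache, None)]
            if not news:
                return None        # "exhausted" in Fig. 1
            cache = merge(news, cache)
    except ProofFound as e:
        return e.dterm
    return None

# Toy stand-in for enum_dterm_mgt_pairs: one "lemma" per level whose
# "formula" is just the level number (purely to exercise the loop).
def toy_enum(level, cache, goal):
    for d, f in [(('lvl', level), level)]:
        if goal is None or f == goal:
            yield d, f
```

With this stub, sgcd(3, toy_enum, 5, 1) succeeds in the goal-driven phase of iteration l = 2, where the deepening bound m reaches 3 before level 3 lemmas are ever generated axiom-driven, which illustrates why low preAddMaxLevel values give the goal-driven phase a small head start over the cache.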
Of course, also variations of the basic schema of Fig. 1 can be useful, for example, to enumerate alternate proofs instead of exiting with the first found proof through the throw statement. For lemma generation [17], SGCD can be run with some resource bound, e.g., a timeout or a Prolog inference limit, or until axiom-driven enumeration is exhausted due to heuristic limitations. After SGCD terminates, the content of the cache represents a set of generated lemmas. Lemmas deleted at merge_news_into_cache may be maintained in a separate “passive” store, which can be used for example in lemma generation (Sect. 3.5.2) and theorem finding (Sect. 3.5.4). “Hybrid” configurations are possible, where different level characterizations are used for the goal- and axiom-driven phases. The cache does not initially have to be empty: it is possible to initialize it with externally generated lemmas (possibly by another invocation of SGCD) and modify the initial settings of the l and m parameters. We call this lemma injection. Search at lower levels is then completely replaced by retrieval from these external lemmas. SGCD with lemma injection was used successfully for lemmas selected with the help of machine learning [17].

3.4. Implementation Aspects and the CD Tools Environment

SGCD is integrated in CD Tools, an environment to experiment with CD, which is embedded in SWI-Prolog8 and utilizes PIE [8, 9] as basic support for first-order ATP. For experimenting with machine learning [17], SGCD was invoked via Bash scripts, also in portfolio settings where different configurations are run in parallel. CD Tools provides various interfaces to external worlds.

8 No efforts were made to support other Prolog systems. Brief tries with free systems reckoned as particularly fast, notably YAP and Eclipse, unfortunately gave the impression that these are broken in their recent versions. SWI-Prolog provides some library predicates that proved very useful, for example variant_sha1/3 or term_factorized/3.
It supports, for example, conversion to and from Łukasiewicz’s Polish notation. There are means to enumerate TPTP problems, to test whether they represent a CD problem, and to canonicalize the vocabulary of such CD problems. A pre-defined association of about 170 common names for about 120 well-known formulas and combinators, e.g., from [40, 32], helps to specify these as axioms and to detect them among computed lemmas. Some support for handling combinators and λ-terms is also implemented, which is used, e.g., for the second included prover CCS [49]. CD Tools includes implementations of many concepts and operations introduced or discussed in [15], including D-term comparison by ≥c, various notions of regularity, variations of the organic property (based on calling a SAT solver⁹ or based on previously proven lemmas), the prime property, and n-simplification, that is, replacing complex subproofs of minor premises whose goal is irrelevant for the conclusion with a primitive subproof. Many of these operations are available to SGCD as heuristic restrictions that can optionally be configured to be applied either immediately to the results of axiom-driven invocations of enum_dterm_mgt_pairs or when merging new results into the cache. Attempting to introduce at least a little bit of goal direction into the axiom-driven mode, an experimental heuristic restriction takes the number of distinct subterms in a formula that do not appear in the goal as a negative criterion. For handling large proof terms (millions of tree nodes), CD Tools provides an implementation of MGT computation and verification that transparently operates on a factored representation.

3.5. Experimental Results

SGCD was evaluated in various experiments, most of them on the 196 pure CD problems in TPTP 8.0.0 (there are 10 further CD problems in the TPTP with status satisfiable or a more complex problem form). We call this problem corpus TPTPCD.
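For orientation, the condensed detachment inference itself, the base operation underlying the predicates above, can be sketched with a textbook unification algorithm. This is a minimal illustration under an assumed encoding, not the CD Tools code: formulas are nested tuples ('i', p, q) for the implication Cpq, and positive integers are variables.

```python
# Hypothetical sketch of one condensed-detachment step: unify the
# antecedent of the major premise with the (renamed) minor premise and
# return the instantiated consequent, most general up to renaming.

def walk(t, s):
    # Follow substitution bindings of a variable.
    while isinstance(t, int) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, x, s) for x in t[1:])

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, int):
        return {**s, a: b} if not occurs(a, b, s) else None
    if isinstance(b, int):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) \
            and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def subst(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(x, s) for x in t[1:])
    return t

def max_var(t):
    if isinstance(t, int):
        return t
    return max((max_var(x) for x in t[1:]), default=0)

def rename(t, offset):
    # Make the variables of the two premises disjoint.
    if isinstance(t, int):
        return t + offset
    return (t[0],) + tuple(rename(x, offset) for x in t[1:])

def cd(major, minor):
    # D(major, minor), with major = ('i', p, q).
    minor = rename(minor, max_var(major) + 1)
    _, p, q = major
    s = unify(p, minor, {})
    return subst(q, s) if s is not None else None
```

For example, detaching the identity Cpp on the syllogism axiom CCpqCCqrCpr yields an instance of CCqrCqr.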
The report [38] describes several experiments with SGCD on that corpus. Additional experiments with machine learning for ATP and proving a challenge problem for ATP, the “Meredith single axiom” LCL073-1, are described in [17]. Experiments with CCS, the second prover included in CD Tools, are reported in [49]. Comprehensive result tables and log files can be found at http://cs.christophwernhard.com/cdtools/. Here we give brief summaries of the experiments with SGCD.

3.5.1. SGCD in Generally Successful Configurations

In generic configurations SGCD solves 176 of the 196 TPTPCD problems. This may be compared with Prover9, which, like SGCD but differently from most other general first-order provers, also creates CD proofs. Prover9 solves 168 of the problems. SGCD fails on five problems that Prover9 can solve. In the generally successful configurations SGCD solves no “new” problems, i.e., problems rated in the TPTP with 1.00, but 93% of the problems that are solvable at all by a first-order prover and 95% of the problems that can be solved by E [51]. See [38, Sect. 4.1].

⁹ Via PIE, which provides a call interface to SAT and QBF solvers by translating between PIE’s Prolog term representation of formulas and DIMACS or QDIMACS files. In experiments we used MiniSat [50].

These 176 solutions are obtained as the joint results of SGCD in four configurations whose key settings are as follows. Configuration 1: level characterization by tree size; maximally 1000 cache entries; goal-driven phases upon 2 levels (preAddMaxLevel = 1); 165 solutions. Configuration 2: similar to configuration 1 but height as level characterization; 109 solutions. Configuration 3: similar to configuration 1 but maximally 3000 cache entries; 161 solutions.
Configuration 4: similar to configuration 1 but goal-driven phases just upon 1 level (preAddMaxLevel = 0) and strong limitations of the formula size of the cached lemmas, just slightly above the maximal sizes found in the input problems; 80 solutions.

3.5.2. SGCD in Specialized Settings with Lemma Injection

In specialized settings with lemma injection, SGCD solves more difficult problems, including the 1.00-rated LCL073-1 [17]. With or without lemma injection it outperforms Vampire [52] on the 312 problems considered in [17]: without lemma injection, SGCD achieves 285 solutions vs. 263 by Vampire; with lemma injection, 289; Vampire, when given lemmas obtained from SGCD, is boosted from 263 to 285 solutions. For an overall comparison of SGCD with and without lemma injection with various provers, invoked with and without lemmas provided by SGCD, see http://cs.christophwernhard.com/cdtools/exp-lemmas/lemmas.html. An excerpt of this table is reproduced in [53, App. E].

3.5.3. SGCD in Purely Goal-Driven Configurations

In purely goal-driven configurations SGCD is indeed very similar to the CM-CT provers and can solve roughly the same TPTPCD problems. In two purely goal-driven configurations SGCD solves 89 of the 196 TPTPCD problems, compared to 92 solvable by various CM-CT provers according to previously reported experiments. With 86 problems, the overlap of problems that are solved by both SGCD and CM-CT provers is rather large. See [38, Sect. 4.2] and http://cs.christophwernhard.com/cdtools/exp-tptpcd-2022-07/table_3.html. The 92 solutions by CM-CT provers could be stretched to a superset of 96 solutions, obtained by CMProver in nine configurations in an experiment with a longer timeout of 1 hour per problem and prover configuration. The well-known leanCoP 2.1 [6] was also tried.
With a single exception, the 196 TPTPCD problems are provided in the TPTP CNF format, which is not accepted by leanCoP, so we had to use conversions to the TPTP FOF format (with axioms and goal conjecture distinguished). Experiments with a 1 hour timeout per problem led to 72 or 51 solutions, depending on whether the major or the minor subgoal appears first in axiom Det in the FOF input.

3.5.4. SGCD for Theorem Finding

In purely axiom-driven configurations SGCD can perform theorem finding. It finds proofs of Łukasiewicz’s 68 Theses in a single run in 2.5 minutes. These theses from a textbook by Łukasiewicz were used extensively with OTTER [54] by Larry Wos [25, 26, 27, 28]. They could be solved by OTTER in a single run after the introduction of weight templates [25, 27]. In [28] the focus was finding proofs with small compacted size for sets of goals, subsets of the theses that represent axiom systems. With respect to compacted size, the proofs obtained in our brief experiments cannot compete with the carefully developed proofs from [28]. However, the tree size of our proofs is smaller for all considered axiom systems. See [38, Sect. 4.3].

3.5.5. SGCD and Proof Size

For CD problems, Prover9 computes CD proofs that can be represented by D-terms. Hence we can directly compare Prover9’s proofs with the D-terms output by SGCD. In general, SGCD finds smaller proofs than Prover9 in its default mode, moderately smaller with respect to compacted size and height, and drastically smaller with respect to tree size. See [38, Sect. 4.4]. CCS can be applied in a configuration that returns proofs with ascertained minimal compacted size by exhaustive search [49]. This means that the method then guarantees that the problem has no proof whose compacted size is strictly smaller than that of the returned proof. This succeeds for about 44% of the problems of the TPTPCD corpus.
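For concreteness, the three size measures compared here can be sketched on D-terms encoded as nested tuples. The encoding is an assumption made for this illustration: the leaf 1 stands for the axiom and ('D', d1, d2) for a detachment step.

```python
# Illustrative sketch of the proof-size measures on tuple-encoded D-terms
# (not the CD Tools representation).

def tree_size(d):
    # Number of inner nodes (detachment steps) of the proof tree.
    if not isinstance(d, tuple):
        return 0
    return 1 + tree_size(d[1]) + tree_size(d[2])

def height(d):
    # Longest path from the root to a leaf.
    if not isinstance(d, tuple):
        return 0
    return 1 + max(height(d[1]), height(d[2]))

def compacted_size(d):
    # Number of distinct compound subterms, i.e. inner nodes of the
    # minimal DAG obtained by sharing repeated subproofs.
    seen = set()
    def visit(t):
        if isinstance(t, tuple) and t not in seen:
            seen.add(t)
            visit(t[1]); visit(t[2])
    visit(d)
    return len(seen)

d = ('D', ('D', 1, 1), ('D', 1, 1))
# tree_size(d) == 3, height(d) == 2, compacted_size(d) == 2
```

The example D(D(1,1), D(1,1)) shows how sharing the repeated subproof D(1,1) makes the compacted size smaller than the tree size.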
SGCD with enumeration by PSP-level is particularly useful for finding proofs with small compacted size in cases where the exhaustive search for ascertained minimal compacted size appears to be unfeasible. See [38, Sect. 4.5].

3.5.6. SGCD with Enumeration by PSP-Level for Specific Problems and for Lemma Generation

With enumeration by PSP-level, SGCD finds a short proof of LCL038-1, the most difficult of the three subproblems of proving completeness of Łukasiewicz’s shortest single axiom for implicational logic. Short extensions of this proof that solve the other two easier subproblems were then performed with naive structure enumeration predicates provided by CD Tools. SGCD’s proof of LCL038-1 is drastically shorter than proofs by other ATP systems and even substantially shorter than the proofs by Łukasiewicz and Meredith. The combined proof for all three subproblems is a few steps shorter than the proofs by these logicians. See [38, Sect. 4.6] and also [15]. For proving LCL073-1, SGCD was first invoked purely axiom-driven by PSP-level to generate about 100k lemmas. These were ranked by general heuristic features, mostly size properties of the lemma formula, smaller preferred. Then SGCD was invoked a second time, now for proving, with alternating goal- and axiom-driven phases, initialized by lemma injection with the best-ranked 3k of the previously generated lemmas. The configuration of the phases was “hybrid” here: while the axiom-driven phases were again by PSP-level, the goal-driven phases used height as level characterization. For more details, see [17]. Enumeration by PSP-level for lemma generation in general led to good results in the machine learning scenario [17]. It appears to supply other provers with useful lemmas that would not be found by their own calculi.

4. Some Issues Requiring Further Research

The approach of structure-generating theorem proving in its current state of development raises questions that require further investigations.
We outline some of these, organized into two major topics.

4.1. Stronger Proof Compressions

SGCD so far focuses on unit/subtree lemmas, which correspond to the sharing of subtrees in DAGs. There are stronger proof compressions that correspond to shared trees with “holes”. Technically this can be expressed with binary resolution that results in Horn clauses (the body atoms correspond to the “holes”), with tree grammar compressions where nonterminals may contain variables (the “holes”), or with combinators, where the “holes” are mapped to subtree sharing in DAGs (by getting rid of the variables through combinators) [49]. The combinator representation is particularly suitable for proof structure enumeration, interwoven with unification, initialized with a goal, as in SGCD when operated goal-driven. This is realized with CCS [49], our second experimental prover in CD Tools. This prover may be viewed as implementing the connection structure calculus [55], restricted to CD problems. Depending on the permitted combinators or proof term patterns involving combinators, it can simulate other calculi, in particular goal-driven versions of binary resolution. The most immediate level characterization for enumeration with the combinator representation is compacted size. We then obtain in essence DAG enumeration, with iterative deepening upon the DAG size. Allowing combinators to enter the D-terms is only justified if they permit proofs with smaller compacted size, that is, if the proving D-term with combinators has smaller compacted size than those without combinators. Systematic enumeration of proof DAGs is not as simple as tree enumeration because it requires maintaining the subtrees, and the lemmas proven by them, that already appear in the proof under construction.¹⁰ Rigid variables are not sufficient; some copying or forgetting [55] of variables is required. So far, CCS has been tried only in goal-driven mode.
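A variation of the value-number method [56, 57] underlies the DAG enumeration in CCS. Its core idea, giving each distinct compound subterm one number so that a proof tree becomes a DAG of definitions, can be illustrated as a small hash-consing sketch over tuple-encoded D-terms with a leaf label 'a' for the axiom (a hypothetical encoding, not the CCS data structures):

```python
# Illustrative value numbering (hash consing) for D-terms: repeated
# subproofs receive the same number, so the returned definition list
# is the shared-subtree DAG of the input tree.

def value_number(d):
    table = {}   # compound subterm (with numbered arguments) -> number
    defs = []    # defs[n] = ('D', i, j), i and j numbers or leaf labels

    def number(t):
        if not isinstance(t, tuple):
            return t                      # leaf: axiom label
        key = ('D', number(t[1]), number(t[2]))
        if key not in table:
            table[key] = len(defs)
            defs.append(key)
        return table[key]

    root = number(d)
    return root, defs

root, defs = value_number(('D', ('D', 'a', 'a'), ('D', 'a', 'a')))
# root == 1, defs == [('D', 'a', 'a'), ('D', 0, 0)]
```

The length of the definition list is exactly the compacted size of the input D-term; the shared subproof D(a, a) is stored once and referenced twice.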
It is not clear how enumeration by compacted size transfers to SGCD with its axiom-driven mode. The level characterizations for SGCD are determined for a given proof structure “globally”, independently of context: tree size, height, PSP-level. During enumeration by compacted size, a given structure has to be assessed differently, depending on context: if it already occurred in the partially built-up structure, its cost value is zero – it can be attached again without increasing the overall compacted size. Is the PSP-level a version of compacted size that is limited in a certain sense such that it permits global value assessment? If so, is it a distinguished best version within that limitation? CCS performs roughly comparably to CM-CT provers. If configured without combinators, it yields as a bonus proofs with guaranteed minimal compacted size. While stronger compressions can in principle achieve drastic savings, it is not clear what role this may play in practice.

4.2. Systematization of Level Characterizations

It seems desirable to formally specify and systematize criteria and possibilities of level characterizations for SGCD. Levels may be disjoint (e.g., tree size, height, compacted size, PSP-level) or cumulative (e.g., tree size less than or equal to the level index). There is an interplay with the subterm relationship among proof terms. There may be gaps with respect to given axioms: some level may have no member with MGT, but the next level has members with MGT. A method that incrementally generates levels may then exhaust too early. The level characterization may be incomplete, as observed for the PSP-level in Sect. 3.2. As noted in Sect. 4.1, compacted size is in a sense context dependent and seems to bear a certain relationship to PSP-level enumeration. How can different level characterizations be combined? Abstractly, into a single new level characterization?
The experiment with LCL073-1 shows that “hybrid” configurations with different level characterizations for the goal- and axiom-driven phases are important. A trivial pragmatic approach to combining different level characterizations is invoking prover instances for each of them in portfolio mode. Also “heuristic” limitations, e.g., maximal term height of lemma formulas, which are so far handled in SGCD orthogonally to the level characterizations, might be incorporated into level characterizations.

¹⁰ As outlined in [49], the DAG enumeration by CCS is with a variation of the value-number method [56, 57].

In semi-naive evaluation (e.g., [58]) certain recursive invocations of enumeration predicates are replaced with accesses to “delta” predicates that represent results newly obtained in the immediately preceding round. This realizes a forward-reasoning trigger effect: a subgoal with a delta predicate is evaluated first; only for its results, that is, the new results from the preceding round, are the remaining subgoals considered. A related effect can be observed in some cases for our proof/formula pair enumeration predicate, where recursive invocations are for the given level minus one. But so far, such effects and their consequences for improving SGCD’s enumeration algorithms have not been systematically studied. The grouping into levels suggests a “bulk view” on first-order ATP, where instead of proving a single theorem from axioms the objective is to derive from the axioms all lemmas in some given level. This perspective hides a key difficulty in goal-driven proceeding to prove a single theorem, which precludes decomposing a problem – an open subgoal – into subproblems – further subgoals – that are independent from each other: the subgoals obtained in the decomposition, e.g., for the major and minor premise of a CD problem, may share variables. The bulk view may explain the success of axiom-driven proceeding and its adequacy for theorem finding (Sect. 3.5.4).
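The semi-naive “delta” scheme mentioned above can be sketched generically. This is an illustrative fixpoint loop, not SGCD’s enumeration predicates: combine stands for one application of a binary rule such as D, returning None when the rule is inapplicable, and each round pairs only the previous round’s delta with everything already known.

```python
# Illustrative semi-naive fixpoint: in every round at least one premise
# of a newly derived fact comes from the previous round's delta, so no
# pair of old facts is re-derived.

def seminaive(axioms, combine, rounds):
    total = set(axioms)
    delta = set(axioms)
    for _ in range(rounds):
        new = set()
        for x in delta:               # at least one premise is new
            for y in total:
                for z in (combine(x, y), combine(y, x)):
                    if z is not None and z not in total:
                        new.add(z)
        if not new:                   # fixpoint reached
            break
        total |= new
        delta = new
    return total
```

Trying both argument orders covers rules, like detachment, whose premises play different roles; the round bound plays the part of the level limit.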
The bulk view also may facilitate interfacing with techniques for machine learning that are based on problem decomposition into independent subproblems. The view of enumeration by level also suggests trying in ATP research an “empirical” style of pattern observation, inspired by physics, as exemplified for combinatory logic in [59].

5. Conclusion

We have generalized the proof structure enumeration interwoven with unification as performed by goal-driven CM-CT provers to an interplay of goal- and axiom-driven processing. It makes heuristic restrictions known from saturating provers applicable. The approach was evaluated with SGCD, an experimental system. Proof structure terms, proof objects that allow one to specify and implement various ways of building proofs, were a central tool. Meredith’s CD represents the prototypical base case of such proof structure terms. Hence, so far we have focused on CD problems, a subclass of the first-order Horn problems with a binary function symbol and cyclic predicate dependency. From the view of first-order ATP, CD provides a simplified setting that inspired and facilitated our work. We expect that our main techniques and results can be transferred to first-order ATP in general. Through the axiom-driven operation, the incorporation of dedicated equality handling should be possible without resorting to lazy variants of paramodulation. Another view on generalization is to explore the potential scope of CD. Generalizing CD to arbitrary first-order Horn problems is straightforward and has been realized [49] in a system related to SGCD. Typical applications of CD are investigating axiomatizations of propositional logics whose formulas are expressed as first-order terms. Proving problems are then first-order problems with a single predicate. If the axiomatized logic is classical propositional logic, this covers theorem proving in that logic, although with an incomparably weaker performance than that of a SAT solver.
Nevertheless, with a generalization of CD that takes quantification into account, this approach is used very successfully by Metamath to formalize mathematics [22]. The mismatch between this CD approach and current ATP systems (a generalization of the mentioned ways to perform propositional reasoning via CD vs. a SAT solver) has recently been observed and addressed in different ways [23]. Dedicated CD-based reasoning, as represented by our SGCD system, may be a further way to combine Metamath – with its inherent generality – with ATP. Theoretical results concerning the extension of CD to first-order logic in general can be found in [18]. The CM and the connection structure calculus offer proof representations for first-order logic in general, which possibly may be used like the D-terms of CD [33, 15, 49]. Concerning related proving techniques, similarly to SGCD an implementation [60] of hypertableaux [36] creates lemmas axiom-driven level-by-level. This approach is, however, much less flexibly configurable, and it incorporates goal-driven phases only in a rudimentary way, by termination before a level is completed as soon as a clause subsuming the goal is generated. A recent enhancement of E by machine learning supplements a classification based on derivation history as clause selection guidance [61]. Since in structure-generating proving the D-term itself, or, so to speak, the derivation history, provides the main guidance, this enhancement might possibly be viewed as introducing a little bit of structure-generating proving into E. Experiments show that on CD problems the structure-generating approach keeps up with state-of-the-art first-order provers and results in comparatively short proofs, which, moreover, are gap-free and in a simple calculus. In powerful configurations, state-of-the-art provers such as E and Vampire do not emit gap-free proofs. Further particular strengths of the approach show up with two problems that were extensively investigated in ATP.
For the first, never solved by a CM-CT prover, SGCD quickly finds a proof that is shorter than all known ones. For the second, which escapes all state-of-the-art provers, SGCD, in a specific configuration, is able to find a proof. In both cases the key is a novel proof structure enumeration strategy. As demonstrated in [17], the structure-generating approach is also useful in machine learning for ATP. It supports a data flow centered around proof structure terms, D-terms. Features of formulas can be complemented by features of proof structures. In purely axiom-driven mode SGCD is well suited to generate a large number of proven lemmas from which a small subset is then selected on the basis of learning and passed as auxiliary input to arbitrary provers. Open issues that require further research include the incorporation of stronger proof compressions and a systematic investigation of the possibilities to group proof structures into ordered “levels” governing the enumeration order of SGCD.

6. Acknowledgments

The author thanks anonymous reviewers for helpful suggestions to improve the presentation. Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 457292495. The work was supported by the North-German Supercomputing Alliance (HLRN).

References

[1] R. Hähnle, Tableaux and related methods, in: A. Robinson, A. Voronkov (Eds.), Handb. of Autom. Reasoning, volume 1, Elsevier, 2001, pp. 101–178. doi:10.1016/b978-044450813-3/50005-9.
[2] R. Letz, Tableau and Connection Calculi. Structure, Complexity, Implementation, Habilitationsschrift, TU München, 1999. Available from http://www2.tcs.ifi.lmu.de/~letz/habil.ps, accessed Jul 19, 2023.
[3] M. E. Stickel, A Prolog technology theorem prover: implementation by an extended Prolog compiler, J. Autom. Reasoning 4 (1988) 353–380. doi:10.1007/BF00297245.
[4] R. Letz, J. Schumann, S. Bayerl, W. Bibel, SETHEO: A high-performance theorem prover, J. Autom. Reasoning 8 (1992) 183–212.
doi:10.1007/BF00244282.
[5] J. Otten, W. Bibel, leanCoP: lean connection-based theorem proving, J. Symb. Comput. 36 (2003) 139–161. doi:10.1016/S0747-7171(03)00037-3.
[6] J. Otten, Restricting backtracking in connection calculi, AI Communications 23 (2010) 159–182. doi:10.3233/AIC-2010-0464.
[7] I. Dahn, C. Wernhard, First order proof problems extracted from an article in the Mizar mathematical library, in: M. P. Bonacina, U. Furbach (Eds.), FTP’97, RISC-Linz Report Series No. 97–50, Joh. Kepler Univ., Linz, 1997, pp. 58–62. URL: https://www.logic.at/ftp97/papers/dahn.pdf.
[8] C. Wernhard, The PIE system for proving, interpolating and eliminating, in: P. Fontaine, S. Schulz, J. Urban (Eds.), PAAR 2016, volume 1635 of CEUR Workshop Proc., CEUR-WS.org, 2016, pp. 125–138. URL: http://ceur-ws.org/Vol-1635/paper-11.pdf.
[9] C. Wernhard, Facets of the PIE environment for proving, interpolating and eliminating on the basis of first-order logic, in: P. Hofstedt, et al. (Eds.), DECLARE 2019, volume 12057 of LNCS (LNAI), 2020, pp. 160–177. doi:10.1007/978-3-030-46714-2_11.
[10] D. W. Loveland, Automated Theorem Proving: A Logical Basis, North-Holland, 1978.
[11] D. Prawitz, An improved proof procedure, Theoria 26 (1960) 102–139.
[12] D. Prawitz, Advances and problems in mechanical proof procedures, Machine Intelligence 4 (1969) 59–71. Reprinted with author preface in J. Siekmann, G. Wright (eds.): Automation of Reasoning, vol 2: Classical Papers on Computational Logic 1967–1970, Springer, 1983, pp. 283–297.
[13] C. Goller, R. Letz, K. Mayr, J. Schumann, SETHEO V3.2: recent developments – system abstract, in: A. Bundy (Ed.), CADE-12, volume 814 of LNCS (LNAI), 1994, pp. 778–782. doi:10.1007/3-540-58156-1_59.
[14] A. N. Prior, Logicians at play; or Syll, Simp and Hilbert, Australasian Journal of Philosophy 34 (1956) 182–192. doi:10.1080/00048405685200181.
[15] C. Wernhard, W. Bibel, Investigations into proof structures, CoRR abs/2304.12827 (2023).
doi:10.48550/arXiv.2304.12827, submitted.
[16] C. A. Meredith, A. N. Prior, Notes on the axiomatics of the propositional calculus, Notre Dame J. of Formal Logic 4 (1963) 171–187. doi:10.1305/ndjfl/1093957574.
[17] M. Rawson, C. Wernhard, Z. Zombori, W. Bibel, Lemmas: Generation, selection, application, in: R. Ramanayake, J. Urban (Eds.), TABLEAUX 2023, volume 14278 of LNCS (LNAI), Springer, 2023, pp. 153–174. doi:10.1007/978-3-031-43513-3_9.
[18] A. Rezuş, Witness Theory – Notes on λ-calculus and Logic, volume 84 of Studies in Logic, College Publications, 2020.
[19] A. Rezuş, The Curry-Howard Correspondence revisited (2019b), in: [18], 2020, pp. 209–214.
[20] J. R. Hindley, D. Meredith, Principal type-schemes and condensed detachment, J. Symbolic Logic 55 (1990) 90–105. doi:10.2307/2274956.
[21] J. R. Hindley, Basic Simple Type Theory, Cambridge University Press, 1997. doi:10.1017/CBO9780511608865.
[22] N. D. Megill, A finitely axiomatized formalization of predicate calculus with equality, Notre Dame J. of Formal Logic 36 (1995) 435–453. doi:10.1305/ndjfl/1040149359.
[23] M. Carneiro, C. E. Brown, J. Urban, Automated theorem proving for Metamath, in: A. Naumowicz, R. Thiemann (Eds.), ITP 2023, volume 268 of LIPIcs, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023, pp. 9:1–9:19. doi:10.4230/LIPIcs.ITP.2023.9.
[24] L. Wos, S. Winker, W. McCune, R. Overbeek, E. Lusk, R. Stevens, R. Butler, Automated reasoning contributes to mathematics and logic, in: M. E. Stickel (Ed.), CADE-10, Springer, 1990, pp. 485–499. doi:10.1007/3-540-52885-7_109.
[25] L. Wos, Automated reasoning and Bledsoe’s dream for the field, in: R. S. Boyer (Ed.), Automated Reasoning: Essays in Honor of Woody Bledsoe, Automated Reasoning Series, Kluwer Academic Publishers, 1991, pp. 297–345. doi:10.1007/978-94-011-3488-0_15.
[26] W. McCune, L. Wos, Experiments in automated deduction with condensed detachment, in: D. Kapur (Ed.), CADE-11, volume 607 of LNCS (LNAI), Springer, 1992, pp.
209–223. doi:10.1007/3-540-55602-8_167.
[27] L. Wos, The resonance strategy, Computers Math. Applic. 29 (1995) 133–178. doi:10.1016/0898-1221(94)00220-F.
[28] L. Wos, The power of combining resonance with heat, J. Autom. Reasoning 17 (1996) 23–81. doi:10.1007/BF00247668.
[29] R. Veroff, Finding shortest proofs: An application of linked inference rules, J. Autom. Reasoning 27 (2001) 123–139. doi:10.1023/A:1010635625063.
[30] B. Fitelson, L. Wos, Missing proofs found, J. Autom. Reasoning 27 (2001) 201–225. doi:10.1023/A:1010695827789.
[31] L. Wos, Conquering the Meredith single axiom, J. Autom. Reasoning 27 (2001) 175–199. doi:10.1023/A:1010691726881.
[32] D. Ulrich, A legacy recalled and a tradition continued, J. Autom. Reasoning 27 (2001) 97–122. doi:10.1023/A:1010683508225.
[33] C. Wernhard, W. Bibel, Learning from Łukasiewicz and Meredith: Investigations into proof structures, in: A. Platzer, G. Sutcliffe (Eds.), CADE 28, volume 12699 of LNCS (LNAI), Springer, 2021, pp. 58–75. doi:10.1007/978-3-030-79876-5_4.
[34] OEIS Foundation Inc., The On-Line Encyclopedia of Integer Sequences, 2023. Published electronically at http://oeis.org.
[35] J. M. P. Schumann, DELTA — A bottom-up preprocessor for top-down theorem provers, in: CADE-12, volume 814 of LNCS (LNAI), Springer, 1994, pp. 774–777. doi:10.1007/3-540-58156-1_58.
[36] P. Baumgartner, U. Furbach, I. Niemelä, Hyper tableaux, in: J. J. Alferes, L. M. Pereira, E. Orlowska (Eds.), JELIA’96, volume 1126 of LNCS (LNAI), Springer, 1996, pp. 1–17. doi:10.1007/3-540-61630-6_1.
[37] W. Bibel, Comparison of proof methods, in: J. Otten, W. Bibel (Eds.), AReCCa 2023, CEUR Workshop Proc., CEUR-WS.org, 2023. This volume.
[38] C. Wernhard, CD Tools — Condensed detachment and structure generating theorem proving (system description), CoRR abs/2207.08453 (2023). doi:10.48550/arXiv.2207.08453.
[39] E. J. Lemmon, C. A. Meredith, D. Meredith, A. N. Prior, I. Thomas, Calculi of pure strict implication, in: J. W. Davis, D. J.
Hockney, W. K. Wilson (Eds.), Philosophical Logic, Springer Netherlands, 1969, pp. 215–250. doi:10.1007/978-94-010-9614-0_17, reprint of a technical report, Canterbury University College, Christchurch, 1957.
[40] A. N. Prior, Formal Logic, 2nd ed., Clarendon Press, Oxford, 1962. doi:10.1093/acprof:oso/9780198241560.001.0001.
[41] D. Meredith, In memoriam: Carew Arthur Meredith (1904–1976), Notre Dame J. of Formal Logic 18 (1977) 513–516. doi:10.1305/ndjfl/1093888116.
[42] W. McCune, Prover9 and Mace4, 2005–2010. http://www.cs.unm.edu/~mccune/prover9.
[43] W. Bibel, Automated Theorem Proving, Vieweg, Braunschweig, 1982. doi:10.1007/978-3-322-90102-6, second edition 1987.
[44] W. Bibel, Deduction: Automated Logic, Academic Press, 1993.
[45] W. Bibel, J. Otten, From Schütte’s formal systems to modern automated deduction, in: R. Kahle, M. Rathjen (Eds.), The Legacy of Kurt Schütte, Springer, 2020, pp. 215–249. doi:10.1007/978-3-030-49424-7_13.
[46] J. Wielemaker, T. Schrijvers, M. Triska, T. Lager, SWI-Prolog, Theory and Practice of Logic Programming 12 (2012) 67–96. doi:10.1017/S1471068411000494.
[47] O. L. Astrachan, M. E. Stickel, Caching and lemmaizing in model elimination theorem provers, in: D. Kapur (Ed.), CADE-11, Springer, 1992, pp. 224–238. doi:10.1007/3-540-55602-8_168.
[48] J. Łukasiewicz, The shortest axiom of the implicational calculus of propositions, in: Proc. of the Royal Irish Academy, volume 52, Sect. A, No. 3, 1948, pp. 25–33. URL: http://www.jstor.org/stable/20488489.
[49] C. Wernhard, Generating compressed combinatory proof structures – an approach to automated first-order theorem proving, in: B. Konev, C. Schon, A. Steen (Eds.), PAAR 2022, volume 3201 of CEUR Workshop Proc., CEUR-WS.org, 2022. https://arxiv.org/abs/2209.12592.
[50] N. Eén, N. Sörensson, An extensible SAT-solver, in: E. Giunchiglia, A. Tacchella (Eds.), SAT 2003, volume 2919 of LNCS, Springer, 2003, pp. 502–518. doi:10.1007/978-3-540-24605-3_37.
[51] S. Schulz, S.
Cruanes, P. Vukmirović, Faster, higher, stronger: E 2.3, in: P. Fontaine (Ed.), CADE 27, number 11716 in LNAI, Springer, 2019, pp. 495–507. doi:10.1007/978-3-030-29436-6_29.
[52] L. Kovács, A. Voronkov, First-order theorem proving and Vampire, in: N. Sharygina, H. Veith (Eds.), CAV 2013, volume 8044 of LNCS, Springer, 2013, pp. 1–35. doi:10.1007/978-3-642-39799-8_1.
[53] M. Rawson, C. Wernhard, Z. Zombori, W. Bibel, Lemmas: Generation, selection, application, CoRR abs/2303.05854 (2023). doi:10.48550/arXiv.2303.05854, extended version of [17].
[54] W. McCune, OTTER 3.3 Reference Manual, Technical Report ANL/MCS-TM-263, Argonne National Laboratory, 2003. https://www.cs.unm.edu/~mccune/otter/Otter33.pdf, accessed Jul 19, 2023.
[55] E. Eder, A comparison of the resolution calculus and the connection method, and a new calculus generalizing both methods, in: E. Börger, H. Kleine Büning, M. M. Richter (Eds.), CSL ’88, volume 385 of LNCS, Springer, 1989, pp. 80–98. doi:10.1007/BFb0026296.
[56] A. V. Aho, R. Sethi, J. D. Ullman, Compilers – Principles, Techniques, and Tools, Addison-Wesley, 1986.
[57] A. Genitrini, B. Gittenberger, M. Kauers, M. Wallner, Asymptotic enumeration of compacted binary trees of bounded right height, J. Comb. Theory, Ser. A 172 (2020) 105177. doi:10.1016/j.jcta.2019.105177.
[58] J. D. Ullman, Principles of Database and Knowledge-Base Systems, volume I: Classical Database Systems, Computer Science Press, Rockville, Maryland, 1988.
[59] S. Wolfram, Combinators – A Centennial View, Wolfram Media Inc, 2021. Accompanying webpage: https://writings.stephenwolfram.com/2020/12/combinators-a-centennial-view/, accessed Jun 30, 2022.
[60] B. Pelzer, C. Wernhard, System description: E-KRHyper, in: F. Pfenning (Ed.), CADE-21, volume 4603 of LNCS (LNAI), Springer, 2007, pp. 503–513. doi:10.1007/978-3-540-73595-3_37.
[61] M. Suda, Improving ENIGMA-style clause selection while learning from history, in: A. Platzer, G.
Sutcliffe (Eds.), CADE 28, volume 12699 of LNCS (LNAI), Springer, 2021, pp. 543–561. doi:10.1007/978-3-030-79876-5_31.