<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Typed meta-interpretive learning for proof strategies?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Colin Farquhar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gudmund Grov</string-name>
          <email>G.Grovg@hw.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew Cropper</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen Muggleton</string-name>
          <email>s.muggletong@imperial.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Bundy</string-name>
          <email>a.bundy@ed.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Heriot-Watt University</institution>
          ,
          <addr-line>Edinburgh</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Imperial College</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Edinburgh</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Formal veri cation of computer programs is increasingly used in industry. A popular technique is interactive theorem proving, used for instance by Intel in HOL light. The ability to learn and re-apply proof strategies from a small set of proofs would signi cantly increase the productivity of these systems, and make them more cost-e ective to use. Previous learning attempts have had limited success, which we believe is a result of missing key goal properties in the strategies. Capturing such properties requires predicate invention, and the only state-of-the-art ILP technique which supports this is meta-interpretive learning (MIL). We show that MIL is applicable to this problem, but that without type information it o ers limited improvements in quality over previous work. We then extend MIL with types and give preliminary results indicating that this extension learns better-quality strategies with suitable goal properties. We also show that the quality of the learned strategies can be further enhanced through the use of dependent learning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The expressiveness of (higher order) interactive theorem provers (ITPs) has made
them a popular choice for formalised mathematics and software veri cation4.
However, this comes at the expense of automation: users must often manually
provide guidance, where each step applies a proof tactic that splits a goal into
smaller sub-goals.</p>
      <p>An observed phenomenon is that proofs often group into families, such that,
once the expert user has discharged one proof, she can mentally extract a proof
strategy which she uses to complete the rest [3]. The remaining proofs have to
be manually guided as well, although the proof strategy is clear in her head.</p>
      <p>If one could learn and reapply proof strategies from a few examples then
this could signi cantly increase automation, making the overall approach more
cost-e ective { a key bottleneck for industrial application { and provide support
for more elegant automated proofs that a user can understand.
? This work has been supported by EPSRC grant EP/J001058/1, and the rst author
is supported by a James Watt scholarship. The fourth author acknowledges support
from his Royal Academy of Engineering/Syngenta Research Chair.
4 See e.g. the AFP [afp.sourceforge.net] and L4.veri ed [sel4.systems].</p>
      <p>Previous work to learn proof strategies [12,6,8] has only attempted to extract
general strategies from a large corpus of proofs, and has not addressed the
desirable extraction of \local" strategies from small families. It has simpli ed the
problem to composition of tactics, with no explanations of why or when a
strategy should be applied. Such explanation is crucial so the user can understand the
learnt strategy. It also reduces the search space and ensures termination without
resorting to hard-coded heuristics which may rule out some proofs.</p>
      <p>This de ciency was part of our motivation in developing the PSGraph
language [9], which describes such \why"s and \when"s by including information
about the tactics and sub-goals. This is achieved by representing proof strategies
as graphs, where proof tactics are represented by boxes and goal information by
predicates which label the wires, which we call wire predicates.</p>
      <p>
        (
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
tac_1(g1,g2)
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) ...
      </p>
      <p>tac_n(gx,gy)
...</p>
      <p>MIL
y
il
m
a
f
e
r
u
t
c
e
j
n
o
C
Δ ⊢ Γ
Δ ⊢ Γ</p>
      <p>
        (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
Δ ⊢ Γ
      </p>
      <p>
        Δ ⊢ Γ
(
        <xref ref-type="bibr" rid="ref5">5</xref>
        )
(
        <xref ref-type="bibr" rid="ref4">4</xref>
        )
lemma Δ ⊢ Γ
apply tac_1
...
apply tac_n
done
Interactive theorem prover
      </p>
      <p>PSGraph</p>
      <p>
        Meta-Interpretive Learning (MIL) [19] supports predicate invention and the
learning of recursive de nitions of the kind represented by de nitions of the wire
predicates. Our vision is based around using MIL to extract proof strategies
which currently reside in the ITP user's head. Our approach is illustrated in Fig.
1, where (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) a user selects one (or a few) conjectures from a \conjecture family"
and guides the proof as normal (using a proof script) in an ITP system. This will
generate a proof tree. (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) Metagol is then used to learn a proof strategy from
the proof tree, (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) which can then be translated into PSGraph and (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ) applied
to the other conjectures in the family, which will generate new proof trees (
        <xref ref-type="bibr" rid="ref5">5</xref>
        ).
      </p>
      <p>In this paper we take the rst steps towards this. We show (C1) that MIL
can learn proof strategies for the PSGraph language. As we are working with
rich data, we show that these strategies have a high branching factor leading to
a large search space. We therefore say they are highly non-deterministic, where
a deterministic strategy has a single branch. Non-determinism is undesirable as
the search space becomes impractically large. We extend MIL with types (C2)
and demonstrate (C3) that \typed MIL learns more deterministic proof
strategies than untyped MIL". We further show that introducing dependent learning
reduces the time taken to learn a strategy (C4).</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related work</title>
      <p>
        The development of proof-based methods { such as formal methods for software
development, or mechanised mathematics { can be divided into 3 phases: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
speci cation of the formal system; (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) conjecturing of the properties to be proven;
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) proving the conjectures. (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) includes discovery of invariants required for
veri cation and (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) includes discovery of required intermediate lemmas. There
may be mutual dependency between these phases: e.g. a failure from (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) may be
used to highlight mistakes in the speci cation or a need for intermediate lemmas.
      </p>
      <p>
        The main focus for (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) has been the discovery of intermediate assertions such
as system invariants. This work typically involves simulating a speci cation and
generating a set of traces. These become the examples used in machine learning.
Daikon is one of the most well-known tools for invariant discovery and is based
on inductive techniques [7]. In [11] Progol is used to discover invariants for a
case study. They note the need for predicate invention, which is our main reason
for using MIL. In [17] some domain-speci c heuristics are developed for HR (a
declarative machine learning tool) in order to discover invariants for a formal
method called Event-B. [1] uses several several ILP systems to repair faulty
speci cations based on counter-examples and witnesses provided as examples
generated by a model checker.
      </p>
      <p>
        The main focus for (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) has been lemma discovery. Given a source lemma,
[10] uses statistical machine learning to nd analogous target lemmas and a
\mutation" algorithm is developed to mutate the required intermediate lemmas
from the target theory into the source theory. There are (at least) two recent
projects building on this work (see [21]).
      </p>
      <p>
        To prove a conjecture (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ), we separate between automated and interactive
theorem provers (ATPs and ITPs), where the former is fully automatic and the
latter is more expressive but requires user guidance. Machine learning has been
successful in improving automation of ATPs by selecting relevant hypotheses
(see e.g. [13]). We aim to support a common methodology within ITP systems
where a strategy is extracted from few examples and used to automate the proof
of similar conjectures. This is orthogonal to work in ATP systems. [14] uses
neural nets to provide hints for the user, but does not generalise proofs into
strategies. A similar approach was used in [15] to classify proofs, again
without learning strategies. Other approaches [12,6,8] have simpli ed the problem to
only address tactic composition. The Learn mega system [12] implemented an
algorithm which computes a least general generalisation of such traces captured
as a regular expression. In her PhD thesis [6], Duncan used a combination of
genetic algorithms and statistical methods to generate a similar regular
expression for the Isabelle prover. Sepia [8] infers a state machine, which is searched
over using breadth- rst search for the Coq prover, allowing richer tactics. All
these examples assume a large corpus of proofs to learn from and do not
provide any guidance for the proof strategies, e.g. none address terminating loops.
This is partially overcome by hard-coded heuristics [6] or search strategies with
sub-optimal memory consumption [8]. In comparison we are addressing the more
challenging problem of learning strategies which provide this guidance in terms of
wire predicates. As these are arbitrary and unpredictable recursive functions on
the term structure of a (higher-order) goal, predicate invention will be required.
As far as we are aware, we are the rst to apply ILP to such rich problems, with
MIL the only machine learning technique supporting predicate invention.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Interactive Theorem Proving &amp; PSGraph</title>
      <p>lemma (A ! B)
(B ! C)</p>
      <p>A ! C
apply (rule impI)
apply (rule impI)
apply (rule impI)
apply (erule impE)
apply assumption
apply (erule impE)
apply assumption
apply assumption
done
!
!</p>
      <p>g0 ` (A ! B) ! (B ! C) ! A ! C
rule impI</p>
      <p>g1 (A ! B) ` (B ! C) ! A ! C
rule impI</p>
      <p>g2 (A ! B),(B ! C) ` A ! C
rule impI</p>
      <p>g3 (A ! B),(B ! C),A ` C
erule impE
A,(B ! C),A ` A g4 g5 B,(B ! C),A ` C
assumption erule impE</p>
      <p>B,C,A ` C g6 g7 B,A ` B
assumption assumption</p>
      <p>Interactive theorem provers (ITPs) enable users to interact with a proof
system and guide a proof. They also support automation in terms of user-provided
tactics. In this paper we will use the higher-order logic (Isabelle/HOL)
embedding of the Isabelle theorem prover [20].</p>
      <p>To develop theories and proofs, a user works with a proof script. The proof
script shown on the left of Fig. 2 illustrates a proof of a conjecture in
propositional logic. The rst line states the conjecture to be proved, and is followed
by a sequence of apply commands before the proof is closed by done. Note that
each command is applied to the rst sub-goal. The apply commands are used to
apply so-called proof tactics, which are programs which reduce a goal to a list of
(normally) smaller sub-goals. rule applies backwards resolution to a goal, erule
applies an elimation rule to a hypothesis, while assumption proves the goal by
applying one of the hypothesis to the goal5. Fig. 2 (middle) illustrates a proof
tree of this script, where we have labelled each goal6. Note that the proof script
shows a single branch of search space: each tactic may produce multiple branches
and backtracking may be required.</p>
      <sec id="sec-3-1">
        <title>The overall proof strategy is to rst remove all ! in the conclusion (rule</title>
        <p>impI), then all ! in the hypothesis (erule impE). At the end, all sub-goals can
be proven by the assumption tactic. The same strategy can be applied to prove
(A ! B ! C) ! (A ! B) ! A ! C. However, the proof script will
vary slightly as additional erule impE and assumption applications are required.</p>
        <p>While there are richer tactics to prove these two conjectures automatically,
they nevertheless illustrate a common phenomenon within ITP systems: by
proving a single conjecture a user will develop a proof strategy that she can reapply</p>
        <sec id="sec-3-1-1">
          <title>5 Stricly speaking rule is the tactic and impI is its argument. 6 This can be extracted from Isabelle using the ProofProcess framework [22].</title>
          <p>across a family of similar conjectures (as seen in Fig. 1). Such families are
common, either as separate conjectures or as sub-goals within a single conjecture.
The strategies are normally in the head of a user, although an expert may
manually encode them as tactics. { our goal is to help automate the extraction and
application of such strategies.</p>
          <p>A proof strategy needs to include procedural information about which tactics
to apply. However, they should also contain declarative information about the
type of goal and progress made to show how a goal should evolve. The lack
of such information is evident in [12,6,8], and to support such strategies we
developed the PSGraph language [9]. Here, a directed labelled graph is used to
capture proof strategies: the boxes of the graph contain the tactics, while the
wires are labelled by wire predicates { predicates to describe why a sub-goal
should be on a given wire. A graph is evaluated as a ow graph, where a goal
ows between tactics on a directed edge if the predicate on the edge holds for
that particular goal. Fig. 2 (right) shows the proof strategy described above
as a PSGraph. Here, imp goal is a predicate that holds if the conclusion is an
implication; imp hyp is a predicate that holds if the hypothesis is an implication
and the conclusion is not; while has assm holds if the conclusion is present in the
hypotheses. The remainder of the paper addresses learning of PSGraphs from a
small set of examples. The shaded parts of Fig. 2 highlight a sub-proof (middle),
which we can learn a sub-strategy from (right). We will return to this below.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Typed Meta-Interpretive Learning</title>
      <p>Our framework is built on top of MIL [18,19], which is a form of ILP based on an
adapted Prolog meta-interpreter. Whereas a standard Prolog meta-interpreter
attempts to prove a goal by repeatedly fetching rst-order clauses whose heads
unify with a given goal, a MIL learner attempts to prove a set of goals by
repeatedly fetching higher-order metarules (e.g. P (X; Y ) Q(Y; X)) whose
heads unify with a given goal. The resulting meta-substitutions are saved in an
abduction store and can be reused in later proofs. Following the proof of a set
of goals, a hypothesis is formed by projecting the meta-substitutions onto their
corresponding metarules, allowing for a form of ILP which supports predicate
invention and the learning of recursive theories.</p>
      <p>To demonstrate this technique, suppose the background knowledge consists of
the ground atom parent(alice,bob) and our goal is the ground atom child(bob,alice).
Let P (X; Y ) Q(Y; X) be a metarule. To prove this goal, a MIL learner fetches
the metarule and applies the meta-substitution = fP=child; Q=parentg to
unify the head of the goal with the metarule. The ground atom inverse(child,parent),
representing the meta-substitution , is saved in an abduction store, and the
learner continues the proof by attempting to recursively prove the body of the
metarule. Once a proof is complete, the ground atom inverse(child,parent), which
is saved in the abduction store, is projected onto the corresponding metarule to
obtain the clause child(X; Y ) parent(Y; X).</p>
      <p>A novel aspect of MIL is its use of predicate invention for problem
decomposition. In [16], the authors used MIL to induce string transformation programs.
In this approach, solutions to simple problems, including their constituent
predicates, are incorporated into the background knowledge and can be reused to
learn solutions to more di cult problems. This dependent learning approach
resulted in more compact programs through predicate reuse, and also led to
reduced learning times.</p>
      <p>prove([]; P rog; P rog):
prove([AtomjAs]; P rog1; P rog2) :
metarule(N ame; M etaSub; (Atom :- Body); Order);
Order;
abduce(metasub(N ame; M etaSub); P rog1; P rog3);
prove(Body; P rog3; P rog4);
prove(As; P rog4; P rog2):</p>
      <p>Our work uses the MetagolDF implementation [16] of MIL, displayed in Fig.
3. We extend this framework with simple types:
De nition 1 (Typed Meta-Interpretive Learning). Typed MIL extends
MIL by labelling each predicate P with a constant t representing its type. This
is written P : t, such that P (X; Y ) becomes P : t(X; Y ). To reuse the algorithm
of Fig. 3, the type is treated as an extra constant argument, thus P : t(X; Y ) is
internally represented as P (t; X; Y ). For readability we use the former.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Typed MIL for Proof Strategies</title>
      <p>
        Learning PSGraphs from example proofs can be reduced to two mutually
dependent learning problems: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) learning a graph's structure; and (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) learning
suitable wire predicates. Previous learning attempts [12,6,8] have simpli ed the
problem to just (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), with the learned strategies lacking explanation resulting in,
as our experiments show, a higher branching factor and thus slower search.
Name Metarule
Lift P : psgraph(x; y)
Chain P : psgraph(x; y)
Loop P : psgraph(x; y)
WChain P : wpred(x)
WBin P : wpred(x)
      </p>
      <p>Order
W : wpred(x); R : tactic(x; y) P R
Q : psgraph(x; z); R : psgraph(z; y) P Q; P</p>
      <p>Q : psgraph(x; z); P : psgraph(z; y) P Q; x
Q : gdata(x; z); R : gdata(x; z) P Q; P
Q : gdata(x; z) P Q
R
z y
R; Q</p>
      <p>R</p>
      <p>
        These two problems have di erent
features: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) requires learning clauses which can
be translated into a graph with wire
predicates, while (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) needs a large search space in
order to learn unknown predicates. Working
with higher-order logic this includes arbitrary
W
R
      </p>
      <p>Q
R</p>
      <p>Q</p>
      <p>X
A</p>
      <p>X∨Y
Y</p>
      <p>B
Fig. 5. Metarules in PSGraph
recursive functions. In typed MIL we can control the structure of what is learned
with metarules, and use types to separate the learning problems.</p>
      <p>Fig. 4 shows the metarules used, with a set of rules to learn proof strategies
(with type psgraph) and rules to learn wire predicates (type wpred ). For general
problems the latter set may be larger. A graphical view is given in Fig. 5, with
the outer grey lines indicate the learnt PSGraph in each case and stippled boxes
indicating that the boxes are of type psgraph, meaning they may represent a
sub-graph. The dots indicate where input and output wires are plugged.</p>
      <p>Lift takes a single node in a proof tree with input gx and output gy and
constructs a PSGraph consisting of a single node representing the corresponding
tactic and a labelled input edge. Metagol nds the appropriate tactic clause in
the background information, labelled with type tactic, and tries to nd a clause
of type wpred to de ne the wire predicate on the input edge. If a suitable wpred
clause can be found in the background information it will be inserted, otherwise
Metagol will use further metarules (such as WChain and WBin in Fig. 4) to
attempt to nd a suitable de nition from the available information. Thus we
nd that the node in the proof tree has been "lifted\ into the PSGraph:
P SGraph(psgraph; gx; gy)</p>
      <p>predicate(wpred; gx); tactic(tactic; gx; gy):</p>
      <p>Chain sequentially composes two such psgraphs to nd a larger strategy,
which are themselves found using the Lift rule. Starting at an edge represnting
some goal gx on a proof tree and terminating at gy via some intermediate goal
gz, the resulting PSGraph would be:</p>
      <p>P SGraph(psgraph; gx; gy)
subgraph 1(psgraph; gx; gz);
subgraph 2(psgraph; gz; gy):</p>
      <p>Loop introduces iteration, which is represented in PSGraph as an additional
output edge from a node looping back round to act as an additional input to
that node. Strategies learned using Loop will always have two clauses: a base
case in which the tactic is applied once and a recursive case where it is applied
repeatedly:</p>
      <p>P SGraph base(psgraph; gx; gy)</p>
      <p>P SGraph rec(psgraph; gx; gy)
predicate(wpred; gx); tactic(tactic; gx; gy):
P SGraph base(psgraph; gx; gz);</p>
      <p>P SGraph rec(psgraph; gz; gy):
Theorem 1. A PSGraph constructed using Lift, Chain and Loop will have a
single typed input, no outputs and all wires will be labelled with appropriate
predicates.</p>
      <p>Proof. The proof follows by rule induction over Lift, Chain and Loop. The base
case Lift is trivial: there will be single input wire labelled with a learnt wire
predicate to a node containing the tactic. By the induction hypothesis (IH),
two components sequentially composed using Chain are guaranteed to have one
input. To connect them a wire is added from the rst to the second with the
same input wire predicate as the second. The input of the composition is that
of the rst. For Loop, we know by the IH that the base case has one input
with a wire predicate. The same predicate will label the feedback wire from the
node's output, as illustrated in Fig. 5. When using a tail-recursive metarule,
in order to ensure termination MIL must abduce a base case where the the
order constraints associated with the metarules ensure convergence. For Loop,
the learnt PSGraph is thus guaranteed to have a non-recursive component (see
Fig. 5). If the output is \chained" to another component then the output from
the non-recursive component is used. Finally, there may be multiple clauses
describing a learnt PSGraph and without loss of generality we assume there are
two clauses, A and B, as in Fig. 5 (shaded). By the IH we know that both A
and B have a single input with predicates X and Y respectively. Here, A and
B are put side by side and an identity box (which does nothing) is added with
one input. This has predicate X _ Y . The outputs are then sent to A and B
using their respective types. They will not have output wires. If this composed
box is chained to another component R, which has input wire with predicate Z,
then the output of all non-recursive components are combined to an idenity box
which is plugged to R. Any wires introduced will have predicate R.
Theorem 2. The translation of Lift, Chain and Loop into a PSGraph of Fig.
5 will generate an unique PSGraph.</p>
      <p>Note that this assumes that Metagol has successfully learned suitable wire
predicates for each edge, either by predicate invention or by nding them in teh
background information. We omit the proof of this but note that the language
to express these in PSGraph is very close to Prolog so the translation is
straightforward.</p>
      <p>The metarules in Fig. 4 are used to learn proof strategies from an encoding of
a proof tree. As a running example we will use the shaded sub-proof tree of Fig.
2 (middle) to illustrate this encoding, and how MIL learns the shaded PSGraph
on the right of this gure.</p>
      <p>We must de ne a sub-tree in the context of proof trees, which will be more
restrictive than the standard de nition, in order to support learning from them.
This is because for a given tactic in a proof tree with more than one output,
a sub-tree must capture every output. If one is left out then the generalisation
may not be valid as not all (sub-)goals have been handled. A sub-tree is de ned
in terms of its boundary within a proof tree:
De nition 2 (Proof tree). A proof tree is a tree with a dummy root node, each
edge labelled by a unique named goal, and all other nodes tactic applications.
De nition 3 (Proof sub-tree boundary). Let E be the set of edges in a proof
tree P . A boundary of P is a pair (g; gs) where: g 2 E and gs is a non-empty
list where 8g0 2 gs, g0 2 E; 8g0 2 gs g0 is reachable from g; the path length from
g is equal for all g0 2 gs, unless g0 is terminal (see below); if g0 is a goal in gs
such that no other goal in gs has a longer path to g, then @g00 2= gs such that g00
is after g and has equal path length as g0 to g or g00 is terminal with a shorter
path length.</p>
      <p>To illustrate, (g0; [g3]) and (g2; [g4; g5]) are valid subtree boundaries while (g2; [g4; g7])
is not as g4 and g7 are not equidistant from g2 (and there are other nodes which
are).</p>
      <p>From the Lift rule, we see that a tactic R will have the form R : tactic(X; Y ).
For example, the tactic rule impI will be encoded as rule impI : tactic. Our
running proof sub-tree is encoded as:
rule impI : tactic(g0; g1): rule impI : tactic(g1; g2): rule impI : tactic(g2; g3):</p>
      <p>If a tactic produces two sub-goals then two predicates are created, e.g. the
step that turns g3 into g4 and g5 is represented by the clauses:
erule impE : tactic(g3; g4):</p>
      <p>erule impE : tactic(g3; g5):</p>
      <p>For tactics that do not produce any sub-goals (e.g. assumption), a dummy goal
is created with no goal information in order to preserve the syntax. We call such
goals terminal. All other goals contain a set of hypotheses and a conclusion, which
we provide projections of. For example, g2 is A ! B; B ! C; A ` A ! C.</p>
      <sec id="sec-5-1">
        <title>It has three hypothesis: A ! B, B ! C and A, and the conclusion: A ! C.</title>
        <p>These terms are projected from the goals by hyp:gdata and concl:gdata. To
illustrate use of these, the wire predicate used by the assumption tactic requires
the same term to be in the hypothesis and conclusion: :
has asm : wpred(G)</p>
        <p>hyp:gdata(G; T ); concl:gdata(G; T ):
Isabelle internally stores terms as typed lambda expressions [20], using De Bruijn
indices [5] to abstract over names of bound variables. The following clauses are
used to encode terms:</p>
        <p>b(I) c(S) v(S) app(T; U ) lambda(V; T ) exists(T ) f orall(Y )
Here, b is the De Bruijn index used for a bound variable, c is a constant, v is a
variable, app application, and lambda is a binder. While exists and f orall can
be expressed by lambda we have simpli ed them as they appear frequently.</p>
        <p>We are not trying to learn new terms, and so have not translated their
underlying types. We do not type these predicates for simplicity7. To illustrate the
encoding, the goal information for g2 becomes:
hyp : gdata(g2; app(app(c( !); c(a)); c(b))):
hyp : gdata(g2; app(app(c( !); c(b)); c(c))):
concl : gdata(g2; app(app(c( !); c(a)); c(c))):
Note that a type gdata is used to represent goal data information. By treating
Prolog, which is rst-order, as a meta-language in this way we can fully encode
7 Strictly speaking we should have written e.g. b : term(I), c : term(S) and so on.
the higher-order logic of Isabelle by abstracting away from some of its features.
This can be applied to any formula provided by Isabelle and written in its
declarative tactic language.</p>
        <p>An advantage ILP techniques have over machine learning techniques used in
e.g. [12,6,8], is that we can enrich the background clauses and use this to guide
and simplify learning. For example, we provide de nitions to extract the top
level symbol in a conclusion or hypothesis8:
topsymbol : gdata(G; X)
hypsymbol : gdata(G; X)
concl : gdata(G; app(app(X; A); B)):
hyp : gdata(G; app(app(X; A); B)):
For our running example we have that
topsymbol : gdata(g0;
!) topsymbol : gdata(g1;
!) topsymbol : gdata(g2;
!):</p>
        <p>With these de nitions we de ne a proof-tree encoding as:
De nition 4 (Proof-tree encoding). In a proof-tree encoding each step of
the proof tree is encoded as a relation of type tactic, with associated encoding of
the goal information in terms of their hypotheses and conclusion.</p>
        <p>In order to learn the wire predicates, properties of the terms have to be
learned. To support this, we introduce a set of atomic term operators :
De nition 5 (Atomic term operator). The following clauses are atomic
term operators: const : gdata(c(X)), var : gdata(v(X)), bound : gdata(b(X)),
lef t : gdata(app(X; Y ); X), right : gdata(app(X; Y ); Y ), into : gdata
(f orall(X); X), into : gdata(exists(X); X) and into : gdata(lambda(V; X); X).
For example, the X in app(app(X; A); B) can be projected by two consecutive
lef t : gdata applications. We can now de ne our learning problem:
De nition 6 (Typed MIL of PSGraph). In typed MIL of PSGraph a
binary relation of type psgraph is learned where the background information at
least contains encodings of one or more proof trees together with atomic term
operators, and the given examples are one or more subtree boundaries of the
encoded proof trees.</p>
        <p>When using sub-trees and not the full trees we can learn sub-strategies and,
as discussed later, we can apply dependent learning to learn increasily larger
sub-strategies.</p>
        <p>Returning to our running example of Fig. 2 (where our goal is to learn a
strategy rimp:psgraph describing the shaded tree), Metagol will use a metarule
of the psgraph type as this is the type given for rimp. In our case, Loop will
trigger learning of the \loop body" Q : psgraph(x; z). Here, Lift is applied using
the background information rule impI : tactic(g0; g1) to generate rule impI :
tactic(A; B). Metagol must also nd a wire predicate, of type wpred for the
input. Since no wpred clauses are given in the background information, Metagol
8 With correct metarules, Metagol may be able to nd better de nitions.
must invent one. WBin instantiates B in topsymbol : gdata(A; B) to c(imp),
which is the top symbol of the conclusion of g0. This invented predicate is called
imp goal : wpred9 and the invented graph component is called simpI:psgraph.
In order to reach g3, and end the loop (base case), Lift is again applied to nd a
clause with body identical to simpI:psgraph. The learnt program then becomes:
rimpI : psgraph(A; B)
rimpI : psgraph(A; B)
simpI : psgraph(A; B)
imp goal : wpred(A)
simpI : psgraph(A; C); rimpI : psgraph(C; B): (Loop)
imp goal : wpred(A); rule impI : tactic(A; B): (Lift)
imp goal : wpred(A); rule impI : tactic(A; B): (Lift)
topsymbol : gdata(A; c(imp)): (WBin)
A PSGraph encoding of this, following our described translation, is given in Fig
6. Our metarules have taken a more \functional view" of iteration, meaning
the solution deviates from the example strategy of Fig 2 as there is a separate
base and step case, which are identical here. We discuss an alternative future
approach in x7, which would discover a strategy closer to the one in Fig 2.</p>
        <p>When considered as a sequence of tactics, we note that proof strategies do
not always terminate in ITP. If the Loop metarule is naively applied, an in nite
sequence of tactics may be introduced. We address this by evaluating the
subgoals generated by each iteration of the tactic(s) on the recursive node, and de ne
termination in terms of an ordering over the goals. Note that we cannot derive
a general ordering for all possible conjectures. With the exception of proof by
contradiction, we can use the total number of symbols for propositional and
predicate logic addressed in this paper. Let symbcount be the total number of
symbols (e.g. 8 or !) for a goal (including conclusion hypotheses). We de ne
our termination measure as</p>
        <p>G1</p>
        <p>G2</p>
        <p>symbcount(G1) &gt; symbcount(G2)
Theorem 3. If all given tactics terminate, then a learnt PSGraph will either
fail or terminate for any input.</p>
        <p>Proof. The proof follows by rule induction over the metarules that generate a
PSGraph. A lifted tactic will either fail or generate output. If it does not have an
outgoing edge or the wire predicates do not t then it will fail. If not, it will
succeed with the goals on the output wire. For chained components we can assume
that each component terminates (or fails) from the IH. The chained component
will therefore trivially terminate (or fail). For Loop, the nested component is
guaranteed to terminate/fail from the IH. By each iteration is guaranteed to
move towards a lower bound, thus the loop will terminate.</p>
        <p>Tool support Fig. 6 shows our tool architecture. All components, except the
stippled line, have been implemented. The shaded areas are external parts and
not contributions of this paper. The tool process is as follows: Isabelle proofs are
captured by the ProofProcess tool [22]. This produces a proof tree in XML form,</p>
        <sec id="sec-5-1-1">
          <title>9 We have renamed the invented names for readability.</title>
          <p>which our Generator parses and translates into a Prolog le as described above.
The generator is implemented in Standard ML on top of Isabelle, supported by
Isabelle libraries. Metagol is applied to this le and will, if successful, produce
a proof strategy represented as a psgraph typed predicate. This can then be
translated to PSGraph, also implemented using Standard ML on top of Isabelle,
and used to automate other proofs. Implementation of this is future work.
In the experiments discussed next translation is handled manually. The
example proofs are converted into Prolog, and the learnt strategy is evaluated to
see if these proofs could be expressed using it. This does not require translation
into a PSGraph.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Experiments</title>
      <p>We have experimented with untyped and typed MIL to learn proof strategies
from a collection of 15 proofs in propositional logic10. In each example we
provide tactic de nitions, goal information and the metarule set given in g. 4 as
background information. The experiments were run using YAP on Ubuntu using
a 3.10 GHz Intel i5-2400 CPU with 4GB RAM11.</p>
      <p>Our rst experiment considered determinism of typed MIL in comparison
to untyped MIL. For each example we consider the branching factor ( ) of the
learned strategy, indicating the number of possible proof trees (including partial
trees and failures) which could be constructed by applying the strategy to a goal.
Branch points are found by manual inspection of the learned strategies. These
occur when a goal could follow more than one edge in the graph, either through
over-general wire predictes or edges with the same label. Automated extraction
is not currently supported by PSGraph and is future work.</p>
      <p>In these rst experiments we provided an explicitly-de ned wire predicate
clause for each goal in the background information, with the type label omitted
in the untyped experiments. The experiments were repeated with di erent time
limits (1, 2, 4 and 8 seconds).</p>
      <p>The graph in Fig. 7 (upper left) shows the average for both untyped and
typed strategies compared to an optimum value. This optimum represents a
learned strategy from each example with one branch and thus one proof tree can
be formed for each. Using untyped MIL &gt; 1 initially, indicating more than one
path on average, and increases with time as larger solutions are found. With
typed MIL = 1 initially, remaining constant over time. The results show that
10 The examples are taken from: isabelle.in.tum.de/exercises
11 Code for all experiments available at: https://sites.google.com/site/cifarquhar/</p>
      <p>for untyped (UT) and typed (T) MIL.
as time increases untyped MIL learns less e cient strategies. Conversely, typed
MIL produces strategies which are optimally e cient.</p>
      <p>These di erences are due to how Metagol constructs its' solutions. Consider a
strategy consisting of repeated applications of a single tactic. Typed MIL forms
this using the Loop rule, where both base and step cases must include psgraph
clauses. These are formed using lifted tactic clauses with a wpred clause. In
untyped MIL there is no requirement to use psgraph clauses, and so a solution
is found using tactic clauses with no wire predicates. Consequently when a goal
passes through the node there is no wire predicate to direct it either towards
the next tactic or around the loop, and so both must be tried. There is a similar
issue whenever there are multiple outputs from a node, and it is this lack of
pre-conditions which results in the higher .</p>
      <p>At rst neither untyped nor typed MIL was able to learn from every example
within the given time limit. This was due to the size of the solution required;
the time needed grows exponentially with the number of clauses in the solution.
In the untyped case this is less pronounced as the solutions are less detailed. In
typed MIL a solution contains one clause for every lifted tactic plus a number of
clauses describing how they link together, resulting in larger de nitions for the
same strategies. We address this by using dependent learning to learn smaller
sub-strategies as described in x4.</p>
      <p>By using dependent learning we reduce the time taken to learn more
complex strategies, however this has the trade-o of taking longer to nd simpler
strategies. We have still not achieved 100% success within the given time frame,
nor are the additional strategies learned fully deterministic. However, there is a
signi cantly smaller branching factor in the untyped case (Fig. 7, upper right).
This is again due to Metagol's construction of solutions: Metagol will always
use the \simplest" (usually smallest) solution. When using dependent learning
to learn sub-strategies describing sub-trees this means that the best metarule
to use will generally be Chain, as each sub-strategy can then be extended one
node at a time. Consequently there are fewer potential branch points for untyped
MIL, while typed MIL is largely una ected.</p>
      <p>We now address the problem of inventing pre-conditions. In previous
experiments we have provided explicit pre-conditions in the background information,
now we look for Metagol to invent de nitions based on the goal data provided.
Results of these tests are shown in Fig. 7 (lower right).</p>
      <p>The metarules restrict possible solutions, re ected in the weakening of the
strategies seen here. De nitions are limited to using top symbol, hyp symbol,
hyp and concl, which rules out other predicates being found. Since Metagol is
forced to nd a wpred clause in the typed case, and a simple monadic predicate in
the untyped case, it must produce a de nition which is too generalised and which
would permit goals to follow an incorrect edge in the corresponding graph. Future
work will include experiments using a minimal set of metarules from which others
can be inferred, allowing Metagol to nd better de nitions for pre-conditions,
including using the lef t( ; )/right( ; ) notation introduced in De nition 5.</p>
      <p>We note from Fig. 7 that is now reduced for untyped MIL, and has increased
for typed MIL. We also observe in Metagol's output limited reuse of invented
predicates, mainly those de ning pre-conditions. Although limited in scale, this
illustrates how previously learned (sub-)strategy de nitions can be reused in
later examples. Future work will include scaling this up with larger examples
and working with larger sub-strategies.</p>
      <p>
        As a nal experiment we attempt to learn a strategy in predicate logic, using
the example 8A B: A ^ B ! B ^ A. Using the notation given in De nition 5,
we de ne the initial goals as:
concl(gdata; a0; f orall(f orall(app(app(c(imp); app(app(c(conj); b(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )); b(0)));
app(app(c(conj); b(0)); b(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )))))):
concl(gdata; a1; f orall(app(app(c(imp); app(app(c(conj); v(a)); b(0)));
app(app(c(conj); b(0)); v(a))))):
      </p>
      <p>As quanti ers are removed from the goal as the proof is evaluated ( rstly with
the rule allI tactic) we replace the corresponding bound variables. This
experiment learned a strategy similar to the running example shown in x4, including
invented predicates.</p>
    </sec>
    <sec id="sec-7">
      <title>7 Conclusion and further work</title>
      <p>Our rst experiment was able to learn proof strategies from 86% of the examples
and thus supports (C1) - that MIL is capable of learning proof strategies in
our framework. However, in terms of branching, untyped MIL seems to o er
no improvements over previous work [6,12] as it does not learn wire predicates
to provide explanation for a strategy. We have introduced types in the MIL
framework by adding an additional constant argument to the predicate (C2).
The results show that typed MIL learns wire predicates and reduces branching,
although we were able to learn strategies from fewer examples. Our assertion
(C3) that typed MIL reduces non-determinism is distinct from success rate,
however, and so is validated. The introduction of types means a larger number
of clauses to represent strategies, which increases the run time for Metagol and
is the reason for failure in most cases. This is addressed through the use of
dependent learning, and the slight increase in successful learning validates (C4).</p>
      <p>We will investigate a way of improving our learned strategies by using a
combinator-based approach to learning. This will involve the development of
more complex metarules in order to capture branching within a single clause
representing one node, rather than the multiple clauses currently required. We
will experiment with making our de nitions functional and introducing
combinators (such as OR and LOOP ) to handle multiple outputs. As discussed in x5,
we aim to learn strategies closer to the one in g. 2.</p>
      <p>We will move on from examples focused on propositional logic and begin to
look at more complex proofs. As shown in x5, we have already begun
implementing a representation for higher-order logic, allowing us to look at predicate
logic. We will also look at examples from group theory, including those used by
[12], some geometry and we will attempt to learn the rippling proof strategy [2],
which will require inventing very complex wire predicates.</p>
      <p>As one of the key motivations for this work is the reasoning behind applying
a given strategy, we will also investigate learning wire predicates from existing
complex tactics with the aim of improving their e ciency, eg. Isabelle's auto
tactic. This will be one instance which requires learning from multiple positive
examples, which we have had some success with already. We will make further
use of dependent learning to enable us to reuse parts of strategies which can be
applied to multiple proofs in order to help us with this.</p>
      <p>We would also like to show the advantages of typed MIL for other domains:
the approach we have taken should be applicable for most cases where labelled
graphs are learnt, while we have started experimenting on extending previous
work on learning robot strategies [4] with argument types. A current student
project is investigating the use of MIL to learn safety case patterns. Longer
term, we plan to study `type invention' and support for `higher order types'.
8. T. Gransden, N. Walkinshaw, and R. Raman. Sepia: Search for proofs using
inferred automata. In A. P. Felty and A. Middeldorp, editors, CADE-25, volume
9195 of LNCS, pages 246{255. Springer, 2015.
9. G. Grov, A. Kissinger, and Y. Lin. A graphical language for proof strategies. In</p>
      <p>
        LPAR, volume 8312 of LNCS, pages 324{339. Springer, 2013.
10. J. Heras, E. Komendantskaya, M. Johansson, and E. Maclean. Proof-pattern
recognition and lemma discovery in acl2. In LPAR, pages 389{406. Springer, 2013.
11. D. T. Ho, M. Zhang, and K. Ogata. A case study on extracting the characteristics
of the reachable states of a state machine formalizing a communication protocol
with inductive logic programing. In PreProceedings of ILP 2015.
12. M. Jamnik, M. Kerber, M. Pollet, and C. Benzmuller. Automatic learning of proof
methods in proof planning. Logic Journal of IGPL, 11(
        <xref ref-type="bibr" rid="ref6">6</xref>
        ):647{673, 2003.
13. C. Kaliszyk and J. Urban. Learning-assisted automated reasoning with yspeck.
      </p>
      <p>
        JAR, 53(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ):173{213, 2014.
14. E. Komendantskaya, J. Heras, and G. Grov. Machine learning in proof general:
      </p>
      <p>
        Interfacing interfaces. In UITP 2012, pages 15{41, 2013.
15. E. Komendantskaya. Machine learning coalgebraic proofs. Short Post-proceedings
of ILP, 11, 2011.
16. D. Lin, E. Dechter, K. Ellis, J. Tenenbaum, and S. Muggleton. Bias reformulation
for one-shot function induction. In Proceedings of the 23rd European Conference on
Arti cial Intelligence (ECAI 2014), pages 525{530, Amsterdam, 2014. IOS Press.
17. M. T. Llano, A. Ireland, and A. Pease. Discovery of invariants through automated
theory formation. Formal Aspects of Computing, 26(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ):203{249, 2014.
18. S. Muggleton, D. Lin, N. Pahlavi, and A. Tamaddoni-Nezhad. Meta-interpretive
learning: application to grammatical inference. Machine Learning, 94:25{49, 2014.
19. S. Muggleton, D. Lin, and A. Tamaddoni-Nezhad. Meta-interpretive learning of
higher-order dyadic datalog: Predicate invention revisited. Machine Learning, 2015.
      </p>
      <p>
        Published online: DOI 10.1007/s10994-014-5471-y.
20. L. C. Paulson. The foundation of a generic theorem prover. JAR, 5(
        <xref ref-type="bibr" rid="ref3">3</xref>
        ):363{397,
1989.
21. A. Velykis, G. Grov, and L. Freitas. Contributions to AI4FM 2015. Available from
http://www.ai4fm.org/papers/ai4fm-2015-proceedings.pdf.
22. A. Velykis. Capturing Proof Process. PhD thesis, Newcastle University, 2015.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Alrajeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Russo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Uchitel</surname>
          </string-name>
          .
          <article-title>Elaborating requirements using model checking and inductive learning</article-title>
          .
          <source>Software Engineering</source>
          , IEEE Transactions on,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <volume>361</volume>
          {
          <fpage>383</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bundy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Basin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Ireland</surname>
          </string-name>
          . Rippling:
          <article-title>Meta-level Guidance for Mathematical Reasoning</article-title>
          . Cambridge University Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bundy</surname>
          </string-name>
          , G. Grov, and
          <string-name>
            <given-names>C. B.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Learning from experts to aid the automation of proof search</article-title>
          .
          <source>In AVoCS'09, CSR-2-2009</source>
          , pages
          <fpage>229</fpage>
          {
          <fpage>232</fpage>
          .
          <string-name>
            <given-names>Swansea</given-names>
            <surname>Uni</surname>
          </string-name>
          .,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Cropper</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Muggleton</surname>
          </string-name>
          .
          <article-title>Learning e cient logical robot strategies involving composable objects</article-title>
          .
          <source>In IJCAI</source>
          ,
          <year>2015</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>N. G. De Bruijn</surname>
          </string-name>
          .
          <article-title>Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the church-rosser theorem</article-title>
          .
          <source>In Indagationes Mathematicae</source>
          , volume
          <volume>75</volume>
          , pages
          <fpage>381</fpage>
          {
          <fpage>392</fpage>
          .
          <string-name>
            <surname>Elsevier</surname>
          </string-name>
          ,
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>H.</given-names>
            <surname>Duncan</surname>
          </string-name>
          .
          <article-title>The use of Data-Mining for the Automatic Formation of Tactics</article-title>
          .
          <source>PhD thesis</source>
          , University of Edinburgh,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>M. D. Ernst</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          <string-name>
            <surname>Perkins</surname>
            ,
            <given-names>P. J.</given-names>
          </string-name>
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>McCamant</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Pacheco</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          <string-name>
            <surname>Tschantz</surname>
            , and
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>The daikon system for dynamic detection of likely invariants</article-title>
          .
          <source>Science of Computer Programming</source>
          ,
          <volume>69</volume>
          (
          <issue>1</issue>
          ):
          <volume>35</volume>
          {
          <fpage>45</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>