<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Effect of Predicate Order on Curriculum Learning in ILP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hank Conn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen H. Muggleton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The second author acknowledges support from his Royal Academy of Engineering/Syngenta Research Chair at the Department of Computing at Imperial College London. Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: Nicolas Lachiche, Christel Vrain (eds.): Late Breaking Papers of ILP 2017</institution>
          ,
          <addr-line>Orleans</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>17</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Development of effective methods for learning large programs is arguably one of the hardest unsolved problems within ILP. The most obvious approach involves learning a sequence of predicate definitions incrementally. This approach is known as Curriculum Learning. However, Quinlan and Cameron-Jones' paper from 1993 indicates difficulties in this approach, since the predictive accuracy of ILP systems such as FOIL rapidly degrades given a growing set of learned background predicates, even when a reasonable ordering over the predicate sequence is chosen. Limited progress was made on this problem until the recent advent of bias-reformulation methods within Meta-Interpretive Learning. In this paper we show empirically that, given a well-ordered predicate sequence, relatively large sets of dyadic predicates can be learned incrementally using a state-of-the-art Meta-Interpretive Learning system which employs a universal set of metarules. However, further experiments show how progressive random permutations of the sequence rapidly degrade performance in a fashion comparable to Quinlan and Cameron-Jones' results. On the basis of these results we propose the need for methods for identifying well-ordered predicate sequences to address this issue.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>learning is its ability to automatically identify a good sequence ordering for the learning. A second aspect is that MIL, unlike FOIL, is guaranteed to find minimal predicate definitions, avoiding problems with overfitting in the presence of large amounts of background knowledge. Since the minimal representation of the target theory either stays the same or shrinks monotonically with expanding background predicates, this can lead to reductions in search in the case that hypotheses are considered in increasing order of their size.</p>
      <sec id="sec-1-1">
        <p>In this paper we explore the effect that predicate sequence choice has on learning performance. In particular, we show that a) with a set of inter-related family relations, MIL produces efficient and effective learning in the case of a well-chosen predicate sequence and b) performance degrades gradually with progressive random permutations of such an ordering. These results reinforce the need for techniques, such as dependent learning, for addressing the problems of learning large logic programs.</p>
      </sec>
      <sec id="sec-1-2">
        <p>This paper is organised as follows. Section 2 describes related work. The Meta-Interpretive Learning framework is described in Section 3. In Section 4 we describe the implementation of Metagol and the algorithm for running the experiments. The experiments are described in Section 5. We conclude and describe further work in Section 6.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <sec id="sec-2-1">
        <p>
          Induction of large programs from data is one of the long-term aims of ILP [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. However, even when learning individual predicate definitions the complexity of admissible search grows exponentially [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Although Quinlan's FOIL [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] provides efficient heuristic search, the lack of admissible search leads to problems with incompleteness associated with zero-gain literals [13] as well as hard issues relating to mutual recursion in multi-predicate learning [
          <xref ref-type="bibr" rid="ref13 ref5">14, 5</xref>
          ]. One initially promising avenue to avoid Quinlan and Cameron-Jones' problem with increasing background knowledge was presented by Srinivasan et al. [
          <xref ref-type="bibr" rid="ref14">15</xref>
          ], who showed that using admissible search Progol [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] can simultaneously increase accuracy while decreasing search time. The reason is that increasing background knowledge reduces the minimal size of a consistent solution, allowing Progol to reduce both the search size and the degree of overfitting for single-clause solutions as relevant background knowledge is provided incrementally.
        </p>
      </sec>
      <sec id="sec-2-2">
        <p>
          Recent advances in the area of Meta-Interpretive Learning (MIL) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] have demonstrated a way in which higher-order background knowledge in the form of metarules and abstractions [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] can further constrain admissible hypothesis space search, leading to decreases in search time and increases in predictive accuracy. While Progol guarantees minimal solutions for single-clause searches, Metagol [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] achieves minimal and admissible multi-clause predicate definition searches by iterative deepening. However, MIL learns definitions incrementally, which opens the question of effects related to the order in which predicates are learned. Results in our experiments indicate that if the idealised ordering used in experiments is randomly permuted, predictive accuracy degrades rapidly. It is therefore necessary to consider how the order of predicate definition learning should be selected automatically. Initial results in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] indicate that a technique referred to as dependent learning can be effective in selecting such an ordering, though it has still to be clarified what the properties of a target theory are for dependent learning to have guaranteed effectiveness.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Framework</title>
      <p>
        MIL [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] is a form of ILP based on an adapted Prolog meta-interpreter. Whereas a standard Prolog meta-interpreter attempts to prove a goal by repeatedly fetching first-order clauses whose heads unify with a given goal, a MIL learner attempts to prove a set of goals by repeatedly fetching higher-order metarules (Fig. 1b) whose heads unify with a given goal. The resulting meta-substitutions are saved in an abduction store, and can be re-used in later proofs. Following the proof of a set of goals, a hypothesis is formed by applying the meta-substitutions onto their corresponding metarules, allowing for a form of ILP which supports predicate invention and the learning of recursive theories.
      </p>
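      <p>The fetch-unify-abduce loop described above can be sketched concretely. The following is a hypothetical, heavily simplified Python rendering (not Metagol's actual code) using a single chain metarule, P(x,y) :- Q(x,z), R(z,y): a goal is first checked against the background facts; otherwise a meta-substitution binding P, Q, and R is abduced, saved in a store, and re-used in later proofs.</p>
      <preformat>
```python
# Hypothetical minimal sketch of MIL-style proving (not Metagol itself).
# Background: ground dyadic facts. One metarule, "chain":
#   P(x,y) :- Q(x,z), R(z,y)
facts = {
    "mother": {("ann", "amy"), ("amy", "eve")},
}

def prove(goal, store, depth=2):
    """Prove pred(x,y) from facts plus abduced chain clauses in store."""
    pred, x, y = goal
    if (x, y) in facts.get(pred, set()):
        return True
    if pred in store:                    # re-use a stored meta-substitution
        q, r = store[pred]
        return chain(q, r, x, y, store, depth)
    if depth == 0:
        return False
    known = list(facts) + list(store)
    for q in known:                      # abduce a new meta-substitution
        for r in known:
            store[pred] = (q, r)         # save it in the abduction store
            if chain(q, r, x, y, store, depth):
                return True
            del store[pred]              # retract on failure
    return False

def chain(q, r, x, y, store, depth):
    # body of the chain metarule: Q(x,z), R(z,y)
    for (a, z) in facts.get(q, set()):
        if a == x and prove((r, z, y), store, depth - 1):
            return True
    return False

store = {}
print(prove(("grandmother", "ann", "eve"), store))  # True
print(store)  # {'grandmother': ('mother', 'mother')}
```
      </preformat>
      <p>Applying the saved meta-substitution back onto the chain metarule yields the hypothesised clause grandmother(x,y) :- mother(x,z), mother(z,y), illustrating how a hypothesis is formed from the abduction store.</p>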
    </sec>
    <sec id="sec-4">
      <title>Implementation</title>
      <p>
        <preformat>prove([], Prog, Prog).
prove([Atom|As], Prog1, Prog2) :-
    metarule(Name, MetaSub, (Atom :- Body), Order),
    Order,
    abduce(metasub(Name, MetaSub), Prog1, Prog3),
    prove(Body, Prog3, Prog4),
    prove(As, Prog4, Prog2).</preformat>
        (a) Prolog code for generalised meta-interpreter
      </p>
      <p>
        (b) Metarules with associated ordering constraints, where ≺ is a pre-defined ordering over symbols in the signature. The letters P, Q, and R denote existentially quantified higher-order variables; x, y, and z denote universally quantified first-order variables.
      </p>
      <p>
        <preformat>predicate list = list of predicates in standard order
for each permutation:
    pick two predicates in predicate list and swap their positions
    for each predicate in predicate list:
        generate Prolog code
        execute Metagol learning task
        check learned definition for accuracy
        save learned definition as background knowledge</preformat>
      </p>
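      <p>The experiment loop can also be rendered as a small harness. The following hypothetical Python sketch mirrors the pseudocode; learn() and accuracy() are stand-ins for the real calls into Metagol and the accuracy check, and the cumulative-swap policy is an assumption based on the description of progressive permutations.</p>
      <preformat>
```python
import random

def run_experiment(predicates, n_permutations, learn, accuracy, seed=0):
    """Learn predicates incrementally, then repeat after each extra swap."""
    rng = random.Random(seed)
    order = list(predicates)                 # standard (well-chosen) order
    results = []
    for n_swaps in range(n_permutations + 1):
        background = []                      # learned definitions accumulate
        accs = []
        for pred in order:
            definition = learn(pred, background)   # stand-in: Metagol task
            accs.append(accuracy(pred, definition))
            background.append(definition)    # re-use as background knowledge
        results.append((n_swaps, sum(accs) / len(accs)))
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]    # one more random swap
    return results
```
      </preformat>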
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <sec id="sec-5-1">
        <title>Hypotheses</title>
        <p>Two hypotheses were tested: (1) learning a series of predicate definitions using the universal set of metarules does not decrease performance in Metagol, and (2) learning predicates in a randomized order leads to lower accuracy.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Materials</title>
        <p>
          Learning was based on family relationships in Hindi [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. In Hindi there are a number of terms without specific words in English. For example, in Hindi there are different terms for your father's brother (taaoo) and your mother's brother (maamaa). Hindi also has specific terms for complex concepts such as the daughter of your mother's brother (mameri). Family trees of 5000 individuals were randomly generated. The trees were written to a Prolog file containing all of the facts for each individual in terms of the background predicates male/1, female/1, father/2, and mother/2. This file could then be used as background knowledge for learning the definitions of family relationships. In total 43 family relationship concepts were assembled<sup>2</sup>, along with their definitions in Prolog to be learned progressively. The concepts were placed into a reasonable order for learning, where for example the simpler concepts brother and daughter are learned before the more complex concept mother's brother's daughter.
        </p>
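        <p>For illustration, a background-fact file of the kind described above could be produced along the following lines. This is a hypothetical sketch: the tree-growing policy, the naming scheme, and the founding couple are assumptions, not the paper's actual generator, and realism constraints (e.g. avoiding sibling marriages) are ignored.</p>
        <preformat>
```python
import random

def generate_family_facts(n_individuals, seed=0):
    """Emit Prolog facts over male/1, female/1, father/2, mother/2."""
    rng = random.Random(seed)
    facts = ["male(adam).", "female(eve)."]
    couples = [("adam", "eve")]            # hypothetical founding couple
    people = []
    for i in range(n_individuals):
        person = "p%d" % i
        sex = rng.choice(["male", "female"])
        facts.append("%s(%s)." % (sex, person))
        dad, mum = rng.choice(couples)     # attach child to a random couple
        facts.append("father(%s,%s)." % (dad, person))
        facts.append("mother(%s,%s)." % (mum, person))
        people.append((person, sex))
        # occasionally form a new couple from existing individuals
        males = [p for p, s in people if s == "male"]
        females = [p for p, s in people if s == "female"]
        if males and females and rng.random() > 0.5:
            couples.append((rng.choice(males), rng.choice(females)))
    return facts
```
        </preformat>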
        <sec id="sec-5-2-1">
          <p><sup>1</sup>Available at https://github.com/metagol/metagol web interface. <sup>2</sup>All experimental code and materials available at https://github.com/metagol/ILP2017.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>Results</title>
        <p>In the first experiment the training set was 1% and the test set 10% of the total number of positive and negative examples for each predicate, randomly selected with replacement. Predicate accuracies and running times were averaged over 50 trials. In the second experiment the predicates were learned in a randomized order. Predictive accuracies were averaged over 50 trials for each increment of the number of swaps. Positive and negative examples were randomly sampled in equal number for each learning task (averaging 56 training examples and 226 test examples) in order to give a default predicate accuracy of 50% for a majority-class predictor. The experiments were run on a Windows 7 operating system running YAP 6.2.2, the latest Metagol code from github<sup>3</sup> (as of 2017-02-12), and with a test harness running on Java 8 (jre1.8.0_121). The MySQL instance shared by the web interface and test harness was on version 10.1.19-MariaDB.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and further work</title>
      <p>This paper revisits issues related to Quinlan and Cameron-Jones' demonstration that the performance of systems such as FOIL progressively degrades with increasing numbers of predicates. We show that given a reasonable ordering, such as that provided to FOIL, Metagol's performance does not degrade while progressively learning 43 Hindi family relationships. However, when the reasonable ordering over the predicates learned is randomly perturbed, predictive accuracy also progressively degrades.</p>
      <p><sup>3</sup>Available at https://github.com/metagol/metagol</p>
      <p>In further work we hope to investigate the degree to which dependent learning guarantees the discovery of a "reasonable order" for learning a large set of predicate definitions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning, 2009.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] I. Bratko. Prolog Programming for Artificial Intelligence. Addison-Wesley, London, 1986.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Cropper and S.H. Muggleton. Logical minimisation of meta-rules within meta-interpretive learning. In Proceedings of the 24th International Conference on Inductive Logic Programming, pages 65-78. Springer-Verlag, 2015. LNAI 9046.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] A. Cropper and S.H. Muggleton. Learning higher-order logic programs through abstraction and invention. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pages 1418-1424. IJCAI, 2016.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] L. De Raedt and N. Lavrac. Multiple predicate learning in two inductive logic programming settings. Journal on Pure and Applied Logic, 4(2):227-254, 1996.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] D. Lin, E. Dechter, K. Ellis, J.B. Tenenbaum, and S.H. Muggleton. Bias reformulation for one-shot function induction. In Proceedings of the 23rd European Conference on Artificial Intelligence (ECAI 2014), pages 525-530, Amsterdam, 2014. IOS Press.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] R.S. McGregor. The Oxford Hindi-English Dictionary. Oxford University Press, 1993.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S.H. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245-286, 1995.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] S.H. Muggleton, D. Lin, N. Pahlavi, and A. Tamaddoni-Nezhad. Meta-interpretive learning: application to grammatical inference. Machine Learning, 94:25-49, 2014.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] S.H. Muggleton, D. Lin, and A. Tamaddoni-Nezhad. Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. Machine Learning, 100(1):49-73, 2015.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] S.H. Muggleton, L. De Raedt, D. Poole, I. Bratko, P. Flach, and K. Inoue. ILP turns 20: biography and future challenges. Machine Learning, 86(1):3-23, 2011.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] J.R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239-266, 1990.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[14] J.R. Quinlan and R.M. Cameron-Jones. FOIL: a midterm report. In P. Brazdil, editor, Proceedings of the 6th European Conference on Machine Learning, volume 667 of Lecture Notes in Artificial Intelligence, pages 3-20. Springer-Verlag, 1993.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[15] A. Srinivasan, S.H. Muggleton, and R.D. King. Comparing the use of background knowledge by inductive logic programming systems. In L. De Raedt, editor, Proceedings of the Fifth International Inductive Logic Programming Workshop. Katholieke Universiteit Leuven, 1995.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>