    The Effect of Predicate Order on Curriculum Learning
                            in ILP∗

                                     Hank Conn and Stephen H. Muggleton
                                                  Imperial College London




                                                          Abstract
                        Development of effective methods for learning large programs is ar-
                        guably one of the hardest unsolved problems within ILP. The most
                        obvious approach involves learning a sequence of predicate defini-
                        tions incrementally. This approach is known as Curriculum Learn-
                        ing. However, Quinlan and Cameron-Jones’ paper from 1993 indi-
                        cates difficulties in this approach since the predictive accuracy of
                        ILP systems, such as FOIL, rapidly degrades given a growing set
                        of learned background predicates, even when a reasonable ordering
                        over the predicate sequence is chosen. Limited progress was made on
                        this problem until the recent advent of bias-reformulation methods
                        within Meta-Interpretive Learning. In this paper we show empirically
                        that given a well-ordered predicate sequence, relatively large sets of
                        dyadic predicates can be learned incrementally using a state-of-the-art
                        Meta-Interpretive Learning system which employs a Universal Set of
                        metarules. However, further experiments show how progressive ran-
                        dom permutations of the sequence rapidly degrade performance in a
                        fashion comparable to Quinlan and Cameron-Jones' results. On the
                        basis of these results we argue the need for further methods for
                        identifying well-ordered predicate sequences to address this issue.

1    Introduction
In their seminal paper Quinlan and Cameron-Jones [14] demonstrated the difficulties in using FOIL [12] to learn
a sequence of inter-related predicate definitions. This approach is now referred to as Curriculum Learning [1]. The
experiment involved eighteen list-processing definitions learned incrementally in the sequence order presented
in Chapter 3 of Bratko’s textbook for learning Prolog [2]. Although predicates in the first half of the sequence
were learned accurately and efficiently from extensive tabulated sets of examples, predictive accuracy degraded
substantially in the second half of the sequence, leading to FOIL timing-out with error-prone definitions. While
some of this behaviour was due to difficulties in effective learning of mutually recursive definitions, the dominant
feature was that of progressive expansion of the search space due to the growing vocabulary of learned predicate
definitions being added to the background knowledge.
   By contrast, more recent work in Meta-Interpretive Learning (MIL) [10] has shown that the use of bias-
reformulation methods, such as dependent learning [6], allows accurate and efficient learning of extended se-
quences of string-transformation predicates from small numbers of examples. A key feature of dependent
    ∗
    The second author acknowledges support from his Royal Academy of Engineering/Syngenta Research Chair at the Department
of Computing at Imperial College London.
Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: Nicolas Lachiche, Christel Vrain (eds.): Late Breaking Papers of ILP 2017, Orléans, France, September 4-6, 2017, published at
http://ceur-ws.org




learning is its ability to automatically identify a good sequence ordering for the learning. A second aspect is
that MIL, unlike FOIL, is guaranteed to find minimal predicate definitions, avoiding problems with overfitting
in the presence of large amounts of background knowledge. Since the minimal representation of the target
theory either stays the same or shrinks monotonically as background predicates are added, this can lead to
reductions in search when hypotheses are considered in increasing order of size.
   In this paper we explore the effect that predicate sequence choice has on learning performance. In particular,
we show that a) with a set of inter-related family relations, MIL produces efficient and effective learning in
the case of a well chosen predicate sequence and b) performance degrades gradually with progressive random
permutations of such an ordering. These results reinforce the need for techniques, such as dependent learning,
for addressing the problems of learning large logic programs.
   This paper is organised as follows. Section 2 describes related work. The Meta-Interpretive Learning frame-
work is described in Section 3. In Section 4 we describe the implementation of Metagol and the algorithm for
running the experiments. The experiments are described in Section 5. We conclude and describe further work
in Section 6.


2   Related work
Induction of large programs from data is one of the long term aims of ILP [11]. However, even when learning
individual predicate definitions, the complexity of admissible search grows exponentially [8]. Although Quinlan's
FOIL [12] provides efficient heuristic search, the lack of admissible search leads to problems with incompleteness
associated with zero-gain literals [13] as well as hard issues relating to mutual recursion in multi-predicate
learning [14, 5]. One initially promising avenue to avoid Quinlan and Cameron-Jones’ problem with increasing
background knowledge was presented by Srinivasan et al. [15], who showed that, using admissible search, Progol
[8] can simultaneously increase accuracy while decreasing search time. The reason is that increasing background
knowledge reduces the minimal size of a consistent solution, allowing Progol to consequently reduce both the
search size and the degree of overfitting for single clause solutions as relevant background knowledge is provided
incrementally.
    Recent advances in the area of Meta-Interpretive Learning (MIL) [10] have demonstrated a way in which
higher-order background knowledge in the form of metarules and abstractions [4] can further constrain admissible
hypothesis space search, leading to decreases in search time and increases in predictive accuracy. While
Progol guarantees minimal solutions for single clause searches, Metagol [10] achieves minimal and admissible
multi-clause predicate definition searches by iterative deepening. However, MIL learns definitions incrementally,
which opens the question of effects related to the order in which predicates are learned. Results in our exper-
iments indicate that if the idealised ordering used in experiments is randomly permuted, predictive accuracy
degrades rapidly. It is therefore necessary to consider how the order of predicate definition learning should be
selected automatically. Initial results in [6] indicate that a technique referred to as dependent learning can be
effective in selecting such an ordering, though it remains to be clarified which properties of a target theory
guarantee that dependent learning is effective.


3   Framework
MIL [9, 10] is a form of ILP based on an adapted Prolog meta-interpreter. Whereas a standard Prolog meta-
interpreter attempts to prove a goal by repeatedly fetching first-order clauses whose heads unify with a given
goal, a MIL learner attempts to prove a set of goals by repeatedly fetching higher-order metarules (Fig. 1b)
whose heads unify with a given goal. The resulting meta-substitutions are saved in an abduction store, and
can be re-used in later proofs. Following the proof of a set of goals, a hypothesis is formed by applying the
meta-substitutions onto their corresponding metarules, allowing for a form of ILP which supports predicate
invention and the learning of recursive theories.
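   As a concrete illustration (ours, not drawn from [9, 10]), it is useful to contrast Fig. 1a with the standard
'vanilla' Prolog meta-interpreter, which proves a goal against a fixed first-order program fetched via clause/2:

       % Standard vanilla Prolog meta-interpreter: proves a goal against
       % fixed first-order clauses fetched with clause/2.
       solve(true).
       solve((A, B)) :- solve(A), solve(B).
       solve(A) :- clause(A, Body), solve(Body).

The meta-interpreter of Fig. 1a generalises this scheme: in place of clause/2 it fetches higher-order metarules,
and each successful fetch abduces a meta-substitution. For instance, proving the goal grandparent(ann, amy)
against the Chain metarule could abduce a meta-substitution binding P, Q, R to grandparent, parent, parent,
which, projected back onto Chain, yields the first-order clause grandparent(x, y) ← parent(x, z), parent(z, y).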


4   Implementation
Figure 1a shows MetagolDF [6], an implementation of the MIL framework, similar in form to a standard Prolog
meta-interpreter. A universal set of metarules (Fig. 1b; see [3]) is defined separately.




       prove([], Prog, Prog).
       prove([Atom|As], Prog1, Prog2) :-
        metarule(Name, MetaSub, (Atom :- Body), Order),
        Order,
        abduce(metasub(Name, MetaSub), Prog1, Prog3),
        prove(Body, Prog3, Prog4),
        prove(As, Prog4, Prog2).

       (a) Prolog code for generalised meta-interpreter

        Name      Metarule                         Order
        Inverse   P(x, y) ← Q(y, x)                P ≻ Q
        Chain     P(x, y) ← Q(x, z), R(z, y)       P ≻ Q, P ≻ R
        TailRec   P(x, y) ← Q(x, z), P(z, y)       P ≻ Q, x ≻ z ≻ y

       (b) Metarules with associated ordering constraints, where ≻ is a pre-defined ordering over symbols
       in the signature. The letters P, Q, and R denote existentially quantified higher-order variables;
       x, y, and z denote universally quantified first-order variables.

                        Figure 1: MetagolDF meta-interpreter (a) and universal metarules (b)

                                   predicate_list = list of predicates in standard order
                                   for each permutation:
                                      pick two predicates in predicate_list and swap their positions
                                   for each predicate in predicate_list:
                                      generate Prolog code
                                      execute Metagol learning task
                                      check learned definition for accuracy
                                      save learned definition as background knowledge


                                        Figure 2: Trial procedure for experiments

4.1    Metagol Interface
A user interface1 was created for manually entering data and configurations needed to run a learning task with
Metagol. The user can enter predicates and constants, fill in background knowledge, select metarules, enter a
predicate to learn, and enter the training set of positive and negative examples. Once the run is assembled, the
user can generate the Prolog code for running the task in Metagol, execute the code on the server, and view the
results. Finally, the definitions learned in the task can be saved in the database for the given user, and included
as background knowledge in subsequent learning tasks. This allows the user to progressively learn large logic
programs.

4.2     Algorithm of implementation used for experiments
Figure 2 shows the procedure used for each trial of the experiment. A Metagol learning task is executed for
each predicate, and the learned definition is used as additional background knowledge for each successive task.
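   The loop of Fig. 2 can be sketched in Prolog roughly as follows. This is a sketch rather than the actual
Java test harness: learn/3 is Metagol's usual entry point, while examples/3 (fetching the sampled positive and
negative examples for a predicate) and assert_program/1 are hypothetical helpers.

       % Sketch of the incremental learning loop of Fig. 2 (not the actual harness).
       learn_sequence([]).
       learn_sequence([Pred|Preds]) :-
           examples(Pred, Pos, Neg),   % hypothetical: sampled examples for Pred
           learn(Pos, Neg, Prog),      % run one Metagol learning task
           assert_program(Prog),       % save definition as background knowledge
           learn_sequence(Preds).

       % Hypothetical helper: assert each learned clause so that
       % subsequent tasks can call it as background knowledge.
       assert_program(Clauses) :- maplist(assertz, Clauses).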

5     Experiments
Hypotheses
Two hypotheses were tested: (1) learning a series of predicate definitions using the universal set of metarules
does not degrade Metagol's performance, and (2) learning the predicates in a randomized order leads to lower
predictive accuracy.

Materials
Learning was based on family relationships in Hindi [7]. Hindi has a number of kinship terms with no specific
equivalent in English. For example, Hindi uses a different term for your father's brother (taaoo) than for
your mother's brother (maamaa). Hindi also has specific terms for complex concepts such as the daughter of
your mother’s brother (mameri). Family trees of 5000 individuals were randomly generated. The trees were
written to a Prolog file containing all of the facts for each individual in terms of the background predicates
male/1, female/1, father/2, and mother/2. This file could then be used as background knowledge for learning
the definitions of family relationships. In total 43 family relationship concepts were assembled2, along with
their definitions in Prolog to be learned progressively. The concepts were placed into a reasonable order for
learning, where for example the simpler concepts brother and daughter are learned before the more complex
concept mother’s brother’s daughter.
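   For illustration, the background facts and a target definition might look as follows. The individuals, the
argument conventions, and the clause bodies here are our own assumptions for exposition, not the exact generated
data or learned definitions:

       % Illustrative background facts (hypothetical individuals),
       % reading father(X, Y) as 'the father of X is Y'.
       male(mohan).   male(raj).
       female(sita).  female(priya).
       father(raj, mohan).
       mother(raj, sita).

       % A plausible definition of maamaa (mother's brother), assuming a
       % brother/2 concept learned earlier in the sequence:
       maamaa(X, U) :- mother(X, M), brother(M, U).

Learned in this order, maamaa can in turn support the later, more complex concept mameri (the daughter of
one's mother's brother).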
    1 Available at https://github.com/metagol/metagol_web_interface
    2 All experimental code and materials available at https://github.com/metagol/ILP2017




Methods
In the first experiment the training set was 1% and test set 10% of the total number of positive and negative
examples for each predicate, randomly selected with replacement. Predicate accuracies and running times were
averaged over 50 trials. In the second experiment the predicates were learned in a randomized order. Predictive
accuracies were averaged over 50 trials for each increment in the number of swaps. Positive and negative examples
were randomly sampled in equal number for each learning task (averaging 56 training examples and 226 test
examples) in order to give a default predicate accuracy of 50% for a majority-class predictor. The experiments
were run on a Windows 7 operating system running YAP 6.2.2, the latest Metagol code from GitHub3 (as of
2017-02-12), and with a test harness running on Java 8 (jre1.8.0_121). The MySQL instance shared by the web
interface and test harness was on version 10.1.19-MariaDB.
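   The progressive perturbation of the ordering amounts to applying an increasing number of random transpo-
sitions to the predicate list. A minimal Prolog sketch is given below, assuming SWI-Prolog's library(random)
and library(lists); the actual harness performed this step in Java.

       :- use_module(library(random)).   % random_between/3
       :- use_module(library(lists)).    % nth1/3

       % random_swaps(+List, +N, -Perm): apply N random transpositions to List.
       random_swaps(List, 0, List) :- !.
       random_swaps(List, N, Perm) :-
           N > 0,
           length(List, Len),
           random_between(1, Len, I),
           random_between(1, Len, J),
           swap(List, I, J, List1),
           N1 is N - 1,
           random_swaps(List1, N1, Perm).

       % swap(+List, +I, +J, -Swapped): exchange the I-th and J-th elements.
       swap(List, I, J, Swapped) :-
           nth1(I, List, A),
           nth1(J, List, B),
           set_nth1(I, List, B, L1),
           set_nth1(J, L1, A, Swapped).

       % set_nth1(+N, +List, +X, -NewList): replace the N-th element with X.
       set_nth1(1, [_|T], X, [X|T]).
       set_nth1(N, [H|T], X, [H|T1]) :- N > 1, N0 is N - 1, set_nth1(N0, T, X, T1).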

Results
Figure 3a shows that in experiment 1 the mean accuracy of the learned definitions stayed near 100%, and Figure
3b shows learning times remained consistently below 1s (with some outliers) throughout the series, confirming
the first hypothesis.




                    Figure 3: Learning results for experiment 1: (a) predictive accuracy; (b) learning time
  Figure 4 shows the mean accuracy gradually decreased as increasing numbers of predicates were swapped
out of order, confirming the second hypothesis.




                                      Figure 4: Learning results for experiment 2


6     Conclusions and further work
This paper revisits issues related to Quinlan and Cameron-Jones' demonstration that the performance of systems
such as FOIL progressively degrades with increasing numbers of predicates. We show that given a reasonable
ordering, such as that provided to FOIL, Metagol’s performance does not degrade while progressively learning
    3 Available at https://github.com/metagol/metagol




43 Hindi family relationships. However, when this reasonable ordering over the predicates to be learned is
randomly perturbed, predictive accuracy progressively degrades.
   In further work we hope to investigate the degree to which dependent learning guarantees the discovery of a
“reasonable order” for learning a large set of predicate definitions.

References
 [1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In Proceedings of the 26th International
     Conference on Machine Learning (ICML 2009), pages 41–48. ACM, 2009.
 [2] I. Bratko. Prolog Programming for Artificial Intelligence. Addison-Wesley, London, 1986.
 [3] A. Cropper and S.H. Muggleton. Logical minimisation of meta-rules within meta-interpretive learning. In Proceedings
     of the 24th International Conference on Inductive Logic Programming, pages 65–78. Springer-Verlag, 2015. LNAI
     9046.
 [4] A. Cropper and S.H. Muggleton. Learning higher-order logic programs through abstraction and invention. In
     Proceedings of the 25th International Joint Conference Artificial Intelligence (IJCAI 2016), pages 1418–1424. IJCAI,
     2016.
 [5] L. De Raedt and N. Lavrac. Multiple predicate learning in two inductive logic programming settings. Journal on
     Pure and Applied Logic, 4(2):227–254, 1996.
 [6] D. Lin, E. Dechter, K. Ellis, J.B. Tenenbaum, and S.H. Muggleton. Bias reformulation for one-shot function
     induction. In Proceedings of the 23rd European Conference on Artificial Intelligence (ECAI 2014), pages 525–530,
     Amsterdam, 2014. IOS Press.
 [7] R.S. McGregor. The Oxford Hindi-English Dictionary. Oxford University Press, 1993.
 [8] S.H. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245–286, 1995.
 [9] S.H. Muggleton, D. Lin, N. Pahlavi, and A. Tamaddoni-Nezhad. Meta-interpretive learning: application to gram-
     matical inference. Machine Learning, 94:25–49, 2014.
[10] S.H. Muggleton, D. Lin, and A. Tamaddoni-Nezhad. Meta-interpretive learning of higher-order dyadic datalog:
     Predicate invention revisited. Machine Learning, 100(1):49–73, 2015.
[11] S.H. Muggleton, L. De Raedt, D. Poole, I. Bratko, P. Flach, and K. Inoue. ILP turns 20: biography and future
     challenges. Machine Learning, 86(1):3–23, 2011.
[12] J.R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239–266, 1990.
[13] J.R. Quinlan. Determinate literals in inductive logic programming. In IJCAI-91: Proceedings of the Twelfth
     International Joint Conference on Artificial Intelligence, pages 746–750, San Mateo, CA, 1991. Morgan Kaufmann.
[14] J.R. Quinlan and R.M. Cameron-Jones. FOIL: a midterm report. In P. Brazdil, editor, Proceedings of the 6th
     European Conference on Machine Learning, volume 667 of Lecture Notes in Artificial Intelligence, pages 3–20.
     Springer-Verlag, 1993.
[15] A. Srinivasan, S.H. Muggleton, and R.D. King. Comparing the use of background knowledge by inductive logic
     programming systems. In L. De Raedt, editor, Proceedings of the Fifth International Inductive Logic Programming
     Workshop. Katholieke Universiteit Leuven, 1995.



