<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Effect of Predicate Order on Curriculum Learning in ILP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hank Conn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen H. Muggleton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The second author acknowledges support from his Royal Academy of Engineering/Syngenta Research Chair at the Department of Computing at Imperial College London. Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: Nicolas Lachiche, Christel Vrain (eds.): Late Breaking Papers of ILP 2017</institution>
          ,
          <addr-line>Orleans</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>17</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Development of effective methods for learning large programs is arguably one of the hardest unsolved problems within ILP. The most obvious approach involves learning a sequence of predicate definitions incrementally. This approach is known as Curriculum Learning. However, Quinlan and Cameron-Jones' paper from 1993 indicates difficulties in this approach, since the predictive accuracy of ILP systems such as FOIL rapidly degrades given a growing set of learned background predicates, even when a reasonable ordering over the predicate sequence is chosen. Limited progress was made on this problem until the recent advent of bias-reformulation methods within Meta-Interpretive Learning. In this paper we show empirically that, given a well-ordered predicate sequence, relatively large sets of dyadic predicates can be learned incrementally using a state-of-the-art Meta-Interpretive Learning system which employs a universal set of metarules. However, further experiments show how progressive random permutations of the sequence rapidly degrade performance in a fashion comparable to Quinlan and Cameron-Jones' results. On the basis of these results we propose the need for methods for identifying well-ordered predicate sequences to address this issue.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>learning is its ability to automatically identify a good sequence ordering for the learning. A second aspect is that MIL, unlike FOIL, is guaranteed to find minimal predicate definitions, avoiding problems with overfitting in the presence of large amounts of background knowledge. Since the minimal representation of the target theory either stays the same or shrinks monotonically with expanding background predicates, this can lead to reductions in search in the case that hypotheses are considered in increasing order of their size.</p>
      <sec id="sec-1-1">
        <p>In this paper we explore the effect that predicate sequence choice has on learning performance. In particular, we show that a) with a set of inter-related family relations, MIL produces efficient and effective learning in the case of a well-chosen predicate sequence and b) performance degrades gradually with progressive random permutations of such an ordering. These results reinforce the need for techniques, such as dependent learning, for addressing the problems of learning large logic programs.</p>
      </sec>
      <sec id="sec-1-2">
        <p>This paper is organised as follows. Section 2 describes related work. The Meta-Interpretive Learning framework is described in Section 3. In Section 4 we describe the implementation of Metagol and the algorithm for running the experiments. The experiments are described in Section 5. We conclude and describe further work in Section 6.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <sec id="sec-2-1">
        <p>
          Induction of large programs from data is one of the long-term aims of ILP [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. However, even when learning individual predicate definitions the complexity of admissible search grows exponentially [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Although Quinlan's FOIL [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] provides efficient heuristic search, the lack of admissible search leads to problems with incompleteness associated with zero-gain literals [13] as well as hard issues relating to mutual recursion in multi-predicate learning [
          <xref ref-type="bibr" rid="ref13 ref5">14, 5</xref>
          ]. One initially promising avenue to avoid Quinlan and Cameron-Jones' problem with increasing background knowledge was presented by Srinivasan et al. [
          <xref ref-type="bibr" rid="ref14">15</xref>
          ], who showed that using admissible search Progol [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] can simultaneously increase accuracy while decreasing search time. The reason is that increasing background knowledge reduces the minimal size of a consistent solution, allowing Progol to reduce both the search size and the degree of overfitting for single-clause solutions as relevant background knowledge is provided incrementally.
        </p>
      </sec>
      <sec id="sec-2-2">
        <p>
          Recent advances in the area of Meta-Interpretive Learning (MIL) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] have demonstrated a way in which higher-order background knowledge in the form of metarules and abstractions [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] can further constrain admissible hypothesis space search, leading to decreases in search time and increases in predictive accuracy. While Progol guarantees minimal solutions for single-clause searches, Metagol [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] achieves minimal and admissible multi-clause predicate definition searches by iterative deepening. However, MIL learns definitions incrementally, which opens the question of effects related to the order in which predicates are learned. Results in our experiments indicate that if the idealised ordering used in experiments is randomly permuted, predictive accuracy degrades rapidly. It is therefore necessary to consider how the order of predicate definition learning should be selected automatically. Initial results in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] indicate that a technique referred to as dependent learning can be effective in selecting such an ordering, though it has still to be clarified what the properties of a target theory are for dependent learning to have guaranteed effectiveness.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Framework</title>
      <p>
        MIL [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] is a form of ILP based on an adapted Prolog meta-interpreter. Whereas a standard Prolog meta-interpreter attempts to prove a goal by repeatedly fetching first-order clauses whose heads unify with a given goal, a MIL learner attempts to prove a set of goals by repeatedly fetching higher-order metarules (Fig. 1b) whose heads unify with a given goal. The resulting meta-substitutions are saved in an abduction store, and can be re-used in later proofs. Following the proof of a set of goals, a hypothesis is formed by applying the meta-substitutions onto their corresponding metarules, allowing for a form of ILP which supports predicate invention and the learning of recursive theories.
      </p>
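      <p>The fetch-unify-abduce loop described above can be sketched concretely. The following is a hypothetical, heavily simplified Python rendering (not Metagol's actual code) using a single chain metarule, P(x,y) :- Q(x,z), R(z,y): a goal is first checked against the background facts; otherwise a meta-substitution binding P, Q, and R is abduced, saved in a store, and re-used in later proofs.</p>
      <preformat>
```python
# Hypothetical minimal sketch of MIL-style proving (not Metagol itself).
# Background: ground dyadic facts. One metarule, "chain":
#   P(x,y) :- Q(x,z), R(z,y)
facts = {
    "mother": {("ann", "amy"), ("amy", "eve")},
}

def prove(goal, store, depth=2):
    """Prove pred(x,y) from facts plus abduced chain clauses in store."""
    pred, x, y = goal
    if (x, y) in facts.get(pred, set()):
        return True
    if pred in store:                    # re-use a stored meta-substitution
        q, r = store[pred]
        return chain(q, r, x, y, store, depth)
    if depth == 0:
        return False
    known = list(facts) + list(store)
    for q in known:                      # abduce a new meta-substitution
        for r in known:
            store[pred] = (q, r)         # save it in the abduction store
            if chain(q, r, x, y, store, depth):
                return True
            del store[pred]              # retract on failure
    return False

def chain(q, r, x, y, store, depth):
    # body of the chain metarule: Q(x,z), R(z,y)
    for (a, z) in facts.get(q, set()):
        if a == x and prove((r, z, y), store, depth - 1):
            return True
    return False

store = {}
print(prove(("grandmother", "ann", "eve"), store))  # True
print(store)  # {'grandmother': ('mother', 'mother')}
```
      </preformat>
      <p>Applying the saved meta-substitution back onto the chain metarule yields the hypothesised clause grandmother(x,y) :- mother(x,z), mother(z,y), illustrating how a hypothesis is formed from the abduction store.</p>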
    </sec>
    <sec id="sec-4">
      <title>Implementation</title>
      <p>
        <preformat>prove([], Prog, Prog).
prove([Atom|As], Prog1, Prog2) :-
    metarule(Name, MetaSub, (Atom :- Body), Order),
    Order,
    abduce(metasub(Name, MetaSub), Prog1, Prog3),
    prove(Body, Prog3, Prog4),
    prove(As, Prog4, Prog2).</preformat>
        (a) Prolog code for generalised meta-interpreter
      </p>
      <p>
        (b) Metarules with associated ordering constraints, where ≺ is a pre-defined ordering over symbols in the signature. The letters P, Q, and R denote existentially quantified higher-order variables; x, y, and z denote universally quantified first-order variables.
      </p>
      <p>
        <preformat>predicate list = list of predicates in standard order
for each permutation:
    pick two predicates in predicate list and swap their positions
    for each predicate in predicate list:
        generate Prolog code
        execute Metagol learning task
        check learned definition for accuracy
        save learned definition as background knowledge</preformat>
      </p>
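      <p>The experiment loop can also be rendered as a small harness. The following hypothetical Python sketch mirrors the pseudocode; learn() and accuracy() are stand-ins for the real calls into Metagol and the accuracy check, and the cumulative-swap policy is an assumption based on the description of progressive permutations.</p>
      <preformat>
```python
import random

def run_experiment(predicates, n_permutations, learn, accuracy, seed=0):
    """Learn predicates incrementally, then repeat after each extra swap."""
    rng = random.Random(seed)
    order = list(predicates)                 # standard (well-chosen) order
    results = []
    for n_swaps in range(n_permutations + 1):
        background = []                      # learned definitions accumulate
        accs = []
        for pred in order:
            definition = learn(pred, background)   # stand-in: Metagol task
            accs.append(accuracy(pred, definition))
            background.append(definition)    # re-use as background knowledge
        results.append((n_swaps, sum(accs) / len(accs)))
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]    # one more random swap
    return results
```
      </preformat>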
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <sec id="sec-5-1">
        <title>Hypotheses</title>
        <p>Two hypotheses were tested: (1) learning a series of predicate definitions using the universal set of metarules does not decrease performance in Metagol, and (2) learning predicates in a randomized order leads to lower accuracy.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Materials</title>
        <p>
          Learning was based on family relationships in Hindi [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. In Hindi there are a number of terms without specific words in English. For example, in Hindi there are different terms for your father's brother (taaoo) and your mother's brother (maamaa). Hindi also has specific terms for complex concepts such as the daughter of your mother's brother (mameri). Family trees of 5000 individuals were randomly generated. The trees were written to a Prolog file containing all of the facts for each individual in terms of the background predicates male/1, female/1, father/2, and mother/2. This file could then be used as background knowledge for learning the definitions of family relationships. In total 43 family relationship concepts were assembled<sup>2</sup>, along with their definitions in Prolog to be learned progressively. The concepts were placed into a reasonable order for learning, where for example the simpler concepts brother and daughter are learned before the more complex concept mother's brother's daughter.
        </p>
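        <p>For illustration, a background-fact file of the kind described above could be produced along the following lines. This is a hypothetical sketch: the tree-growing policy, the naming scheme, and the founding couple are assumptions, not the paper's actual generator, and realism constraints (e.g. avoiding sibling marriages) are ignored.</p>
        <preformat>
```python
import random

def generate_family_facts(n_individuals, seed=0):
    """Emit Prolog facts over male/1, female/1, father/2, mother/2."""
    rng = random.Random(seed)
    facts = ["male(adam).", "female(eve)."]
    couples = [("adam", "eve")]            # hypothetical founding couple
    people = []
    for i in range(n_individuals):
        person = "p%d" % i
        sex = rng.choice(["male", "female"])
        facts.append("%s(%s)." % (sex, person))
        dad, mum = rng.choice(couples)     # attach child to a random couple
        facts.append("father(%s,%s)." % (dad, person))
        facts.append("mother(%s,%s)." % (mum, person))
        people.append((person, sex))
        # occasionally form a new couple from existing individuals
        males = [p for p, s in people if s == "male"]
        females = [p for p, s in people if s == "female"]
        if males and females and rng.random() > 0.5:
            couples.append((rng.choice(males), rng.choice(females)))
    return facts
```
        </preformat>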
        <sec id="sec-5-2-1">
          <p><sup>1</sup>Available at https://github.com/metagol/metagol web interface. <sup>2</sup>All experimental code and materials available at https://github.com/metagol/ILP2017.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>Results</title>
        <p>In the first experiment the training set was 1% and the test set 10% of the total number of positive and negative examples for each predicate, randomly selected with replacement. Predicate accuracies and running times were averaged over 50 trials. In the second experiment the predicates were learned in a randomized order. Predictive accuracies were averaged over 50 trials for each increment of the number of swaps. Positive and negative examples were randomly sampled in equal number for each learning task (averaging 56 training examples and 226 test examples) in order to give a default predicate accuracy of 50% for a majority-class predictor. The experiments were run on a Windows 7 operating system running YAP 6.2.2, the latest Metagol code from github<sup>3</sup> (as of 2017-02-12), and with a test harness running on Java 8 (jre1.8.0_121). The MySQL instance shared by the web interface and test harness was on version 10.1.19-MariaDB.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and further work</title>
      <p>This paper revisits issues related to Quinlan and Cameron-Jones' demonstration that the performance of systems such as FOIL progressively degrades with increasing numbers of predicates. We show that given a reasonable ordering, such as that provided to FOIL, Metagol's performance does not degrade while progressively learning 43 Hindi family relationships. However, when the reasonable ordering over the predicates learned is randomly perturbed, predictive accuracy also progressively degrades.</p>
      <p><sup>3</sup>Available at https://github.com/metagol/metagol</p>
      <p>In further work we hope to investigate the degree to which dependent learning guarantees the discovery of a "reasonable order" for learning a large set of predicate definitions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning, 2009.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] I. Bratko. Prolog Programming for Artificial Intelligence. Addison-Wesley, London, 1986.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Cropper and S.H. Muggleton. Logical minimisation of meta-rules within meta-interpretive learning. In Proceedings of the 24th International Conference on Inductive Logic Programming, pages 65-78. Springer-Verlag, 2015. LNAI 9046.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] A. Cropper and S.H. Muggleton. Learning higher-order logic programs through abstraction and invention. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pages 1418-1424. IJCAI, 2016.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] L. De Raedt and N. Lavrac. Multiple predicate learning in two inductive logic programming settings. Journal on Pure and Applied Logic, 4(2):227-254, 1996.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] D. Lin, E. Dechter, K. Ellis, J.B. Tenenbaum, and S.H. Muggleton. Bias reformulation for one-shot function induction. In Proceedings of the 23rd European Conference on Artificial Intelligence (ECAI 2014), pages 525-530, Amsterdam, 2014. IOS Press.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] R.S. McGregor. The Oxford Hindi-English Dictionary. Oxford University Press, 1993.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S.H. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245-286, 1995.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] S.H. Muggleton, D. Lin, N. Pahlavi, and A. Tamaddoni-Nezhad. Meta-interpretive learning: application to grammatical inference. Machine Learning, 94:25-49, 2014.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] S.H. Muggleton, D. Lin, and A. Tamaddoni-Nezhad. Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. Machine Learning, 100(1):49-73, 2015.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] S.H. Muggleton, L. De Raedt, D. Poole, I. Bratko, P. Flach, and K. Inoue. ILP turns 20: biography and future challenges. Machine Learning, 86(1):3-23, 2011.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] J.R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239-266, 1990.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[14] J.R. Quinlan and R.M. Cameron-Jones. FOIL: a midterm report. In P. Brazdil, editor, Proceedings of the 6th European Conference on Machine Learning, volume 667 of Lecture Notes in Artificial Intelligence, pages 3-20. Springer-Verlag, 1993.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[15] A. Srinivasan, S.H. Muggleton, and R.D. King. Comparing the use of background knowledge by inductive logic programming systems. In L. De Raedt, editor, Proceedings of the Fifth International Inductive Logic Programming Workshop. Katholieke Universiteit Leuven, 1995.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>