=Paper=
{{Paper
|id=Vol-1195/long7
|storemode=property
|title=Toward an Improved Downward Refinement Operator for Inductive Logic Programming
|pdfUrl=https://ceur-ws.org/Vol-1195/long7.pdf
|volume=Vol-1195
|dblpUrl=https://dblp.org/rec/conf/cilc/Ferilli14
}}
==Toward an Improved Downward Refinement Operator for Inductive Logic Programming==
S. Ferilli (1,2)

1 Dipartimento di Informatica – Università di Bari, stefano.ferilli@uniba.it

2 Centro Interdipartimentale per la Logica e sue Applicazioni – Università di Bari

Abstract. In real-world supervised Machine Learning tasks, the learned theory can be deemed valid only until there is evidence to the contrary (i.e., new observations that are wrongly classified by the theory). In such a case, incremental approaches allow the existing theory to be revised to account for the new evidence, instead of learning a new theory from scratch. In many cases, positive and negative examples are provided in a mixed and unpredictable order, which requires both generalization and specialization refinement operators to be available for revising the hypotheses in the existing theory when it is inconsistent with the new examples. The space of Datalog Horn clauses under the OI assumption allows the existence of refinement operators that fulfill desirable properties. However, the versions of these operators currently available in the literature are not able to handle some refinement tasks. The objective of this work is to pave the way for an improved version of the specialization operator, aimed at extending its applicability.

1 Introduction

Supervised Machine Learning approaches based on First-Order Logic representations are particularly indicated in real-world tasks in which the relationships among objects play a relevant role in the definition of the concepts of interest. Given an initial set of examples, a theory can be learned from them by providing a learning system with the whole 'batch' of examples. However, since inductive inference is only falsity-preserving, the learned theory can be deemed valid only until there is evidence to the contrary (i.e., new observations that are wrongly classified by the theory).
In such a case, either a new theory must be learned from scratch using the new batch made up of both the old and the new examples, or the existing theory must be incrementally revised to account for the new evidence as well. To distinguish these two stages, we may call them 'training' and 'tuning', respectively. In extreme cases, the initial batch is not available at all, and learning must start and proceed incrementally from scratch. In many real cases, positive and negative examples are provided in a mixed and unpredictable order to the tuning phase, which requires two different refinement operators to be available for revising the hypotheses in the existing theory when it is inconsistent with the new examples. A generalization operator is needed to refine a hypothesis that does not account for a positive example, while a specialization operator must be applied to refine a hypothesis that erroneously accounts for a negative example. So, the kind of modifications that are applied to the theory change its behavior non-monotonically.

The research on incremental approaches is not very wide, due to the intrinsic complexity of learning in environments where the available information about the concepts to be learned is not completely known in advance, especially in a First-Order Logic (FOL) setting. Thus, literature published some years ago still represents the state of the art for several aspects of interest.

The focus of this paper is on supervised incremental inductive learning of logic theories from examples, and specifically on the extension of existing specialization operators. Indeed, while these operators have a satisfactory behavior when trying to add positive literals to a concept definition, the way they handle the addition of negative information has some shortcomings that, if solved, would allow a broader range of concepts to be learned.
Here we point out these shortcomings, and propose both improvements of the existing operator definitions and extensions to them. Theoretical results on the new version of the operator are sketched, and an algorithm for it is provided and commented on. The solution has been implemented and embedded in the multistrategy incremental learning system InTheLEx [3]. The next section lays out the logic framework in which we cast our proposal; then, Sections 3 and 4 introduce the learning framework in general and the state-of-the-art specialization operator for it in particular. Section 5 describes our new proposal, and finally Section 6 concludes the paper. Due to lack of space, proofs of theoretical results will not be provided.

2 Preliminaries

The logic framework in which we build our solution exploits Datalog [1, 4] as a representation language. Syntactically, it can be considered a sublanguage of Prolog in which no function symbols are allowed, i.e., a Datalog term can only be a variable or a constant, which avoids potentially infinite nesting in terms and hence simplifies clause handling by the operators we will define in the following. The missing expressiveness of function symbols can be recovered by Flattening [9], a representational change that transforms a set of clauses containing function symbols into another, semantically equivalent one, made up of function-free clauses (1). In a nutshell, each n-ary function symbol is associated to a new (n+1)-ary predicate, where the added argument represents the function result. Functions are replaced, in the literals in which they appear, by variables or constants representing their result. In the following, we will denote by body(C) and head(C) the set of literals in the body and the atom in the head of a Horn clause C, respectively.

(1) Flattening potentially generates an infinite function-free program.
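As an illustration of the flattening step just described, here is a minimal Python sketch. The tuple-based term encoding, the `p_`-prefixed predicate names, and the fresh-variable scheme (`V1`, `V2`, ...) are assumptions made for this sketch only, not the paper's actual encoding.

```python
# Minimal sketch of flattening (assumed representation):
# a term is a variable/constant string, or a tuple (functor, arg1, ..., argn).
# Each n-ary functor f becomes an (n+1)-ary predicate p_f whose last
# argument stands for the function result.

def flatten_term(term, extra_literals, counter):
    """Return a function-free term; nested functions become new body literals."""
    if not isinstance(term, tuple):          # variable or constant: unchanged
        return term
    functor, *args = term
    flat_args = [flatten_term(a, extra_literals, counter) for a in args]
    counter[0] += 1
    result_var = f"V{counter[0]}"            # fresh variable for the result
    extra_literals.append((f"p_{functor}", *flat_args, result_var))
    return result_var

def flatten_literal(literal):
    """Flatten one literal; return (flat_literal, extra_body_literals)."""
    pred, *args = literal
    extras, counter = [], [0]
    flat = (pred, *[flatten_term(a, extras, counter) for a in args])
    return flat, extras
```

For example, `flatten_literal(('father', ('parent_of', 'john'), 'john'))` replaces the nested function by a fresh variable and adds the literal `('p_parent_of', 'john', 'V1')` defining its result.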
This is not our case, where we aim at learning from examples, and thus our universe is limited by what we see in the examples.

Pure Datalog does not allow the use of negation in the body of clauses. A first way to overcome this limitation is the Closed World Assumption (CWA): if a fact does not logically follow from a set of Datalog clauses, then we assume its negation to be true. This allows the deduction of negative facts, but not their use to infer other information. Conversely, real-world descriptions often need rules containing negative information. Datalog¬ allows the use of negated literals in clause bodies, at the cost of a further safety condition: each variable occurring in a negated literal must also occur in a positive literal of the body. Since there can be several minimal Herbrand models instead of a single least one, CWA cannot be used. In Stratified Datalog¬ this problem is solved by partitioning a program P into n sets P_i (called layers) such that:

1. all rules that define the same predicate in P are in the same layer;
2. P_1 contains only clauses without negated literals, or whose negated literals correspond to predicates defined by facts in the knowledge base;
3. each layer P_i, i > 1, contains only clauses whose negated literals are completely defined in lower layers (i.e., layers P_j with j < i).

Such a partition is called a stratification, and P is called stratified. A stratified program P with stratification P_1, ..., P_n is evaluated by growing layers, applying to each one CWA locally to the knowledge base made up of the original knowledge base and all literals obtained by the evaluation of the previous layers. There may be different stratifications for a given program, but all are equivalent as regards the evaluation result. Moreover, not all programs are stratified. The following notion allows one to determine whether they are.
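The layer-by-layer evaluation with local CWA can be sketched for the propositional (ground) case. The encoding of a rule as a `(head, positive_body, negated_body)` triple is a simplification assumed here; a full evaluator would also handle variables.

```python
# Propositional sketch of stratified evaluation: each layer is a list of
# rules (head, positive_body, negated_body). Negated atoms are checked
# against the model built so far, i.e. CWA is applied locally, layer by
# layer, exactly in the bottom-up order described above.

def evaluate_stratified(layers, facts):
    model = set(facts)
    for layer in layers:
        changed = True
        while changed:                     # iterate this layer to fixpoint
            changed = False
            for head, pos, neg in layer:
                if (head not in model
                        and all(p in model for p in pos)
                        and all(n not in model for n in neg)):
                    model.add(head)
                    changed = True
    return model
```

Because a negated atom is only consulted after its defining layer has reached its fixpoint, the result does not depend on rule order within a layer, matching the claim that all stratifications yield the same evaluation result.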
Definition 1 (Extended dependence graph). Let P be a Datalog¬ program. The extended dependence graph of P, EDG(P), is a directed graph whose nodes represent predicates defined by rules in P, and there is an edge ⟨p, q⟩ if q occurs in the body of a rule defining p. An edge ⟨p, q⟩ is labeled with ¬ if there exists at least one rule having p as its head and ¬q in its body.

A program P is stratified if EDG(P) contains no cycles containing edges marked with ¬. The evaluation of a stratified program produces a minimal Herbrand model, called the perfect model.

A specific kind of negation is expressed by the inequality built-in predicate ≠. Using only this negation in Datalog yields an extension denoted by Datalog^≠.

2.1 Object Identity

We deal with Datalog under the Object Identity (OI) assumption, defined as follows:

Definition 2 (Object Identity). Within a clause, terms denoted with different symbols must be distinct.

This notion is the basis for the definition of an equational theory for Datalog clauses that adds one rewrite rule to the set of axioms of Clark's Equality Theory (CET) [6]:

t ≠ s ∈ body(C) for each clause C in L and for all pairs t, s of distinct terms that occur in C   (OI)

where L denotes the language that consists of all the possible Datalog clauses built from a finite number of predicates. The (OI) rewrite rule can be viewed as an extension of both Reiter's unique-names assumption [8] and axioms (7), (8) and (9) of CET to the variables of the language. Datalog_OI is a sublanguage of Datalog^≠ resulting from the application of OI to Datalog. Under OI, any Datalog clause C generates a new Datalog^≠ clause C_OI consisting of two components, called core and constraints:

- core(C_OI) = C, and
- constraints(C_OI) = {t ≠ s | t, s ∈ terms(C), t and s distinct} are the inequalities generated by the (OI) rewrite rule.
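The constraints component of C_OI can be generated mechanically from the distinct terms of a clause. The following sketch assumes a clause is encoded as a `(head, body)` pair of predicate/argument tuples; this encoding is an illustration choice, not the paper's.

```python
from itertools import combinations

# Sketch: compute constraints(C_OI) for a clause, i.e. one inequality
# per unordered pair of distinct terms, as the (OI) rewrite rule dictates.
# A clause is (head, body); a literal is a tuple (pred, term1, ..., termn).

def terms_of(clause):
    """Distinct terms of the clause, in order of first occurrence."""
    head, body = clause
    seen = []
    for literal in (head, *body):
        for t in literal[1:]:
            if t not in seen:
                seen.append(t)
    return seen

def oi_constraints(clause):
    """All inequalities t != s for pairs of distinct terms of the clause."""
    return [(t, '!=', s) for t, s in combinations(terms_of(clause), 2)]
```

For the clause p(X, Y) :- q(X, a), the terms are X, Y, a, so the constraints are X ≠ Y, X ≠ a and Y ≠ a.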
Formally, a Datalog_OI program is made up of a set of Datalog^≠ clauses of the form

l_0 :- l_1, ..., l_n, c_1, ..., c_m

where the l_i's are as in Datalog, the c_j's are the inequalities generated by the (OI) rule, and n ≥ 0. Nevertheless, Datalog_OI has the same expressive power as Datalog, that is, for any Datalog program we can find a Datalog_OI program equivalent to it [11].

2.2 OI-subsumption

Applying the OI assumption to the representation language causes the classical ordering relations among clauses to be modified, thus yielding a new structure of the corresponding search spaces for the refinement operators. The ordering relation defined by the notion of θ-subsumption under OI upon Datalog clauses [2, 10] is θ_OI-subsumption.

Definition 3 (θ_OI-subsumption ordering). Let C, D be Datalog clauses. D θ-subsumes C under OI (D θ_OI-subsumes C), written C ≤_OI D, iff there exists a substitution σ s.t. D_OI.σ ⊆ C_OI. This means that D is more general than or equivalent to C (in a theory revision setting, D is an upward refinement of C and C is a downward refinement of D) under OI.
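Because the substitution in Definition 3 must preserve the OI constraints of D, it maps distinct terms of D to distinct terms, so for small function-free clauses a brute-force test over injective variable assignments suffices. The literal encoding and the Prolog-style uppercase-initial convention for variables are assumptions of this sketch.

```python
from itertools import permutations

# Brute-force sketch of theta_OI-subsumption between function-free clauses:
# D OI-subsumes C iff some injective substitution of D's variables maps
# every literal of D onto a literal of C. A clause here is just a list of
# literals (pred, term1, ..., termn); variables start with an uppercase
# letter (assumed convention).

def variables(clause):
    return sorted({t for lit in clause for t in lit[1:] if t[0].isupper()})

def apply(clause, sigma):
    return {(lit[0], *[sigma.get(t, t) for t in lit[1:]]) for lit in clause}

def oi_subsumes(d, c):
    """True iff D theta_OI-subsumes C (i.e. C <=_OI D)."""
    dvars = variables(d)
    dconsts = {t for lit in d for t in lit[1:] if not t[0].isupper()}
    # OI: variables of D may not be mapped onto constants already in D,
    # and distinct variables must get distinct images (injectivity).
    cterms = sorted({t for lit in c for t in lit[1:] if t not in dconsts})
    for image in permutations(cterms, len(dvars)):
        sigma = dict(zip(dvars, image))    # injective by construction
        if apply(d, sigma) <= set(c):
            return True
    return False
```

For instance, p(X) OI-subsumes the clause set {p(a), q(a, b)}, whereas {p(X), p(Y)} does not OI-subsume {p(a)}: under OI, X and Y must take distinct values, unlike in classical θ-subsumption.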