=Paper=
{{Paper
|id=Vol-1195/long7
|storemode=property
|title=Toward an Improved Downward Refinement Operator for Inductive Logic Programming
|pdfUrl=https://ceur-ws.org/Vol-1195/long7.pdf
|volume=Vol-1195
|dblpUrl=https://dblp.org/rec/conf/cilc/Ferilli14
}}
==Toward an Improved Downward Refinement Operator for Inductive Logic Programming==
S. Ferilli (1,2)

1 Dipartimento di Informatica – Università di Bari, stefano.ferilli@uniba.it

2 Centro Interdipartimentale per la Logica e sue Applicazioni – Università di Bari

Abstract. In real-world supervised Machine Learning tasks, the learned theory can be deemed valid only until there is evidence to the contrary (i.e., new observations that are wrongly classified by the theory). In such a case, incremental approaches allow the existing theory to be revised to account for the new evidence, instead of learning a new theory from scratch. In many cases, positive and negative examples are provided in a mixed and unpredictable order, which requires both generalization and specialization refinement operators to be available for revising the hypotheses in the existing theory when it is inconsistent with the new examples. The space of Datalog Horn clauses under the OI assumption allows the existence of refinement operators that fulfill desirable properties. However, the versions of these operators currently available in the literature are not able to handle some refinement tasks. The objective of this work is to pave the way for an improved version of the specialization operator, aimed at extending its applicability.

1 Introduction

Supervised Machine Learning approaches based on First-Order Logic representations are particularly indicated in real-world tasks in which the relationships among objects play a relevant role in the definition of the concepts of interest. Given an initial set of examples, a theory can be learned from them by providing a learning system with the whole 'batch' of examples. However, since inductive inference is only falsity-preserving, the learned theory can be deemed valid only until there is evidence to the contrary (i.e., new observations that are wrongly classified by the theory).
In such a case, either a new theory must be learned from scratch using the new batch made up of both the old and the new examples, or the existing theory must be incrementally revised to account for the new evidence as well. To distinguish these two stages, we may call them 'training' and 'tuning', respectively. In extreme cases, the initial batch is not available at all, and learning must start and proceed incrementally from scratch. In many real cases, positive and negative examples are provided in a mixed and unpredictable order to the tuning phase, which requires two different refinement operators to be available for revising the hypotheses in the existing theory when it is inconsistent with the new examples. A generalization operator is needed to refine a hypothesis that does not account for a positive example, while a specialization operator must be applied to refine a hypothesis that erroneously accounts for a negative example. So, the kind of modifications that are applied to the theory change its behavior non-monotonically.

The research on incremental approaches is not very wide, due to the intrinsic complexity of learning in environments where the available information about the concepts to be learned is not completely known in advance, especially in a First-Order Logic (FOL) setting. Thus, literature published some years ago still represents the state of the art for several aspects of interest.

The focus of this paper is on supervised incremental inductive learning of logic theories from examples, and specifically on the extension of existing specialization operators. Indeed, while these operators have a satisfactory behavior when trying to add positive literals to a concept definition, the way they handle the addition of negative information has some shortcomings that, if solved, would allow a broader range of concepts to be learned.
Here we point out these shortcomings, and propose both improvements of the existing operator definitions and extensions to them. Theoretical results on the new version of the operator are sketched, and an algorithm for it is provided and commented on. The solution has been implemented and embedded in the multistrategy incremental learning system InTheLEx [3]. The next section lays out the logic framework in which we cast our proposal; then, Sections 3 and 4 introduce the learning framework in general and the state-of-the-art specialization operator for it in particular. Section 5 describes our new proposal, and finally Section 6 concludes the paper. Due to lack of space, proofs of theoretical results will not be provided.

2 Preliminaries

The logic framework in which we build our solution exploits Datalog [1, 4] as a representation language. Syntactically, it can be considered a sublanguage of Prolog in which no function symbols are allowed, i.e., a Datalog term can only be a variable or a constant, which avoids potentially infinite nesting in terms and hence simplifies clause handling by the operators we will define in the following. The missing expressiveness of function symbols can be recovered by Flattening [9], a representational change that transforms a set of clauses containing function symbols into another, semantically equivalent one, made up of function-free clauses (1). In a nutshell, each n-ary function symbol is associated to a new (n+1)-ary predicate, where the added argument represents the function result. Functions are replaced, in the literals in which they appear, by variables or constants representing their result. In the following, we will denote by body(C) and head(C) the set of literals in the body and the atom in the head of a Horn clause C, respectively.

(1) Flattening potentially generates an infinite function-free program.
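As an illustration of the flattening step just described, here is a minimal Python sketch. The tuple-based term encoding, the `p_`-prefixed predicate names, and the fresh-variable scheme (`V1`, `V2`, ...) are assumptions made for this sketch only, not the paper's actual encoding.

```python
# Minimal sketch of flattening (assumed representation):
# a term is a variable/constant string, or a tuple (functor, arg1, ..., argn).
# Each n-ary functor f becomes an (n+1)-ary predicate p_f whose last
# argument stands for the function result.

def flatten_term(term, extra_literals, counter):
    """Return a function-free term; nested functions become new body literals."""
    if not isinstance(term, tuple):          # variable or constant: unchanged
        return term
    functor, *args = term
    flat_args = [flatten_term(a, extra_literals, counter) for a in args]
    counter[0] += 1
    result_var = f"V{counter[0]}"            # fresh variable for the result
    extra_literals.append((f"p_{functor}", *flat_args, result_var))
    return result_var

def flatten_literal(literal):
    """Flatten one literal; return (flat_literal, extra_body_literals)."""
    pred, *args = literal
    extras, counter = [], [0]
    flat = (pred, *[flatten_term(a, extras, counter) for a in args])
    return flat, extras
```

For example, `flatten_literal(('father', ('parent_of', 'john'), 'john'))` replaces the nested function by a fresh variable and adds the literal `('p_parent_of', 'john', 'V1')` defining its result.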
This is not our case, where we aim at learning from examples, and thus our universe is limited by what we see in the examples.

Pure Datalog does not allow the use of negation in the body of clauses. A first way to overcome this limitation is the Closed World Assumption (CWA): if a fact does not logically follow from a set of Datalog clauses, then we assume its negation to be true. This allows the deduction of negative facts, but not their use to infer other information. Conversely, real-world descriptions often need rules containing negative information. Datalog¬ allows the use of negated literals in clause bodies, at the cost of a further safety condition: each variable occurring in a negated literal must also occur in a positive literal of the body. Since there can be several minimal Herbrand models instead of a single least one, CWA cannot be used. In Stratified Datalog¬ this problem is solved by partitioning a program P into n sets P_i (called layers) such that:

1. all rules that define the same predicate in P are in the same layer;
2. P_1 contains only clauses without negated literals, or whose negated literals correspond to predicates defined by facts in the knowledge base;
3. each layer P_i, i > 1, contains only clauses whose negated literals are completely defined in lower layers (i.e., layers P_j with j < i).

Such a partition is called a stratification, and P is called stratified. A stratified program P with stratification P_1, ..., P_n is evaluated by growing layers, applying to each one CWA locally to the knowledge base made up of the original knowledge base and all literals obtained by the evaluation of the previous layers. There may be different stratifications for a given program, but all are equivalent as regards the evaluation result. Moreover, not all programs are stratified. The following notion allows one to determine whether they are.
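The layer-by-layer evaluation with local CWA can be sketched for the propositional (ground) case. The encoding of a rule as a `(head, positive_body, negated_body)` triple is a simplification assumed here; a full evaluator would also handle variables.

```python
# Propositional sketch of stratified evaluation: each layer is a list of
# rules (head, positive_body, negated_body). Negated atoms are checked
# against the model built so far, i.e. CWA is applied locally, layer by
# layer, exactly in the bottom-up order described above.

def evaluate_stratified(layers, facts):
    model = set(facts)
    for layer in layers:
        changed = True
        while changed:                     # iterate this layer to fixpoint
            changed = False
            for head, pos, neg in layer:
                if (head not in model
                        and all(p in model for p in pos)
                        and all(n not in model for n in neg)):
                    model.add(head)
                    changed = True
    return model
```

Because a negated atom is only consulted after its defining layer has reached its fixpoint, the result does not depend on rule order within a layer, matching the claim that all stratifications yield the same evaluation result.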
Definition 1 (Extended dependence graph). Let P be a Datalog¬ program. The extended dependence graph of P, EDG(P), is a directed graph whose nodes represent predicates defined by rules in P, and there is an edge ⟨p, q⟩ if q occurs in the body of a rule defining p. An edge ⟨p, q⟩ is labeled with ¬ if there exists at least one rule having p as its head and ¬q in its body.

A program P is stratified if EDG(P) contains no cycles containing edges marked with ¬. The evaluation of a stratified program produces a minimal Herbrand model, called the perfect model.

A specific kind of negation is expressed by the inequality built-in predicate ≠. Using only this negation in Datalog yields an extension denoted by Datalog^≠.

2.1 Object Identity

We deal with Datalog under the Object Identity (OI) assumption, defined as follows:

Definition 2 (Object Identity). Within a clause, terms denoted with different symbols must be distinct.

This notion is the basis for the definition of an equational theory for Datalog clauses that adds one rewrite rule to the set of axioms of Clark's Equality Theory (CET) [6]:

t ≠ s ∈ body(C) for each clause C in L and for all pairs t, s of distinct terms that occur in C   (OI)

where L denotes the language that consists of all the possible Datalog clauses built from a finite number of predicates. The (OI) rewrite rule can be viewed as an extension of both Reiter's unique-names assumption [8] and axioms (7), (8) and (9) of CET to the variables of the language. Datalog_OI is a sublanguage of Datalog^≠ resulting from the application of OI to Datalog. Under OI, any Datalog clause C generates a new Datalog^≠ clause C_OI consisting of two components, called core and constraints:

- core(C_OI) = C, and
- constraints(C_OI) = {t ≠ s | t, s ∈ terms(C), t and s distinct} are the inequalities generated by the (OI) rewrite rule.
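The constraints component of C_OI can be generated mechanically from the distinct terms of a clause. The following sketch assumes a clause is encoded as a `(head, body)` pair of predicate/argument tuples; this encoding is an illustration choice, not the paper's.

```python
from itertools import combinations

# Sketch: compute constraints(C_OI) for a clause, i.e. one inequality
# per unordered pair of distinct terms, as the (OI) rewrite rule dictates.
# A clause is (head, body); a literal is a tuple (pred, term1, ..., termn).

def terms_of(clause):
    """Distinct terms of the clause, in order of first occurrence."""
    head, body = clause
    seen = []
    for literal in (head, *body):
        for t in literal[1:]:
            if t not in seen:
                seen.append(t)
    return seen

def oi_constraints(clause):
    """All inequalities t != s for pairs of distinct terms of the clause."""
    return [(t, '!=', s) for t, s in combinations(terms_of(clause), 2)]
```

For the clause p(X, Y) :- q(X, a), the terms are X, Y, a, so the constraints are X ≠ Y, X ≠ a and Y ≠ a.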
Formally, a Datalog_OI program is made up of a set of Datalog^≠ clauses of the form

l_0 :- l_1, ..., l_n, c_1, ..., c_m

where the l_i's are as in Datalog, the c_j's are the inequalities generated by the (OI) rule, and n ≥ 0. Nevertheless, Datalog_OI has the same expressive power as Datalog, that is, for any Datalog program we can find a Datalog_OI program equivalent to it [11].

2.2 OI-subsumption

Applying the OI assumption to the representation language causes the classical ordering relations among clauses to be modified, thus yielding a new structure of the corresponding search spaces for the refinement operators. The ordering relation defined by the notion of θ-subsumption under OI upon Datalog clauses [2, 10] is θ_OI-subsumption.

Definition 3 (θ_OI-subsumption ordering). Let C, D be Datalog clauses. D θ-subsumes C under OI (D θ_OI-subsumes C), written C ≤_OI D, iff there exists a substitution σ s.t. D_OI.σ ⊆ C_OI. This means that D is more general than or equivalent to C (in a theory revision setting, D is an upward refinement of C and C is a downward refinement of D) under OI.
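Because the substitution in Definition 3 must preserve the OI constraints of D, it maps distinct terms of D to distinct terms, so for small function-free clauses a brute-force test over injective variable assignments suffices. The literal encoding and the Prolog-style uppercase-initial convention for variables are assumptions of this sketch.

```python
from itertools import permutations

# Brute-force sketch of theta_OI-subsumption between function-free clauses:
# D OI-subsumes C iff some injective substitution of D's variables maps
# every literal of D onto a literal of C. A clause here is just a list of
# literals (pred, term1, ..., termn); variables start with an uppercase
# letter (assumed convention).

def variables(clause):
    return sorted({t for lit in clause for t in lit[1:] if t[0].isupper()})

def apply(clause, sigma):
    return {(lit[0], *[sigma.get(t, t) for t in lit[1:]]) for lit in clause}

def oi_subsumes(d, c):
    """True iff D theta_OI-subsumes C (i.e. C <=_OI D)."""
    dvars = variables(d)
    dconsts = {t for lit in d for t in lit[1:] if not t[0].isupper()}
    # OI: variables of D may not be mapped onto constants already in D,
    # and distinct variables must get distinct images (injectivity).
    cterms = sorted({t for lit in c for t in lit[1:] if t not in dconsts})
    for image in permutations(cterms, len(dvars)):
        sigma = dict(zip(dvars, image))    # injective by construction
        if apply(d, sigma) <= set(c):
            return True
    return False
```

For instance, p(X) OI-subsumes the clause set {p(a), q(a, b)}, whereas {p(X), p(Y)} does not OI-subsume {p(a)}: under OI, X and Y must take distinct values, unlike in classical θ-subsumption.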