      Toward an Improved Downward Refinement
      Operator for Inductive Logic Programming

                                        S. Ferilli1,2
                   1
                     Dipartimento di Informatica – Università di Bari
                              stefano.ferilli@uniba.it
    2
      Centro Interdipartimentale per la Logica e sue Applicazioni – Università di Bari




         Abstract. In real-world supervised Machine Learning tasks, the learned
         theory can be deemed as valid only until there is evidence to the con-
         trary (i.e., new observations that are wrongly classified by the theory).
         In such a case, incremental approaches allow revising the existing
         theory to account for the new evidence, instead of learning a new theory
         from scratch. In many cases, positive and negative examples are pro-
         vided in a mixed and unpredictable order, which requires generalization
         and specialization refinement operators to be available for revising the
         hypotheses in the existing theory when it is inconsistent with the new
         examples. The space of Datalog Horn clauses under the OI assumption
         allows the existence of refinement operators that fulfill desirable proper-
         ties. However, the versions of these operators currently available in the
         literature are not able to handle some refinement tasks. The objective of
         this work is to pave the way for an improved version of the specialization
         operator, aimed at extending its applicability.



1      Introduction

Supervised Machine Learning approaches based on First-Order Logic represen-
tations are particularly suitable for real-world tasks in which the relationships
among objects play a relevant role in the definition of the concepts of interest.
Given an initial set of examples, a theory can be learned from them by providing
a learning system with the whole ‘batch’ of examples. However, since inductive
inference is only falsity preserving, the learned theory can be deemed as valid only
until there is evidence to the contrary (i.e., new observations that are wrongly
classified by the theory). In such a case, either a new theory is to be learned from
scratch using the new batch made up of both the old and the new examples, or
the existing theory must be incrementally revised to account for the new evi-
dence as well. To distinguish these two stages, we may call them ‘training’ and
‘tuning’, respectively. In extreme cases, the initial batch is not available at all,
and learning must start and proceed incrementally from scratch. In many real
cases, positive and negative examples are provided in a mixed and unpredictable
order to the tuning phase, which requires two different refinement operators to be
available for revising the hypotheses in the existing theory when it is inconsistent




    with the new examples. A generalization operator is needed to refine a hypothe-
    sis that does not account for a positive example, while a specialization operator
    must be applied to refine a hypothesis that erroneously accounts for a negative
    example. So, the modifications that are applied to the theory change
    its behavior non-monotonically. Research on incremental approaches is not
    very extensive, due to the intrinsic complexity of learning in environments where the
    available information about the concepts to be learned is not completely known
    in advance, especially in a First-Order Logic (FOL) setting. Thus, the literature
    published some years ago still represents the state-of-the-art for several aspects
    of interest.
        The focus of this paper is on supervised incremental inductive learning of
    logic theories from examples, and specifically on the extension of existing spe-
    cialization operators. Indeed, while these operators have a satisfactory behavior
    when trying to add positive literals to a concept definition, the way they handle
    the addition of negative information has some shortcomings that, if solved, would
    allow a broader range of concepts to be learned. Here we point out these short-
    comings, and propose both improvements of the existing operator definitions,
    and extensions to them. Theoretical results on the new version of the operator
    are sketched, and an algorithm for it is provided and commented. The solution
    has been implemented and embedded in the multistrategy incremental learning
    system InTheLEx [3]. The next section lays the logic framework in which we
    cast our proposal; then, Sections 3 and 4 introduce the learning framework in
    general and the state-of-the-art specialization operator for it in particular. Sec-
    tion 5 describes our new proposal, and finally Section 6 concludes the paper.
    Due to lack of space, proofs of theoretical results will not be provided.


    2     Preliminaries
    The logic framework in which we build our solution exploits Datalog [1, 4] as
    a representation language. Syntactically, it can be considered as a sublanguage
    of Prolog in which no function symbols are allowed. That is, a Datalog term can
    only be a variable or a constant, which avoids potentially infinite nesting in
    terms and hence simplifies clause handling by the operators we will define in
    the following. The expressive power lost by disallowing function symbols can be recovered
    by Flattening [9], a representational change that transforms a set of clauses
    containing function symbols into another, semantically equivalent to it, made up
    of function-free clauses1 . In a nutshell, each n-ary function symbol is associated
    with a new (n+1)-ary predicate, where the added argument represents the function
    result. Functions are replaced, in the literals in which they appear, by variables
    or constants representing their result.
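        As an illustrative sketch (the clause and the auxiliary predicate name are our
    own, not taken from the paper), the clause

                              even(s(s(X))) :- even(X).

    which nests the unary function symbol s, can be flattened by introducing a new
    binary predicate s_p/2 whose second argument stands for the function result:

                              even(Y) :- s_p(X, Z), s_p(Z, Y), even(X).

    The flattened clause is function-free, and the two s_p literals encode the two
    applications of s.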
        In the following, we will denote by body(C) and head(C) the set of literals
    in the body and the atom in the head of a Horn clause C, respectively. Pure
    1
        Flattening potentially generates an infinite function-free program. This is not an
        issue in our case, since we aim at learning from examples, and thus our universe is
        limited by what appears in the examples.





    Datalog does not allow the use of negation in the body of clauses. A first way to
    overcome this limitation is the Closed World Assumption (CWA): if a fact does
    not logically follow from a set of Datalog clauses, then we assume its negation
    to be true. This allows the deduction of negative facts, but not their use to infer
    other information. Conversely, real-world descriptions often need rules containing
    negative information. Datalog¬ allows the use of negated literals in clause bodies, at
    the cost of a further safety condition: each variable occurring in a negated literal
    must occur in another positive literal of the body too.
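        For instance (a toy example of ours), the rule

                              bachelor(X) :- man(X), not married(X).

    is safe, since the variable X occurring in the negated literal also occurs in the
    positive literal man(X); conversely, a rule such as

                              unrelated(X, Y) :- person(X), not knows(X, Y).

    violates the safety condition, because Y occurs only in the negated literal.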
        Since, in the presence of negation, there can be several minimal Herbrand models
    instead of a single least one, CWA cannot be used. In Stratified Datalog¬ this
    problem is solved by partitioning a program P into n sets P^i (called layers) s.t.:

     1. all rules that define the same predicate in P are in the same layer;
     2. P^1 contains only clauses without negated literals, or whose negated literals
        correspond to predicates defined by facts in the knowledge base;
     3. each layer P^i , i > 1, contains only clauses whose negated literals are
        completely defined in lower level layers (i.e., layers P^j with j < i).

    Such a partition is called a stratification, and P is called stratified.
    A stratified program P with stratification P^1 , . . . , P^n is evaluated layer by
    layer, in increasing order, applying to each layer the CWA locally to the knowledge
    base made up of the original knowledge base and of all the literals obtained by the
    evaluation of the previous layers.
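        As an illustrative sketch (the program and its predicates are our own), the
    following Datalog¬ program is stratified in two layers:

         P^1 :   path(X, Y) :- edge(X, Y).
                 path(X, Y) :- edge(X, Z), path(Z, Y).
         P^2 :   unlinked(X, Y) :- node(X), node(Y), not path(X, Y).

    All the rules defining path are in the same layer P^1 , which contains no negated
    literals; the only negated literal in P^2 refers to path, which is completely
    defined in the lower layer. Evaluating P^1 first, and then applying CWA locally to
    decide not path(X, Y), yields the facts for unlinked.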
        There may be different stratifications for a given program, but all are
    equivalent as regards the evaluation result. Moreover, not all programs are
    stratified. The following notion allows one to determine whether a program is.

    Definition 1 (Extended dependence graph) Let P be a Datalog¬ program.
    The extended dependence graph of P , EDG(P ), is a directed graph whose nodes
    represent predicates defined by rules in P , and there is an edge ⟨p, q⟩ if q occurs
    in the body of a rule defining p.
    An edge ⟨p, q⟩ is labeled with ¬ if there exists at least one rule having p as its
    head and ¬q in its body.

    A program P is stratified if EDG(P ) contains no cycles containing edges marked
    with ¬. The evaluation of a stratified program produces a minimal Herbrand
    model, called the perfect model.
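        Referring to Definition 1, a sketch of a non-stratified program (the predicates
    are our own) is

                              p(X) :- r(X), not q(X).
                              q(X) :- r(X), not p(X).

    Its EDG contains the edges ⟨p, q⟩ and ⟨q, p⟩, both labeled with ¬, which form a
    cycle through ¬-labeled edges; hence no stratification exists for this program.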
        A specific kind of negation is expressed by the inequality built-in predicate
    ≠. Using only this negation in Datalog yields an extension denoted by Datalog≠ .


    2.1    Object Identity

    We deal with Datalog under the Object Identity (OI) assumption, defined as
    follows:

    Definition 2 (Object Identity)
    Within a clause, terms denoted with different symbols must be distinct.




    This notion is the basis for the definition of an equational theory for Datalog
    clauses that adds one rewrite rule to the set of the axioms of Clark’s Equality
    Theory (CET) [6]:
     t ≠ s ∈ body(C)    for each clause C in L and
                        for all pairs t, s of distinct terms that occur in C        (OI)
    where L denotes the language that consists of all the possible Datalog clauses
    built from a finite number of predicates. The (OI) rewrite rule can be viewed as
    an extension of both Reiter’s unique-names assumption [8] and axioms (7), (8)
    and (9) of CET to the variables of the language.
       DatalogOI is a sublanguage of Datalog≠ resulting from the application of OI
    to Datalog. Under OI, any Datalog clause C generates a new Datalog≠ clause
    COI consisting of two components, called core and constraints:
     – core(COI) = C and
     – constraints(COI) = {t ≠ s | t, s ∈ terms(C) ∧ t, s distinct}
       are the inequalities generated by the (OI) rewrite rule.
       Formally, a DatalogOI program is made up of a set of Datalog≠ clauses of
    the form
                              l0 :- l1 , . . . , ln , c1 , . . . , cm
    where the li ’s are as in Datalog, the cj ’s are the inequalities generated by
    the (OI) rule, and n ≥ 0. Nevertheless, DatalogOI has the same expressive power
    as Datalog, that is, for any Datalog program we can find a DatalogOI program
    equivalent to it [11].
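        For instance (the clause is our own), under OI the Datalog clause

                              p(X) :- q(X, Y), q(Y, a).

    generates the Datalog≠ clause

                              p(X) :- q(X, Y), q(Y, a), X ≠ Y, X ≠ a, Y ≠ a.

    whose core is the original clause and whose constraints are the three inequalities
    among the pairwise distinct terms X, Y and a.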

    2.2     θOI-subsumption

    Applying the OI assumption to the representation language causes the classical
    ordering relations among clauses to be modified, thus yielding a new structure
    of the corresponding search spaces for the refinement operators.
        The ordering relation defined by the notion of θ-subsumption under OI upon
    Datalog clauses [2, 10] is θOI-subsumption.
    Definition 3 (θOI-subsumption ordering) Let C, D be Datalog clauses. D
    θ-subsumes C under OI (D θOI-subsumes C), written C ≤OI D, iff there exists a
    substitution σ s.t. DOI · σ ⊆ COI. This means that D is more general than or
    equivalent to C (in a theory revision setting, D is an upward refinement of C
    and C is a downward refinement of D) under OI.
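        As a sketch with clauses of our own choosing, let D = p(X) :- q(X, Y) and
    C = p(X) :- q(X, Y), r(Y, Z). Then D θOI-subsumes C through the identity
    substitution, since every literal of DOI, as well as its only constraint X ≠ Y,
    also occurs in COI; hence C ≤OI D, and D is more general than C under OI. On the
    contrary, D does not θOI-subsume C' = p(X) :- q(X, X), although it θ-subsumes it
    in the classical sense: the only candidate substitution σ = {Y/X} turns the
    constraint X ≠ Y of DOI into X ≠ X, which cannot occur in C'OI.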