    Background knowledge enrichment for improving bottom clauses

            Orlando Muñoz Texzocotetla and René MacKinney-Romero

                         Departamento de Ingeniería Eléctrica
                         Universidad Autónoma Metropolitana
                              México D.F. 09340, México



        Abstract. In this paper we present a method to enrich the hypothesis
        language used to construct the bottom clause. This approach, which is
        embedded in the Aleph system, is based on fixed numerical subintervals
        and categorical subsets. For each predefined attribute, the corresponding
        subinterval/subset contains the values related to the example being
        saturated. The enriched language makes it possible to reduce the number
        of rules in the final theories and, in some cases, to improve the accuracy.

        Keywords: Artificial Intelligence, ILP, discretization, grouping, numerical
        attributes, categorical attributes


1     Introduction
Most ILP algorithms induce one clause at a time in order to find a theory
T. Each clause in T is usually constructed with a single numerical (or categorical)
value for each attribute¹. This may result in inaccurate theories with many
rules. To overcome these drawbacks, some ILP systems use approaches based on
the following strategies:
    Discretization. Some ILP systems discretize numerical attributes in a global
way, and the resulting intervals are used during the learning process. TILDE [3] and
ECL-GSD [5] follow this strategy. In [9], a global binary discretization method
is presented; in addition, each categorical attribute is grouped into two subsets.
    Propositionalization. These strategies transform a relational problem into a
propositional one, so that a problem that was originally relational can be solved
with propositional algorithms that are faster and more efficient than relational
systems. The LINUS and DINUS systems [7] follow this procedure.
    Genetic algorithms. To deal with numerical attributes, some systems implement
genetic refinement operators which search globally for the best intervals. In [5],
several refinement operators are presented, and a genetic algorithm is used to test
the intervals.
    Numerical reasoning. In [11], new relations (equalities, inequalities, regression
models, etc.) are added a priori to the background knowledge. These relations are
lazily evaluated during the bottom clause construction. Numerical reasoning is
improved in [1], which proposes better noise handling by means of statistics-based
techniques. Regarding categorical data, we think that using grouping algorithms
for categorical attributes can also improve the final theories.

¹ We refer to the arguments of predicates as attributes.
    In this paper we present a method to enrich the hypothesis language used to
construct the bottom clause. This approach deals with both categorical and
numerical attributes, in a local or a global way. The enriched language makes it
possible to reduce the number of rules in the final theories and, in some cases, to
improve the accuracy. This paper is organized as follows: section 2 describes the
proposed method; section 3 shows the experimental results on several datasets;
finally, section 4 presents our conclusions and future work.


2     The method
To enrich the hypothesis language, our method tests different numerical subintervals
(or categorical subsets) according to an evaluation function; the best qualified ones
are then added to the background knowledge. Owing to the great number of
subintervals/subsets that can be created, we decided to use an evolutionary
approach to generate and evaluate them. This method was embedded in the basic
algorithm of Aleph as follows:
    0. State parameters. In addition to the mode declarations, types and
determinations, users can also declare the attributes to be discretized/grouped in
either of the following two ways:
      i. lit(. . . , Attr, . . .), goal(. . . , Class)
         where Attr is the attribute to discretize/group with respect to Class, which
         can have more than two values. The predicate lit is in the background
         knowledge (two or more attributes may be declared in lit), and goal is the
         target predicate.
     ii. lit(. . . , Attr, . . .)
         In this case the corresponding classes are positive and negative.
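    As a minimal sketch, using the student-loan predicates of step 2 below, the two
declaration forms could look as follows; the name process_attribute and the target
loan_status/2 are ours, purely illustrative, since the paper specifies the pattern
but not a concrete directive:

% (i) discretize/group the Units argument of enrolled/3 with respect to an
%     explicit class argument of a (hypothetical) target predicate loan_status/2.
process_attribute(enrolled(_, _, units), loan_status(_, class)).

% (ii) no explicit class argument: positive vs. negative examples act as classes.
process_attribute(enrolled(_, _, units)).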
    1. Select an example e. In this step, a subset of values of Attr is related to
the selected example. For instance, let e = goal(a) be the selected example and
lit(a, 12), lit(a, 15), lit(a, 20), lit(b, 30), lit(b, 35) be five facts in the background
knowledge; then the values 12, 15 and 20 are related to e. If Attr is a numerical
attribute, these values define a FixedSubinterval. If Attr is a categorical attribute,
we call this subset of values a FixedSubset. They are fixed because they will
invariably be contained in every subinterval/subset that is tested.
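    For illustration, a minimal Prolog sketch of this step under the facts of the
running example (the predicate names values_related/2 and fixed_subinterval/2
are ours):

:- use_module(library(lists)).

% Background facts of the running example.
lit(a, 12).  lit(a, 15).  lit(a, 20).
lit(b, 30).  lit(b, 35).

% values_related(+Example, -Values): values of Attr that share the key of the
% selected example, i.e. the values related to e.
values_related(goal(Key), Values) :-
    findall(V, lit(Key, V), Values).

% fixed_subinterval(+Example, -Min-Max): the fixed subinterval spanned by the
% related values; e.g. ?- fixed_subinterval(goal(a), I).  gives I = 12-20.
fixed_subinterval(Example, Min-Max) :-
    values_related(Example, Values),
    min_list(Values, Min),
    max_list(Values, Max).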
    1.5. Discretization/Grouping. Before saturation, each attribute declared by
the user is processed. Depending on the attribute type there are two possible
cases.
     i. Let Attr be a numerical attribute and FixedSubinterval = [min, max] be the
     interval that contains all values of Attr that are related to e. The proposed
     genetic algorithm returns a set of numerical intervals I_n such that |I_n| is
     defined by the user and I_n = {x | x = [min', max'] ∧ min' ≤ min ∧ max ≤ max'}.
         Furthermore, each element of I_n represents a chromosome. To evolve the
     chromosomes, several mutation operators are defined; these can enlarge or
     shrink an interval on the left side, on the right side, or on both sides. The
     fitness function for a chromosome x is the information gain, see Eq. (1). In
     this case the GA maximizes the information gain to find better subintervals.
        IG(x) = ent(Int) - \frac{count(x)}{count(Int)} \, ent(x)                (1)

        The corresponding entropy for an interval Int is:

        ent(Int) = - \sum_{j=1}^{|S|} p(Int, j) \log_2 p(Int, j)

        where: |S| is the number of classes, Int is the numerical interval of Attr, count(Int)
    is the number of values of Attr that fall inside Int, count(x) is the number of values that
    fall inside x, and p(Int, j) is the proportion of values in Int that belong to the j-th class.
        ii. Let Attr be a categorical attribute and FixedSubset be the set of all values
    related to e. The proposed GA searches for the best |C| subsets of categorical
    values subject to the constraint C = {y | FixedSubset ⊆ y}.
        A chromosome is represented by a subset y. The mutation operators defined
    add a new categorical value to y, delete an element from the chromosome, or
    swap an element between the chromosome and the remaining categorical values,
    always keeping the FixedSubset inside each new chromosome. The crossover
    operator swaps an element between two chromosomes.
        In this case, the fitness function is based on the distance between two values
    of a categorical attribute. The fitness of a chromosome y is the total sum of the
    distances over all pairs of values in y. In addition, if the sum of distances is the
    same for two or more chromosomes, the value 1/|y| is added, because we want to
    minimize it and thereby favor the subsets with more values. This fitness function,
    called DILCA [6], is shown in Eq. (2).
        Fitness(y) = \sum_{c_i, c_j \in y} \sqrt{ \sum_{s_t \in S} (P(c_i \mid s_t) - P(c_j \mid s_t))^2 } + \frac{1}{|y|}        (2)

        where: c_i, c_j ∈ y such that i ≠ j, S is the set of defined classes, P(c_i | s_t) is the
    conditional probability of c_i given s_t, and P(c_j | s_t) is the conditional probability of c_j given s_t.

    In both cases the user can give the number of chromosomes to be added.
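    A minimal sketch of the two fitness computations (Eqs. (1) and (2)) follows,
assuming the per-class counts and the conditional probabilities P(c | s) are already
available; all predicate names (entropy/2, info_gain/3, p_cond/3, dilca_distance/4,
fitness/3) are ours, not the implementation's:

:- use_module(library(lists)).

% entropy(+Counts, -H): entropy, in bits, of a list of per-class counts.
entropy(Counts, H) :-
    sum_list(Counts, N),
    entropy_(Counts, N, 0.0, H).

entropy_([], _, H, H).
entropy_([0|Cs], N, Acc, H) :- !, entropy_(Cs, N, Acc, H).
entropy_([C|Cs], N, Acc, H) :-
    P is C / N,
    Acc1 is Acc - P * log(P) / log(2),
    entropy_(Cs, N, Acc1, H).

% info_gain(+IntCounts, +XCounts, -IG): Eq. (1), for the per-class counts of
% the full interval Int and of the candidate subinterval x.
info_gain(IntCounts, XCounts, IG) :-
    entropy(IntCounts, HInt),
    entropy(XCounts, HX),
    sum_list(IntCounts, NInt),
    sum_list(XCounts, NX),
    IG is HInt - (NX / NInt) * HX.

% dilca_distance(+Ci, +Cj, +Classes, -D): distance between two categorical
% values, using assumed facts p_cond(Value, Class, P) for P(value | class).
dilca_distance(Ci, Cj, Classes, D) :-
    findall(Sq, ( member(St, Classes),
                  p_cond(Ci, St, Pi),
                  p_cond(Cj, St, Pj),
                  Sq is (Pi - Pj) * (Pi - Pj) ),
            Sqs),
    sum_list(Sqs, S),
    D is sqrt(S).

% fitness(+Y, +Classes, -F): Eq. (2), the sum of pairwise distances over the
% subset Y plus the tie-breaking term 1/|Y|.
fitness(Y, Classes, F) :-
    findall(D, ( append(_, [Ci|Rest], Y), member(Cj, Rest),
                 dilca_distance(Ci, Cj, Classes, D) ),
            Ds),
    sum_list(Ds, Sum),
    length(Y, Len),
    F is Sum + 1 / Len.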
    2. Build a bottom clause that entails the example e. The enriched language
is used to build the bottom clause. Consider the target relation no_payment(S),
which is true for those students who do not need to repay their loan. If the
background knowledge contains the literals unemployed(S) and enrolled(S, School, Units),
where Units is numerical and School is categorical, then a bottom clause would look
like this:

no_payment(A) :- unemployed(A),enrolled(A,B,C),
interval(C,[11.0,13.4]),interval(C,[11.7,12.5]),interval(C,[11.3,13.3]),
interval(C,[11.3,14.0]),interval(C,[11.0,13.5]),interval(C,[10.0,12.3]),
member(B,[occ,smc,uci,ucla,ucsd]),member(B,[smc,ucb,ucla,ucsd]).
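    From such a bottom clause the search (step 3) could then select, for example,
a clause like the following (purely illustrative, not a rule reported in our experiments):

no_payment(A) :- enrolled(A,B,C),
    interval(C,[11.0,13.4]),
    member(B,[occ,smc,uci,ucla,ucsd]).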

    3. Search. Aleph uses several search strategies (best-first, depth-first, etc.),
refinement operators, and evaluation functions (as mentioned earlier).
    4. Remove covered (redundant) examples and add the best clause found to
the current theory. Go to step 1.
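    Putting the steps together, the modified covering loop can be sketched as
follows (Prolog-style pseudocode; the predicate names are ours):

% induce_theory(+Positives, +Acc, -Theory): covering loop with the
% language-enrichment step (1.5) inserted before saturation.
induce_theory([], Theory, Theory).
induce_theory(Positives, Acc, Theory) :-
    Positives = [Example|_],                       % 1.  select an example
    enrich_language(Example),                      % 1.5 discretize/group declared attributes
    saturate(Example, BottomClause),               % 2.  build the bottom clause
    search(BottomClause, BestClause),              % 3.  search bounded by the bottom clause
    remove_covered(BestClause, Positives, Rest),   % 4.  remove covered examples
    induce_theory(Rest, [BestClause|Acc], Theory).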
    In the next section we present the experiments performed to compare the
accuracy of our method.
26

3    Experiments


In this section we compare our method (which we call Extended Aleph) with
standard Aleph and with lazy evaluation in Aleph [11]. The latter, like our method,
works during the bottom clause construction. A 10-fold cross-validation was
performed to compare the accuracy and the simplicity (number of rules) of the
final theories.
    From the UCI machine learning repository [2], we used Australian Credit
Approval, Pittsburgh Bridges, Japanese Credit Screening, German Credit, Iris,
Moral Reasoner, Ecoli, and Student Loan datasets. We also performed tests over
the well-known ILP datasets: Carcinogenesis [12], KRK illegal chess position
(KRKi) [8], and mutagenesis [10] (on “regression friendly” data).
    All methods were executed in Yap Prolog version 6.0 with the global
parameters minpos = 2 and noise = 5; the remaining parameters were left at their
Aleph defaults. All experiments were performed on a modern multicore PC.
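    In Aleph these two non-default settings correspond to the standard directives
below (a sketch of the setup; everything else keeps Aleph's defaults):

:- set(minpos, 2).
:- set(noise, 5).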
    Extended Aleph vs Aleph. Table 1 shows that our implementation improves
the accuracy on most datasets, with the exception of two: Pittsburgh Bridges and
Moral Reasoner. We can also see (figure 1) that the increase in accuracy for the
Japanese Credit Screening and Iris datasets is significant: 15% and 7.8%,
respectively. Regarding the simplicity of the final theories, Extended Aleph does
not improve the simplicity on two datasets, German Credit and KRKi, but on the
remaining datasets (except Moral Reasoner) our approach decreases the number
of rules in the final theories. In particular, for Iris and Student Loan the reduction
is almost 50% (figure 2).
    Extended Aleph vs Lazy Evaluation in Aleph. As in the previous case,
Extended Aleph improves the accuracy except on two datasets, Moral Reasoner
and Student Loan (table 1). Furthermore, Lazy Evaluation beats our method in
the number of rules on only a single dataset, Student Loan (figure 2), and on
that dataset both reach the same accuracy.


    Dataset                              |      Accuracy (%)       |    Number of rules
                                         | Aleph  Ext. Aleph  Lazy | Aleph  Ext. Aleph  Lazy
    Australian Credit Approval (Aus)     |  82       83       80.5 | 27.40    24.70     29.9
    Pittsburgh Bridges (Pitts)           |  89       89       88.7 | 13.2     13.1      13.4
    Japanese Credit Screening (Japan)    |  53       69       65   | 13.2      8.6      11.8
    German Credit (German)               |  70.9     71.2     70.8 | 51.4     52.8      53.4
    Iris                                 |  85.9     93.7     89.1 | 26.9     13.7      22.6
    Moral Reasoner (Moral)               |  96       96       96   |  1        1         1
    Ecoli                                |  91       92       91   | 37.3     35.4      43.5
    Student Loan (Student)               |  93.7     97.6     97.6 | 46.6     25.7      21.5
    Carcinogenesis (Carcino)             |  58.6     61.3     56.3 | 21.9     18.6      28.9
    KRKi                                 |  49.4     50.4     49.4 | 66.7     68.1      66.7
    Mutagenesis (Muta)                   |  77.1     80.8     78.3 | 11.2      9        10.4

        Table 1. Predictive accuracies and number of rules for the analyzed datasets.

    In general, Extended Aleph improves or maintains the accuracy (figure 1)
and decreases the number of rules for most datasets. With regard to running
time, Extended Aleph is significantly slower on Carcinogenesis (Aleph 36.1 s,
Lazy 222.33 s, Ext. Aleph 698.9 s) and on Ecoli (Aleph 11.5 s, Lazy 63.2 s,
Ext. Aleph 707.5 s).




            Fig. 1. Extended Aleph maintains or improves the accuracy.




        Fig. 2. Percentage reduction of the number of rules in final theories.




4   Conclusions and Future Work

From the obtained results we can draw some conclusions. The method does not
guarantee an improvement in the accuracy and simplicity of the theories in all
cases. The improvement depends on the selected attributes: if an attribute is
relevant to the theory construction, then processing it will help to improve the
final theory. ILP problems whose final theories do not need constants cannot
benefit from this method; in such cases, variables are better suited to explain the
concept. The size of the search space can also be affected by the proposed method.
On the one hand, a discretized/grouped attribute which is not relevant can
generate a large number of unnecessary candidate rules; on the other hand, a
discretized/grouped attribute which is relevant can significantly decrease the
number of candidate rules. Users do not know in advance which attributes will be
most useful, so intuition is an important element when selecting each attribute.
Our method makes it easy to experiment with different attributes. Finally, unlike
other genetic approaches such as SMART+ [4], our method can deal with both
categorical and numerical attributes.
    Some considerations for further work are as follows. Since not all cases were
successful, it is necessary to test other fitness functions and more datasets. These
tests will help us to identify the factors that affect both the accuracy and the
simplicity of the final theories, as well as the size of the search space. Another
path of research is to look into multivariable predicates, i.e. whether there are
correlations between two or more attributes. We want to investigate whether such
correlations are relevant to improving the performance in ILP (accuracy and
simplicity) and what kind of problems can be treated with these predicates.
Finally, we want to implement these ideas in the Aleph system and compare their
performance with other ILP systems that handle these data types.

References
 1. A. Alves, R. Camacho, and E. Oliveira, Improving numerical reasoning capa-
    bilities of inductive logic programming systems, in IBERAMIA, 2004, pp. 195–204.
 2. K. Bache and M. Lichman, UCI machine learning repository, 2013.
 3. H. Blockeel and L. De Raedt, Lookahead and discretization in ILP, in Proceedings
    of the 7th International Workshop on Inductive Logic Programming, Springer-Verlag,
    1997, pp. 77–85.
 4. M. Botta and A. Giordana, SMART+: A multi-strategy learning tool, in IJCAI,
    1993, pp. 937–945.
 5. F. Divina and E. Marchiori, Handling continuous attributes in an evolutionary
    inductive learner, IEEE Trans. Evolutionary Computation, 9 (2005), pp. 31–43.
 6. D. Ienco, R. G. Pensa, and R. Meo, Context-based distance learning for cate-
    gorical data clustering, in IDA, 2009, pp. 83–94.
 7. N. Lavrac, S. Dzeroski, and M. Grobelnik, Learning nonrecursive definitions
    of relations with LINUS, in Proceedings of the European Working Session on Machine
    Learning, EWSL '91, London, UK, 1991, Springer-Verlag, pp. 265–281.
 8. S. Muggleton, M. Bain, J. Hayes-Michie, and D. Michie, An experimental
    comparison of human and machine learning formalisms, in Proceedings of the Sixth
    International Workshop on Machine Learning, Morgan Kaufmann, 1989, pp. 113–118.
 9. O. Muñoz and R. MacKinney-Romero, Multivalue in ILP, in 20th International
    Conference on Inductive Logic Programming, 2011.
10. A. Srinivasan and S. H. Muggleton, Mutagenesis: ILP experiments in a
    non-determinate biological domain, in Proceedings of the 4th International Workshop
    on Inductive Logic Programming, volume 237 of GMD-Studien, 1994, pp. 217–232.
11. A. Srinivasan and R. Camacho, Experiments in numerical reasoning with in-
    ductive logic programming, Journal of Logic Programming.
12. A. Srinivasan, R. D. King, S. Muggleton, and M. J. E. Sternberg, Carcinogenesis
    predictions using ILP, in ILP, 1997, pp. 273–287.