=Paper=
{{Paper
|id=None
|storemode=property
|title=Dealing with Incompleteness and Vagueness in Inductive Logic Programming
|pdfUrl=https://ceur-ws.org/Vol-1068/paper-l12.pdf
|volume=Vol-1068
|dblpUrl=https://dblp.org/rec/conf/cilc/LisiS13
}}
==Dealing with Incompleteness and Vagueness in Inductive Logic Programming==
Francesca A. Lisi¹ and Umberto Straccia²

¹ Dipartimento di Informatica, Università degli Studi di Bari “Aldo Moro”, Italy
² ISTI - CNR, Pisa, Italy
Abstract. Incompleteness and vagueness are inherent properties of knowledge in several real world domains and are particularly pervading in those domains where entities could be better described in natural language. In order to deal with incomplete and vague structured knowledge, several fuzzy extensions of Description Logics (DLs) have been proposed in the literature. In this paper, we address the issues raised by incomplete and vague knowledge in Inductive Logic Programming (ILP). We present a novel ILP method for inducing fuzzy DL inclusion axioms from crisp DL knowledge bases and discuss the results obtained in comparison with related works.
1 Introduction
Incompleteness and vagueness are inherent properties of knowledge in several real
world domains and are particularly pervading in those domains where entities
could be better described in natural language. The issues raised by incomplete
and vague knowledge have been traditionally addressed in the field of Knowledge
Representation (KR).
Incomplete knowledge. The Open World Assumption (OWA) is used in KR to
codify the informal notion that in general no single agent or observer has complete
knowledge. The OWA limits the kinds of inference and deductions an agent can
make to those that follow from statements that are known to the agent to be
true. In contrast, the Closed World Assumption (CWA) allows an agent to infer,
from its lack of knowledge of a statement being true, anything that follows from
that statement being false. Heuristically, the OWA applies when we represent
knowledge within a system as we discover it, and where we cannot guarantee that
we have discovered or will discover complete information. In the OWA, statements
about knowledge that are not included in or inferred from the knowledge explicitly
recorded in the system may be considered unknown, rather than wrong or false.
Description Logics (DLs) are KR formalisms compliant with the OWA, thus
turning out to be particularly suitable for representing incomplete knowledge [1].
Vague knowledge. It is well known that “classical” DLs are not appropriate to
deal with vague knowledge [20]. We recall for the inexpert reader that there has
been a long-lasting misunderstanding in the literature of artificial intelligence and
uncertainty modelling, regarding the role of probability/possibility theory and
vague/fuzzy theory. A clarifying paper is [5]. Specifically, under uncertainty theory
fall all those approaches in which statements are true or false to some probability
or possibility (for example, “it will rain tomorrow”). That is, a statement is true
or false in any world/interpretation, but we are “uncertain” about which world to
consider as the right one, and thus we speak about, e.g., a probability distribution
or a possibility distribution over the worlds. On the other hand, under fuzzy theory
fall all those approaches in which statements (for example, “the car is long”) are
true to some degree, which is taken from a truth space (usually [0, 1]). That is,
an interpretation maps a statement to a truth degree, since we are unable to
establish whether a statement is entirely true or false due to the involvement
of vague concepts, such as “long car” (the degree to which the sentence is true
depends on the length of the car). Here, we shall focus on fuzzy logic only.
Learning in fuzzy DLs. Although a considerable amount of work has been carried out in recent years concerning the use of fuzzy DLs as ontology languages [20] and the use of DLs as representation formalisms in Inductive Logic Programming (ILP) [13], the problem of automatically managing the evolution of fuzzy ontologies by applying ILP algorithms still remains relatively unaddressed.
Konstantopoulos and Charalambidis [9] propose an ad-hoc translation of fuzzy
Lukasiewicz ALC DL constructs into LP in order to apply a conventional ILP
method for rule learning. However, the method is not sound as it has been recently
shown that the mapping from fuzzy DLs to LP is incomplete [17] and entailment
in Lukasiewicz ALC is undecidable [4]. Iglesias and Lehmann [7] propose an
extension of DL-Learner [10] with some of the most up-to-date fuzzy ontology
tools, e.g. the fuzzyDL reasoner [2]. Notably, the resulting system can learn fuzzy OWL DL³ equivalence axioms from FuzzyOWL 2 ontologies.⁴ However, it has been tested only on a toy problem with crisp training examples and does not automatically build fuzzy concrete domains. Lisi and Straccia [14] present SoftFoil, a
logic-based method for learning fuzzy EL inclusion axioms from fuzzy DL-Lite
ontologies (also, SoftFoil has not been implemented and tested).
Contribution of this paper. In this paper, we describe a novel method, named
Foil-DL, for learning fuzzy EL(D) inclusion axioms from any crisp DL knowledge base.⁵ Similarly to SoftFoil, it adapts the popular rule induction method
Foil [18]. However, Foil-DL differs from SoftFoil mainly in that the latter learns fuzzy EL inclusion axioms from fuzzy DL-Lite ontologies, while the former learns fuzzy EL(D) inclusion axioms from any crisp DL ontology.
Structure of the paper. The paper is structured as follows. For the sake of self-
containment, Section 2 introduces some basic definitions we rely on. Section 3
describes the learning problem and the solution strategy of Foil-DL. Section 4
illustrates some results obtained in a comparative study between Foil-DL and
³ http://www.w3.org/TR/2009/REC-owl2-overview-20091027/
⁴ http://www.straccia.info/software/FuzzyOWL
⁵ DL stands for any DL.
Fig. 1. (a) Trapezoidal function trz(a, b, c, d), (b) triangular function tri(a, b, c), (c) left-shoulder function ls(a, b), and (d) right-shoulder function rs(a, b).
DL-Learner on the popular ILP problem of Michalski’s trains. Section 5 concludes
the paper by discussing limits of the current work, related work and possible
directions of future work.
2 Preliminaries
Mathematical Fuzzy Logic. Fuzzy Logic is the logic of fuzzy sets. A fuzzy set A
over a countable crisp set X is a function A : X → [0, 1]. Let A and B be two fuzzy
sets. The standard fuzzy set operations conform to (A∩B)(x) = min(A(x), B(x)),
(A ∪ B)(x) = max(A(x), B(x)) and Ā(x) = 1 − A(x), while the inclusion degree
between A and B is typically defined as

deg(A, B) = ( Σ_{x∈X} (A ∩ B)(x) ) / ( Σ_{x∈X} A(x) ) .   (1)
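As an illustration (our own sketch, not part of the paper's implementation), the inclusion degree of Eq. (1) can be computed for finite fuzzy sets represented as dictionaries from elements to membership degrees:

```python
# Sketch: fuzzy set inclusion degree of Eq. (1).
# A fuzzy set is a dict mapping elements to degrees in [0, 1].

def inclusion_degree(A, B):
    """deg(A, B) = sum_x min(A(x), B(x)) / sum_x A(x)."""
    num = sum(min(A.get(x, 0.0), B.get(x, 0.0)) for x in A)
    den = sum(A.values())
    return num / den if den > 0 else 0.0

# A is pointwise below B, so A is included in B to degree 1.0
A = {"a": 0.5, "b": 0.8}
B = {"a": 0.7, "b": 0.9, "c": 0.2}
```

Here min realizes the standard intersection (A ∩ B)(x) recalled above.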
The trapezoidal (Fig. 1 (a)), the triangular (Fig. 1 (b)), the left-shoulder (Fig. 1 (c)), and the right-shoulder (Fig. 1 (d)) functions are frequently used to specify membership functions of fuzzy sets. Although fuzzy sets have greater expressive power than classical crisp sets, their usefulness depends critically on the capability to construct appropriate membership functions for various given concepts in different contexts. The problem of constructing meaningful membership functions is a difficult one (see, e.g., [8, Chapter 10]). However, one easy and typically satisfactory method is to uniformly partition the range of values into 5 or 7 fuzzy sets using either trapezoidal or triangular functions. The latter is the more widely used, as it has fewer parameters, and is also the approach we adopt. For instance, salary values (bounded by a minimum and a maximum value) can be partitioned uniformly into 5 fuzzy sets.
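The partitioning scheme just described can be sketched as follows (a toy version with names of our own; for simplicity it uses triangular sets even at the two extremes, where shoulder functions are often preferred):

```python
# Toy sketch: uniform partition of [lo, hi] into n triangular fuzzy sets.

def tri(a, b, c):
    """Triangular membership function tri(a, b, c) of Fig. 1 (b)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

def uniform_partition(lo, hi, n=5):
    """n triangular sets whose peaks are evenly spaced over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [tri(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
            for i in range(n)]

# e.g. five fuzzy sets over salaries normalized to the range [0, 100]
very_low, low, fair, high, very_high = uniform_partition(0.0, 100.0)
```

Each value of the range thus belongs to at most two adjacent fuzzy sets, with degrees summing to 1.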
In Mathematical Fuzzy Logic [6], the convention prescribing that a statement is either true or false is dropped: truth becomes a matter of degree, measured on an ordered scale that is no longer {0, 1} but, e.g., [0, 1]. This degree is called the degree of truth
Table 1. Syntax and semantics of constructs for the ALC DL.

bottom (resp. top) concept    ⊥ (resp. ⊤)    ∅ (resp. ∆I )
atomic concept                A              AI ⊆ ∆I
role                          R              RI ⊆ ∆I × ∆I
individual                    a              aI ∈ ∆I
concept negation              ¬C             ∆I \ C I
concept intersection          C1 ⊓ C2        C1I ∩ C2I
concept union                 C1 ⊔ C2        C1I ∪ C2I
value restriction             ∀R.C           {x ∈ ∆I | ∀y (x, y) ∈ RI → y ∈ C I }
existential restriction       ∃R.C           {x ∈ ∆I | ∃y (x, y) ∈ RI ∧ y ∈ C I }
general concept inclusion     C1 ⊑ C2        C1I ⊆ C2I
concept assertion             a : C          aI ∈ C I
role assertion                (a, b) : R     (aI , bI ) ∈ RI
of the logical statement φ in the interpretation I. For us, fuzzy statements have the form ⟨φ, α⟩, where α ∈ (0, 1] and φ is a statement, encoding that the degree of truth of φ is greater than or equal to α.
A fuzzy interpretation I maps each atomic statement pi into [0, 1] and is then extended inductively to all statements: I(φ ∧ ψ) = I(φ) ⊗ I(ψ), I(φ ∨ ψ) = I(φ) ⊕ I(ψ), I(φ → ψ) = I(φ) ⇒ I(ψ), I(¬φ) = ⊖I(φ), I(∃x.φ(x)) = sup_{y∈∆I} I(φ(y)), I(∀x.φ(x)) = inf_{y∈∆I} I(φ(y)), where ∆I is the domain of I, and ⊗, ⊕, ⇒, and ⊖ are so-called t-norms, t-conorms, implication functions, and negation functions, respectively, which extend the Boolean conjunction, disjunction, implication, and negation, respectively, to the fuzzy case. One usually distinguishes three different logics, namely Lukasiewicz, Gödel, and Product logics [6]. Any other continuous t-norm can be obtained from them. The combination
functions in Gödel logic are defined as follows:
a ⊗ b = min(a, b),   a ⊕ b = max(a, b),
a ⇒ b = { 1 if a ≤ b; b otherwise },   ⊖a = { 1 if a = 0; 0 otherwise } .   (2)
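The Gödel combination functions of Eq. (2) are straightforward to implement; a minimal sketch (function names are ours):

```python
# Gödel combination functions of Eq. (2).

def t_norm(a, b):       # conjunction ⊗
    return min(a, b)

def t_conorm(a, b):     # disjunction ⊕
    return max(a, b)

def implies(a, b):      # residuated implication ⇒
    return 1.0 if a <= b else b

def neg(a):             # negation ⊖
    return 1.0 if a == 0 else 0.0
```

Note that Gödel negation is crisp: any positive degree is negated to 0.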
The notions of satisfiability and logical consequence are defined in the standard way, where a fuzzy interpretation I satisfies a fuzzy statement ⟨φ, α⟩, or I is a model of ⟨φ, α⟩, denoted as I |= ⟨φ, α⟩, iff I(φ) ≥ α.
Fuzzy Description Logics. Description Logics (DLs) are a family of decidable First Order Logic (FOL) fragments that allow for the specification of structured knowledge in terms of classes (concepts), instances (individuals), and binary relations between instances (roles) [1]. Complex concepts (denoted with C) can be defined
from atomic concepts (A) and roles (R) by means of the constructors available
for the DL in hand. The set of constructors for the ALC DL is reported in Table 1. A DL Knowledge Base (KB) K = ⟨T , A⟩ is a pair where T is the so-called
Terminological Box (TBox) and A is the so-called Assertional Box (ABox). The
TBox is a finite set of General Concept Inclusion (GCI) axioms which represent
is-a relations between concepts, whereas the ABox is a finite set of assertions (or
facts) that represent instance-of relations between individuals (resp. couples of
individuals) and concepts (resp. roles). Thus, when a DL-based ontology language
is adopted, an ontology is nothing else than a TBox, and a populated ontology
corresponds to a whole KB (i.e., encompassing also an ABox).
The semantics of DLs can be defined directly with set-theoretic formalizations
(as shown in Table 1 for the case of ALC) or through a mapping to FOL (as
shown in [3]). An interpretation I = (∆I , ·I ) for a DL KB consists of a domain
∆I and a mapping function ·I . For instance, I maps a concept C into a set of
individuals C I ⊆ ∆I , i.e. I maps C into a function C I : ∆I → {0, 1} (either
an individual belongs to the extension of C or does not belong to it). Under the
Unique Names Assumption (UNA) [19], individuals are mapped to elements of
∆I such that aI ≠ bI if a ≠ b. However, the UNA does not hold by default in DLs.
An interpretation I is a model of a KB K iff it satisfies all axioms and assertions
in T and A. In DLs a KB represents many different interpretations, i.e. all its
models. This is coherent with the OWA that holds in FOL semantics. A DL KB
is satisfiable if it has at least one model.
The main reasoning task for a DL KB K is the consistency check which
tries to prove the satisfiability of K. Another well known reasoning service in
DLs is instance check, i.e., the check of whether an ABox assertion is a logical
implication of a DL KB. A more sophisticated version of instance check, called
instance retrieval, retrieves, for a DL KB K, all (ABox) individuals that are
instances of the given (possibly complex) concept expression C, i.e., all those
individuals a such that K entails that a is an instance of C.
Concerning fuzzy DLs, several fuzzy extensions of DLs have been proposed
(see the survey in [15]). We recap here the fuzzy variant of the DL ALC(D) [21].
A fuzzy concrete domain or fuzzy datatype theory D = ⟨∆D , ·D ⟩ consists of a datatype domain ∆D and a mapping ·D that assigns to each data value an element of ∆D , and to every n-ary datatype predicate d an n-ary fuzzy relation over ∆D . We will restrict to unary datatypes as usual in fuzzy DLs. Therefore, ·D maps indeed each datatype predicate into a function from ∆D to [0, 1]. Typical
examples of datatype predicates d are the well known membership functions
d := ls(a, b) | rs(a, b) | tri(a, b, c) | trz(a, b, c, d) | ≥v | ≤v | =v ,
where e.g. ls(a, b) is the left-shoulder membership function and ≥v corresponds to the crisp set of data values that are greater than or equal to the value v.
In ALC(D), each role is either an object property (denoted with R) or a
datatype property (denoted with T ). Complex concepts are built according to the
following syntactic rules:
C → ⊤ | ⊥ | A | C1 ⊓ C2 | C1 ⊔ C2 | ¬C | C1 → C2 | ∃R.C | ∀R.C | ∃T.d | ∀T.d .   (3)
Axioms in a fuzzy ALC(D) KB K = ⟨T , A⟩ are graded, e.g. a GCI is of the form ⟨C1 ⊑ C2 , α⟩ (i.e. C1 is a sub-concept of C2 to degree at least α). We may omit
the truth degree α of an axiom; in this case α = 1 is assumed.
Concerning the semantics, let us fix a fuzzy logic. In fuzzy DLs, I maps
C into a function C I : ∆I → [0, 1] and, thus, an individual belongs to the
extension of C to some degree in [0, 1], i.e. C I is a fuzzy set. Specifically, a fuzzy
interpretation is a pair I = (∆I , ·I ) consisting of a nonempty (crisp) set ∆I
(the domain) and of a fuzzy interpretation function ·I that assigns: (i) to each
atomic concept A a function AI : ∆I → [0, 1]; (ii) to each object property R a
function RI : ∆I × ∆I → [0, 1]; (iii) to each data type property T a function
T I : ∆I × ∆D → [0, 1]; (iv) to each individual a an element aI ∈ ∆I ; and (v) to
each concrete value v an element v I ∈ ∆D .
Now, ·I is extended to concepts as specified below (where x ∈ ∆I ):

⊥I (x) = 0,   ⊤I (x) = 1,
(C ⊓ D)I (x) = C I (x) ⊗ DI (x),   (C ⊔ D)I (x) = C I (x) ⊕ DI (x),
(¬C)I (x) = ⊖C I (x),   (C → D)I (x) = C I (x) ⇒ DI (x),
(∀R.C)I (x) = inf_{y∈∆I} {RI (x, y) ⇒ C I (y)},   (∃R.C)I (x) = sup_{y∈∆I} {RI (x, y) ⊗ C I (y)},
(∀T.d)I (x) = inf_{y∈∆D} {T I (x, y) ⇒ dD (y)},   (∃T.d)I (x) = sup_{y∈∆D} {T I (x, y) ⊗ dD (y)} .
Hence, for every concept C we get a function C I : ∆I → [0, 1].
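For illustration, the rule for existential restriction above can be evaluated directly under Gödel semantics (⊗ = min) on a toy fuzzy interpretation of our own:

```python
# Toy fuzzy interpretation: roles as dicts over pairs of individuals,
# concepts as dicts over individuals (missing entries mean degree 0).

def exists_R_C(R, C, x, domain):
    """(∃R.C)^I(x) = sup_y R^I(x, y) ⊗ C^I(y), with ⊗ = min (Gödel)."""
    return max((min(R.get((x, y), 0.0), C.get(y, 0.0)) for y in domain),
               default=0.0)

domain = {"car1", "car2"}
hasCar = {("t1", "car1"): 1.0, ("t1", "car2"): 1.0}  # crisp role
Long = {"car1": 0.8, "car2": 0.3}                    # fuzzy concept
degree = exists_R_C(hasCar, Long, "t1", domain)      # sup of 0.8 and 0.3
```

The individual t1 thus belongs to the extension of ∃hasCar.Long to degree 0.8.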
The satisfiability of axioms is then defined by the following conditions: (i) I satisfies an axiom ⟨a:C, α⟩ if C I (aI ) ≥ α; (ii) I satisfies an axiom ⟨(a, b):R, α⟩ if RI (aI , bI ) ≥ α; (iii) I satisfies an axiom ⟨C ⊑ D, α⟩ if (C ⊑ D)I ≥ α, where (C ⊑ D)I = inf_{x∈∆I} {C I (x) ⇒ DI (x)}. I is a model of K iff I satisfies each axiom in K. We say that K entails an axiom ⟨τ, α⟩, denoted K |= ⟨τ, α⟩, if any model of K satisfies ⟨τ, α⟩. The best entailment degree of τ w.r.t. K, denoted bed(K, τ ), is defined as
bed(K, τ ) = sup{α | K |= ⟨τ, α⟩} .   (4)
3 Learning fuzzy EL(D) axioms with Foil-DL
3.1 The problem statement
The problem considered in this paper concerns the automated induction of fuzzy EL(D)⁶ GCI axioms providing a sufficient condition for a given atomic concept
H. It can be cast as a rule learning problem, provided that positive and negative
examples of H are available. This problem can be formalized as follows.
Given:
– a consistent crisp DL KB K = ⟨T , A⟩ (the background theory);
– an atomic concept H (the target concept);
– a set E = E + ∪ E − of crisp concept assertions labelled as either positive or
negative examples for H (the training set);
– a set LH of fuzzy EL(D) GCI axioms (the language of hypotheses)
the goal is to find a set H ⊂ LH (a hypothesis) such that: ∀e ∈ E + , K ∪ H |= e (completeness), and ∀e ∈ E − , K ∪ H ⊭ e (consistency).
Here we assume that K ∩ E = ∅. Also, the language LH is given implicitly by
means of syntactic restrictions over a given alphabet. In particular, the alphabet
underlying LH is a subset of the alphabet for the language LK of the background
⁶ EL(D) is a fragment of ALC(D).
theory. However, LH differs from LK as regards the form of axioms. Please note that we do not make any specific assumption about the DL the background theory refers to. Two further restrictions hold naturally. One is that K ⊭ E + since, in such a case, H would not be necessary to explain E + . The other is that K ∪ H ⊭ ⊥, which means that K ∪ H is a consistent theory, i.e. has a model. An axiom φ ∈ LH covers an example e ∈ E iff K ∪ {φ} |= e.
The training examples. Given the target concept H, the training set E consists
of concept assertions of the form a:H, where a is an individual occurring in K.
Note that both K and E are crisp. Also, E is split into E + and E − . Note that, under OWA, E − consists of all those individuals which can be proved to be instances of ¬H. On the other hand, under CWA, E − is the collection of individuals which cannot be proved to be instances of H.
The language of hypotheses. Given the target concept H, the hypotheses to
be induced are fuzzy GCIs of the form
B ⊑ H ,   (5)
where the left-hand side is defined according to the following EL(D) syntax
B → ⊤ | A | ∃R.B | ∃T.d | B1 ⊓ B2 .   (6)
The language LH generated by this syntax is potentially infinite due, e.g., to the nesting of existential restrictions yielding complex concept expressions such as ∃R1 .(∃R2 . . . .(∃Rn .(C)) . . .). LH is made finite by imposing further restrictions
on the generation process such as the maximal number of conjuncts and the depth
of existential nesting allowed in the left-hand side. Also, note that the learnable
GCIs do not have an explicit truth degree. However, as we shall see later on,
once we have learned a fuzzy GCI of the form (5), we attach to it a confidence
degree that is obtained by means of the cf function (see Eq. (8)). Finally, note
that the syntactic restrictions of Eq. (6) w.r.t. Eq. (3) allow for a straightforward
translation of the inducible axioms into rules of the kind “if x is a C1 and . . . and
x is a Cn then x is an H”, which corresponds to the usual pattern in fuzzy rule
induction (in our case, B ⊑ H is seen as a rule “if B then H”).
3.2 The solution strategy
The solution proposed for the learning problem defined in Section 3.1 is inspired
by Foil. Foil is a popular ILP algorithm for learning sets of rules which performs
a greedy search in order to maximise a gain function [18].
In Foil-DL, the learning strategy of Foil (i.e., the so-called sequential covering approach) is kept. The function Learn-Sets-of-Axioms (reported in Figure
2) carries on inducing axioms until all positive examples are covered. When an
axiom is induced (step 3.), the positive examples covered by the axiom (step
5.) are removed from E (step 6.). In order to induce an axiom, the function
function Learn-Sets-of-Axioms(K, H, E + , E − , LH ): H
begin
1. H := ∅;
2. while E + ≠ ∅ do
3. φ := Learn-One-Axiom(K, H, E + , E − , LH );
4. H := H ∪ {φ};
5. Eφ+ := {e ∈ E + |K ∪ φ |= e};
6. E + := E + \ Eφ+ ;
7. endwhile
8. return H
end
Fig. 2. Foil-DL: Learning a set of GCI axioms.
Learn-One-Axiom (reported in Figure 3) starts with the most general axiom
(i.e. > v H) and specializes it by applying the refinement rules implemented in
the function Refine (step 7.). The iterated specialization of the axiom continues
until the axiom does not cover any negative example and its confidence degree is
greater than a fixed threshold (θ). The confidence degree of axioms being generated with Refine allows for evaluating the information gain obtained at each refinement step by calling the function Gain (step 9.).
Due to the peculiarities of the language of hypotheses in Foil-DL, necessary
changes are made to Foil as concerns the functions Refine and Gain. Details
about these novel features are provided in the next two subsections.
The refinement operator. The function Refine implements a specialization
operator with the following refinement rules:
AddA : adds an atomic concept A
Add∃R.⊤ : adds a complex concept ∃R.⊤ by existential role restriction
Add∃T.d : adds a complex concept ∃T.d by existential datatype restriction
SubstA : replaces an atomic concept A with another atomic concept A′ s.t. A′ ⊑ A
At each refinement step (i.e. at each call of Refine), the rules are applied first
to the left-hand side of the axiom being specialized and then recursively to the
range of all the conjuncts defined with existential role restriction. For example,
let us consider that H is the target concept, A, A′ , B, R, R′ , T are concepts and properties occurring in K, and A′ ⊑ A holds in K. Under these assumptions, the axiom ∃R.B ⊑ H is specialized into the following axioms:
– A ⊓ ∃R.B ⊑ H, B ⊓ ∃R.B ⊑ H, A′ ⊓ ∃R.B ⊑ H;
– ∃R′ .⊤ ⊓ ∃R.B ⊑ H, ∃T.d ⊓ ∃R.B ⊑ H;
– ∃R.(B ⊓ A) ⊑ H, ∃R.(B ⊓ A′ ) ⊑ H;
– ∃R.(B ⊓ ∃R.⊤) ⊑ H, ∃R.(B ⊓ ∃R′ .⊤) ⊑ H, ∃R.(B ⊓ ∃T.d) ⊑ H.
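A simplified sketch of one such specialization step (top-level conjuncts only, with toy concept and role names of our own; the actual Refine also recurses into the range of existential restrictions and exploits the background theory):

```python
# Simplified sketch (not the actual Refine implementation): apply the four
# specialization rules to the top-level conjunction of the axiom body,
# represented as a list of conjunct strings.

def refine(conjuncts, atoms, roles, dt_preds, subclasses):
    out = []
    for A in atoms:                                   # AddA
        out.append(conjuncts + [A])
    for R in roles:                                   # Add ∃R.⊤
        out.append(conjuncts + [f"∃{R}.⊤"])
    for T, d in dt_preds:                             # Add ∃T.d
        out.append(conjuncts + [f"∃{T}.{d}"])
    for i, c in enumerate(conjuncts):                 # SubstA: A -> A' with A' ⊑ A
        for sub, sup in subclasses:
            if c == sup:
                out.append(conjuncts[:i] + [sub] + conjuncts[i + 1:])
    return out

# one refinement step of the body [A], with A' ⊑ A in the background theory
refs = refine(["A"], ["B"], ["R"], [("T", "d")], [("A'", "A")])
```

Each returned conjunct list stands for the body of one candidate specialized GCI.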
The application of the refinement rules is not blind. It takes the background
theory into account in order to avoid the generation of redundant or useless
function Learn-One-Axiom(K, H, E + , E − , LH ): φ
begin
1. B := >;
2. φ := B v H;
3. Eφ− := E − ;
4. while cf (φ) < θ or Eφ− ≠ ∅ do
5. Bbest := B;
6. maxgain := 0;
7. Φ := Refine(φ, LH );
8. foreach φ′ = B′ ⊑ H ∈ Φ do
9. gain := Gain(φ′ , φ);
10. if gain ≥ maxgain then
11. maxgain := gain;
12. Bbest := B′ ;
13. endif
14. endforeach
15. φ := Bbest ⊑ H;
16. Eφ− := {e ∈ E − |K ∪ φ |= e};
17. endwhile
18. return φ
end
Fig. 3. Foil-DL: Learning one GCI axiom.
hypotheses. For example, if the concept B′ is the range of R′ in K, the function Refine adds the conjunct ∃R′ .B′ instead of ∃R′ .⊤. Such an “informed” refinement operator is able to perform “cautious” big steps in the search space.
Note that a specialization operator reduces the number of examples covered
by a GCI. More precisely, the aim of a refinement step is to reduce the number
of covered negative examples, while still keeping some covered positive examples.
Since learned GCIs cover only positive examples, K will remain consistent after
the addition of a learned GCI.
The heuristic. The function Gain implements an information-theoretic crite-
rion for selecting the best candidate at each refinement step according to the
following formula:
Gain(φ′ , φ) = p ∗ (log2 (cf (φ′ )) − log2 (cf (φ))) ,   (7)
where p is the number of positive examples covered by the axiom φ that are still covered by φ′ . Thus, the gain is positive iff φ′ is more informative in the sense of Shannon’s information theory, i.e. iff the confidence degree (cf ) increases. If
there are some refinements, which increase the confidence degree, the function
Gain tends to favour those that offer the best compromise between the confidence
degree and the number of examples covered. Here, cf for an axiom φ of the form
(5) is computed as a sort of fuzzy set inclusion degree (see Eq. (1)) between the
fuzzy set represented by concept B and the (crisp) set represented by concept
H. More formally:
cf (φ) = cf (B ⊑ H) = ( Σ_{a ∈ Ind+H (A)} bed(K, a:B) ) / |IndH (A)| ,   (8)
where Ind+H (A) (resp., IndH (A)) is the set of individuals occurring in A and involved in Eφ+ (resp., Eφ+ ∪ Eφ− ) such that bed(K, a:B) > 0. We remind the reader that bed(K, a:B) denotes the best entailment degree of the concept assertion a:B w.r.t. K as defined in Eq. (4). Note that for individuals a ∈ Ind+H (A), K |= a:H holds and, thus, bed(K, a:B ⊓ H) = bed(K, a:B). Also, note that, even if K is crisp, the possible occurrence of fuzzy concrete domains in expressions of the form ∃T.d in B may imply that both bed(K, B ⊑ H) ∉ {0, 1} and bed(K, a:B) ∉ {0, 1}.
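The two measures can be sketched as follows, assuming the bed degrees of the covered positive and negative examples have already been computed by a DL reasoner (in the fully crisp case they are simply 0 or 1):

```python
import math

# Sketch of Eq. (8): bed degrees for the individuals involved in E+ and E-
# are supplied as lists, since computing them requires a DL reasoner.

def cf(bed_pos, bed_neg):
    """Sum of bed(K, a:B) over covered positives, divided by the number of
    covered individuals (positive or negative) with bed(K, a:B) > 0."""
    num = sum(d for d in bed_pos if d > 0)
    den = sum(1 for d in bed_pos + bed_neg if d > 0)
    return num / den if den else 0.0

# Sketch of Eq. (7): p positives retained across the refinement step.
def gain(p, cf_new, cf_old):
    """p * (log2 cf(phi') - log2 cf(phi))."""
    return p * (math.log2(cf_new) - math.log2(cf_old))
```

For instance, an axiom covering two positives to degrees 1.0 and 0.8 plus one negative to degree 0.5 gets confidence 1.8/3 = 0.6.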
3.3 The implementation
A variant of Foil-DL has been implemented in the fuzzyDL-Learner⁷ system and provided with two GUIs: one is a stand-alone Java application, the other is a tab widget plug-in for the ontology editor Protégé⁸ (release 4.2).
Several implementation choices have been made. Notably, fuzzy GCIs in LH
are interpreted under Gödel semantics (see Eq. (2)). However, since K and E are
represented in crisp DLs, we have used a classical DL reasoner, together with
a specialised code, to compute the confidence degree of fuzzy GCIs. Therefore,
the system relies on the services of DL reasoners to solve all the deductive inference problems necessary for Foil-DL to work, namely instance retrieval, instance
check and subclasses retrieval. In particular, the sets Ind+H (A) and IndH (A) are
computed by posing instance retrieval problems to the DL reasoner. Conversely,
bed(K, a:∃T.d) can be computed from the derived T -fillers v of a, and applying
the fuzzy membership function of d to v. The examples covered by a GCI, and,
thus, the entailment tests in Learn-Sets-of-Axioms and Learn-One-Axiom,
have been determined in a similar way.
The implementation of Foil-DL features several optimizations w.r.t. the solution strategy presented in Section 3.2. Notably, the search in the hypothesis space can be optimized by enabling a backtracking mode. This option makes it possible to overcome one of the main limits of Foil, i.e. the sequential covering strategy.
Because it performs a greedy search, formulating a sequence of rules without
backtracking, Foil does not guarantee to find the smallest or best set of rules
that explain the training examples. Also, learning rules one by one could lead
to less and less interesting rules. To reduce the risk of a suboptimal choice at
any search step, the greedy search can be replaced in Foil-DL by a beam search
which maintains a list of k best candidates at each step instead of a single best
candidate. Additionally, to guarantee termination, we provide two parameters to
limit the search space: namely, the maximal number of conjuncts and the maximal depth of existential nesting allowed in a fuzzy GCI. In fact, the computation
may end without covering all positive examples.
⁷ http://straccia.info/software/FuzzyDL-Learner
⁸ http://protege.stanford.edu/
Fig. 4. Michalski’s example of eastbound trains (left) and westbound trains (right)
(illustration taken from [16]).
4 Comparing Foil-DL and DL-Learner
In this section we report the results of a comparison between Foil-DL and DL-
Learner on a very popular learning task in ILP proposed 20 years ago by Ryszard
Michalski [16] and illustrated in Figure 4. Here, 10 trains are described, out of
which 5 are eastbound and 5 are westbound. The aim of the learning problem is
to find the discriminating features between these two classes.
For the purpose of this comparative study, we have considered two slightly
different versions, trains2 and trains3, of an ontology encoding the original Trains
data set. 9 The former has been adapted from the version distributed with DL-
Learner in order to be compatible with Foil-DL. Notably, the target classes
EastTrain and WestTrain have become part of the terminology and several
class assertion axioms have been added for representing positive and negative
examples. The metrics for trains2 are reported in Table 2. The ontology does
not encompass any data property. Therefore, no fuzzy concept can be generated
when learning GCIs from trains2 with Foil-DL. However, the ontology can be
slightly modified in order to test the fuzzy concept generation feature of Foil-
DL. Note that in trains2 cars can be classified according to the classes LongCar
and ShortCar. Instead of one such crisp classification, we may want a fuzzy
classification of cars. This is made possible by removing LongCar and ShortCar
(together with the related class assertion axioms) from trains2 and introducing
the data property hasLenght with domain Car and range double (together with
several data property assertions). The resulting ontology, called trains3, presents
the metrics reported in Table 2.
DL-Learner¹⁰ features several algorithms. Among them, the closest to Foil-DL is ELTL since it implements a refinement operator for concept learning in
EL [12]. Conversely, CELOE learns class expressions in the more expressive OWL
DL [11]. Both work only under OWA and deal only with crisp DLs.
4.1 Results on the ontology trains2
Trial with Foil-DL. The settings for this experiment allow for the generation
of hypotheses with up to 5 conjuncts and 2 levels of existential nestings. Under
⁹ http://archive.ics.uci.edu/ml/datasets/Trains
¹⁰ http://dl-learner.org/Projects/DLLearner
Table 2. Ontology metrics for trains2.owl and trains3.owl according to Protégé.

         # logical axioms  # classes  # object prop.  # data prop.  # individuals  DL expressivity
trains2  345               32         5               0             50             ALCO
trains3  343               30         5               1             50             ALCO(D)
these restrictions, the GCI axioms learned by Foil-DL for the target concept
EastTrain are:
Confidence Axiom
1,000 3CarTrain and hasCar some (2LoadCar) subclass of EastTrain
1,000 3CarTrain and hasCar some (3WheelsCar) subclass of EastTrain
1,000 hasCar some (ElipseShapeCar) subclass of EastTrain
1,000 hasCar some (HexagonLoadCar) subclass of EastTrain
whereas the following GCI axioms are returned by Foil-DL for WestTrain:
Confidence Axiom
1,000 2CarTrain subclass of WestTrain
1,000 hasCar some (JaggedCar) subclass of WestTrain
The algorithm returns the same GCIs under both OWA and CWA. Note that an
important difference between learning in DLs and standard ILP is that the former
works under OWA whereas the latter under CWA. In order to complete the Trains
example we would have to introduce definitions and/or assertions to model the
closed world. However, the CWA holds naturally in this example, because we
have complete knowledge of the world, and thus the knowledge completion was
not necessary. This explains the behaviour of Foil-DL which correctly induces
the same hypotheses in spite of the opposite semantic assumptions.
Trial with ELTL. For the target concept EastTrain, the class expression learned by ELTL is the following:
EXISTS hasCar.(ClosedCar AND ShortCar) (accuracy: 1.0)
whereas the following finding has been returned for the target concept WestTrain:
EXISTS hasCar.LongCar (accuracy: 0.8)
The latter is not fully satisfactory as for the example coverage.
Trial with CELOE. For the target concept EastTrain, CELOE learns several
class expressions of which the most accurate is:
hasCar some (ClosedCar and ShortCar) (accuracy: 1.0)
whereas, for the target concept WestTrain, the most accurate among the ones
found is the following:
hasCar only (LongCar or OpenCar) (accuracy: 1.0)
Note that the former coincides with the corresponding result obtained with ELTL, while the latter is a more accurate variant of the corresponding class expression returned by ELTL. The increase in example coverage is due to the augmented expressive power of the DL supported in CELOE.
- hasLenght_low: hasLenght, triangular(23.0,32.0,41.0)
- hasLenght_fair: hasLenght, triangular(32.0,41.0,50.0)
- hasLenght_high: hasLenght, triangular(41.0,50.0,59.0)
- hasLenght_veryhigh: hasLenght, rightShoulder(50.0,59.0)
- hasLenght_verylow: hasLenght, leftShoulder(23.0,32.0)
Fig. 5. Fuzzy concepts derived by Foil-DL from the data property hasLenght.
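The membership functions listed in Fig. 5 can be sketched as follows (the parameters are those shown above; the helper function names are ours, not Foil-DL's API):

```python
def triangular(a, b, c):
    # Triangular membership: 0 at a, rising to 1 at b, back to 0 at c.
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def left_shoulder(a, b):
    # 1 up to a, then linearly decreasing to 0 at b.
    def mu(x):
        if x <= a:
            return 1.0
        if x >= b:
            return 0.0
        return (b - x) / (b - a)
    return mu

def right_shoulder(a, b):
    # 0 up to a, then linearly increasing to 1 at b.
    def mu(x):
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)
    return mu

# The five fuzzy concepts derived from hasLenght in Fig. 5:
hasLenght_verylow = left_shoulder(23.0, 32.0)
hasLenght_low = triangular(23.0, 32.0, 41.0)
hasLenght_fair = triangular(32.0, 41.0, 50.0)
hasLenght_high = triangular(41.0, 50.0, 59.0)
hasLenght_veryhigh = right_shoulder(50.0, 59.0)

print(hasLenght_fair(41.0))  # 1.0: the peak of the triangle
print(hasLenght_fair(36.5))  # 0.5: halfway up the left slope
```

Note how adjacent functions overlap: a car of length 45.5 belongs to both hasLenght_fair and hasLenght_high to degree 0.5, which is exactly what crisp interval classes cannot express.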
4.2 Results on the ontology trains3
Trial with Foil-DL. The outcomes for the target concepts EastTrain and
WestTrain remain unchanged when Foil-DL is run on trains3 with the same
configuration as in the first trial. Yet, fuzzy concepts are automatically generated
by Foil-DL from the data property hasLenght (see Figure 5). However, from
the viewpoint of discriminant power, these concepts are weaker than the other
crisp concepts occurring in the ontology. In order to make the fuzzy concepts
emerge during the generation of hypotheses, we have appropriately biased the
language of hypotheses. In particular, by enabling only the use of object and
data properties in LH , Foil-DL returns the following axiom for EastTrain:
Confidence Axiom
1.000 hasCar some (hasLenght_fair) and hasCar some (hasLenght_veryhigh)
and hasCar some (hasLenght_verylow) subclass of EastTrain
Conversely, for WestTrain, a lighter bias is sufficient to make fuzzy concepts
appear in the learned axioms. In particular, by disabling the class 2CarTrain in
LH , Foil-DL returns the following axioms:
Confidence Axiom
1.000 hasCar some (2WheelsCar and 3LoadCar) and hasCar some (3LoadCar and CircleLoadCar)
subclass of WestTrain
1.000 hasCar some (0LoadCar) subclass of WestTrain
1.000 hasCar some (JaggedCar) subclass of WestTrain
1.000 hasCar some (2LoadCar and hasLenght_high) subclass of WestTrain
1.000 hasCar some (ClosedCar and hasLenght_fair) subclass of WestTrain
Trial with ELTL. For the target concept EastTrain, ELTL returns a class expression
which leaves some positive examples uncovered (an incomplete hypothesis):
(EXISTS hasCar.TriangleLoadCar AND EXISTS hasCar.ClosedCar) (accuracy: 0.9)
whereas, for the target concept WestTrain, it returns an overly general hypothesis
which also covers negative examples (an inconsistent hypothesis):
TOP (accuracy: 0.5)
This poor performance of ELTL on trains3 is due to the low expressivity of EL
and to the fact that the classes LongCar and ShortCar, which proved to be
discriminant in the first trial, do not occur in trains3 and thus cannot be used
for building hypotheses.
Trial with CELOE. The most accurate class expression found by CELOE for the
target concept EastTrain is:
((not 2CarTrain) and hasCar some ClosedCar) (accuracy: 1.0)
However, interestingly, CELOE learns also the following class expressions contain-
ing classes obtained by numerical restriction from the data property hasLenght:
hasCar some (ClosedCar and hasLenght <= 48.5) (accuracy: 1.0)
hasCar some (ClosedCar and hasLenght <= 40.5) (accuracy: 1.0)
hasCar some (ClosedCar and hasLenght <= 31.5) (accuracy: 1.0)
These “interval classes” are just one step away from the fuzzification which,
conversely, Foil-DL is able to perform. It is acknowledged that using fuzzy sets
in place of “interval classes” improves the readability of the induced knowledge
about the data. As for the target concept WestTrain, the most accurate class expression
among the ones found by CELOE is:
(2CarTrain or hasCar some JaggedCar) (accuracy: 1.0)
Once again, the augmented expressivity increases the effectiveness of DL-Learner.
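The contrast between CELOE's crisp thresholds and the fuzzy sets derived by Foil-DL can be made concrete (the threshold and triangle parameters are taken from the results above; the code itself is an illustrative sketch, not either tool's implementation): a crisp interval class flips abruptly at the boundary, while a fuzzy set degrades gradually.

```python
def crisp_interval(threshold):
    # Crisp restriction hasLenght <= threshold: membership is 1 or 0.
    return lambda x: 1.0 if x <= threshold else 0.0

def triangular(a, b, c):
    # Triangular fuzzy membership, as used for hasLenght_fair in Fig. 5.
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

le_48_5 = crisp_interval(48.5)       # from CELOE's "hasLenght <= 48.5"
fair = triangular(32.0, 41.0, 50.0)  # hasLenght_fair from Fig. 5

for length in (48.0, 49.0):
    print(length, le_48_5(length), round(fair(length), 2))
# The crisp class jumps from 1.0 to 0.0 across 48.5, while the
# fuzzy membership decreases smoothly as the length grows.
```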
5 Conclusions and future work
We have described a novel method, named Foil-DL, which addresses the prob-
lem of learning fuzzy EL(D) GCI axioms from crisp DL assertions. The method
extends Foil in a twofold direction: from crisp to fuzzy and from rules to GCIs.
Notably, vagueness is captured by the definition of confidence degree reported in
(8) and incompleteness is dealt with via the OWA. Also, thanks to the variable-free
syntax of DLs, the learnable GCIs are highly understandable by humans and
translate easily into natural language sentences. In particular, Foil-DL presents
the learned axioms according to the user-friendly presentation style of the
Manchester OWL syntax 11 (the same used in Protégé).
We would like to stress that Foil-DL provides a different solution from
SoftFoil [14] with respect to the KR framework, the refinement operator and the
heuristic. Also, unlike SoftFoil, Foil-DL has been implemented and tested.
The experimental results are quite promising and encourage the application of
Foil-DL to more challenging real-world problems. Notably, in spite of the low
expressivity of EL, Foil-DL has turned out to be robust, mainly due to the
refinement operator and to the fuzzification facilities. Note that a fuzzy OWL 2
version of the trains’ problem (ontology fuzzytrains v1.5.owl) 12 has been
developed by Iglesias for testing the fuzzy extension of CELOE proposed in [7].
However, Foil-DL cannot handle fuzzy OWL 2 constructs such as fuzzy classes
obtained by existential restriction of fuzzy datatypes, fuzzy concept assertions,
and fuzzy role assertions. Therefore, it has been necessary to prepare an ad-hoc
ontology (trains3) for comparing Foil-DL and DL-Learner.
For the future, we intend to conduct a more extensive empirical evaluation of
Foil-DL, which could suggest directions of improvement of the method towards
more effective formulations of, e.g., the information gain function and the
refinement operator, as well as of the search strategy and the halt conditions
employed in Learn-One-Axiom. Also, it could be interesting to analyse the impact
of different fuzzy logics on the learning process. Finally, we shall investigate
learning fuzzy GCI axioms from FuzzyOWL 2 ontologies, by coupling the learning
algorithm to the fuzzyDL reasoner, instead of learning from crisp OWL 2 data
using a classical DL reasoner.
11 http://www.w3.org/TR/owl2-manchester-syntax/
12 Available at http://wiki.aksw.org/Projects/DLLearner/fuzzyTrains.
References
1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (eds.):
The Description Logic Handbook: Theory, Implementation and Applications (2nd
ed.). Cambridge University Press (2007)
2. Bobillo, F., Straccia, U.: fuzzyDL: An expressive fuzzy description logic reasoner.
In: 2008 Int. Conf. on Fuzzy Systems. pp. 923–930. IEEE Computer Society (2008)
3. Borgida, A.: On the relative expressiveness of description logics and predicate logics.
Artificial Intelligence Journal 82, 353–367 (1996)
4. Cerami, M., Straccia, U.: On the (un)decidability of fuzzy description logics under
Łukasiewicz t-norm. Information Sciences 227, 1–21 (2013)
5. Dubois, D., Prade, H.: Possibility theory, probability theory and multiple-valued
logics: A clarification. Annals of Mathematics and Artificial Intelligence 32(1-4),
35–66 (2001)
6. Hájek, P.: Metamathematics of Fuzzy Logic. Kluwer (1998)
7. Iglesias, J., Lehmann, J.: Towards integrating fuzzy logic capabilities into an
ontology-based inductive logic programming framework. In: Proc. of the 11th Int.
Conf. on Intelligent Systems Design and Applications. IEEE Press (2011)
8. Klir, G.J., Yuan, B.: Fuzzy sets and fuzzy logic: theory and applications. Prentice-
Hall, Inc., Upper Saddle River, NJ, USA (1995)
9. Konstantopoulos, S., Charalambidis, A.: Formulating description logic learning as
an inductive logic programming task. In: Proc. of the 19th IEEE Int. Conf. on Fuzzy
Systems. pp. 1–7. IEEE Press (2010)
10. Lehmann, J.: DL-Learner: Learning Concepts in Description Logics. Journal of
Machine Learning Research 10, 2639–2642 (2009)
11. Lehmann, J., Auer, S., Bühmann, L., Tramp, S.: Class expression learning for on-
tology engineering. Journal of Web Semantics 9(1), 71–81 (2011)
12. Lehmann, J., Haase, C.: Ideal Downward Refinement in the EL Description Logic.
In: De Raedt, L. (ed.) ILP 2009. Revised Papers. Lecture Notes in Computer Sci-
ence, vol. 5989, pp. 73–87. Springer (2010)
13. Lisi, F.A.: A formal characterization of concept learning in description logics. In:
Kazakov, Y., Lembo, D., Wolter, F. (eds.) Proc. of the 2012 Int. Workshop on
Description Logics. CEUR Workshop Proceedings, vol. 846. CEUR-WS.org (2012)
14. Lisi, F.A., Straccia, U.: A logic-based computational method for the automated
induction of fuzzy ontology axioms. Fundamenta Informaticae 124, 1–17 (2013)
15. Lukasiewicz, T., Straccia, U.: Managing Uncertainty and Vagueness in Description
Logics for the Semantic Web. Journal of Web Semantics 6(4), 291–308 (2008)
16. Michalski, R.: Pattern recognition as a rule-guided inductive inference. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 2(4), 349–361 (1980)
17. Motik, B., Rosati, R.: A faithful integration of description logics with logic pro-
gramming. In: Veloso, M. (ed.) IJCAI 2007, Proc. of the 20th Int. Joint Conf. on
Artificial Intelligence. pp. 477–482 (2007)
18. Quinlan, J.R.: Learning logical definitions from relations. Machine Learning 5, 239–
266 (1990)
19. Reiter, R.: Equality and domain closure in first order databases. Journal of ACM
27, 235–249 (1980)
20. Straccia, U.: Reasoning within fuzzy description logics. Journal of Artificial Intelli-
gence Research 14, 137–166 (2001)
21. Straccia, U.: Description logics with fuzzy concrete domains. In: Bacchus, F.,
Jaakkola, T. (eds.) 21st Conf. on Uncertainty in Artificial Intelligence. pp. 559–
567. AUAI Press (2005)