=Paper=
{{Paper
|id=Vol-201/paper-13
|storemode=property
|title=Reasoning by Analogy in Description Logics Through Instance-based Learning
|pdfUrl=https://ceur-ws.org/Vol-201/25.pdf
|volume=Vol-201
|dblpUrl=https://dblp.org/rec/conf/swap/dAmatoFE06
}}
==Reasoning by Analogy in Description Logics Through Instance-based Learning==
Reasoning by Analogy in Description Logics
through Instance-based Learning
Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Dipartimento di Informatica, Università degli Studi di Bari
Campus Universitario, Via Orabona 4, 70125 Bari, Italy
{claudia.damato | fanizzi | esposito}@di.uniba.it
Abstract— This work presents a method founded in instance-based learning for inductive (memory-based) reasoning on ABoxes. The method, which exploits a semantic dissimilarity measure between concepts and instances, can be employed both to answer class membership queries and to predict new assertions that may not be logically entailed by the knowledge base. In a preliminary experimentation, we show that the method is sound and that it is actually able to induce new assertions that might be acquired in the knowledge base.

I. INTRODUCTION

Most of the research on ontology reasoning has been focusing on methods based on deductive reasoning. However, important tasks that are likely to be provided by new generation knowledge-based systems, such as construction, revision, population and evolution, are likely to be supported also by inductive methods. This has led to an increasing interest in machine learning and knowledge discovery methods applied to ontological representations (see [1], [2] and, more recently, [3], [4], [5], [6]).

We propose an algorithm based on a notion of concept similarity for performing a form of lazy learning on typical ontological representations. Namely, by combining an instance-based (analogical) approach with a notion of semantic dissimilarity, this paper intends to demonstrate the applicability of inductive reasoning in this field, which can be considered another form of approximate reasoning (see the discussion in [7]). In particular, we have adapted a general instance-based learning approach, the k-Nearest Neighbor method [8], to the specific multi-relational setting of ontology languages. A couple of technical problems had to be solved for this adaptation to ontology representations: 1) the Open World Assumption (OWA) that is made in this context; 2) in this multi-class problem setting, disjunction cannot be assumed by default.

The standard ontology languages are founded in Description Logics (henceforth DLs), as they borrow the language constructors for expressing complex concept definitions. Instance-based methods depend on a similarity measure defined on the instance space. In this perspective, a variety of measures for concept representations have been proposed (see [9] for a survey). As pointed out in [10], most of these measures focus on the similarity of atomic concepts within hierarchies or simple ontologies, based on a few relations. Thus, it becomes necessary to investigate similarity in more complex languages.

It has been observed that, adopting richer representations, structural properties have less and less impact in assessing semantic similarity. Hence, the vision of similarity based only on a structural (graph-based) approach, such as in [11], [12], may fall short. We have proposed some dissimilarity measures for non-trivial DL languages, based on the semantics conveyed by the ABox assertions, which are suitable for being used in instance-based methods [13], [14]. These measures elicit the underlying semantics by querying the knowledge base for assessing the concept extensions, estimated through their retrieval [15], as also hinted in [16]. Besides, the overall similarity is also (partially) influenced by the concepts which are related through role restrictions. Moreover, in many other typical tasks (e.g. conceptual clustering or definition), it is necessary to assess the similarity between concepts (resp. individuals). By recurring to the notion of the most specific concept of an individual with respect to an ABox [15], as a representative of the individual at the concept level, the measures for concepts can be extended to such cases.

An analogical reasoning procedure like this may be employed to answer class membership queries through analogical rather than deductive reasoning, which may be more robust with respect to noise and is likely to suggest new knowledge (which was not logically derivable). Specifically, we developed the method also for an application to semantic web service discovery, where services are annotated in DLs. Another application might regard supporting various tasks for the knowledge engineer, such as the acquisition of candidate assertions for enriching ontologies with partially populated ABoxes: the outcomes given by the procedure can be utilized as recommendations. Indeed, as we show in the experimentation, the newly induced assertions are quite accurate (commission errors, i.e. predicting a concept erroneously, were rarely observed). In turn, the outcomes of the procedure may also trigger other related tasks such as induction (revision) of (faulty) knowledge bases.

The paper is organized as follows. In the next section, the representation language is briefly presented. Two concept dissimilarity measures are recalled and exploited in a modified version of the k-NN classification procedure. The results of a preliminary experimental evaluation of the method using these two measures are shown and, finally, possible developments of the method are examined.
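As background for the adaptation discussed in the rest of the paper, the basic k-NN decision rule can be sketched in a few lines of Python. This is only an illustrative toy, not the paper's procedure: the instances, labels and distance are invented stand-ins (plain numbers rather than DL individuals).

```python
from collections import Counter

# Minimal sketch of the plain k-NN rule: pick the k nearest pre-classified
# instances under a dissimilarity measure and return the majority value.

def knn_classify(xq, training, h, dissim, k):
    """training: list of instances; h: instance -> class value;
    dissim: pairwise dissimilarity. Majority vote over the k nearest."""
    neighbors = sorted(training, key=lambda x: dissim(xq, x))[:k]
    votes = Counter(h(x) for x in neighbors)
    return votes.most_common(1)[0][0]

# Toy 1-D example: instances are numbers, classes are "low"/"high".
data = [1.0, 1.2, 0.8, 5.0, 5.5, 6.0]
labels = {1.0: "low", 1.2: "low", 0.8: "low",
          5.0: "high", 5.5: "high", 6.0: "high"}
print(knn_classify(1.1, data, labels.get, lambda a, b: abs(a - b), k=3))  # "low"
```

Sections III and IV below replace the toy distance with semantic dissimilarity measures and adjust the voting scheme for the DL setting.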
II. ALC KNOWLEDGE BASES

We recall the basics of ALC [17], a logic which adopts constructors supported by the standard ontology languages (see the DL handbook [15] for a thorough reference).

In DLs, descriptions are inductively defined starting with a set NC of primitive concept names and a set NR of primitive roles. Complex descriptions are built using primitive concepts and roles and the constructors shown in the following. The semantics of the descriptions is defined by an interpretation I = (∆I, ·I), where ∆I is a non-empty set, the domain of the interpretation, and ·I is the interpretation function that maps each A ∈ NC to a set AI ⊆ ∆I and each R ∈ NR to RI ⊆ ∆I × ∆I.

The top concept ⊤ is interpreted as the whole domain of objects ∆I, while the bottom concept ⊥ corresponds to ∅. Complex descriptions can be built in ALC using the following constructors. The language supports full negation: given any description C, its negation, denoted ¬C, amounts to ∆I \ CI. Concept conjunction, denoted C1 ⊓ C2, yields the extension C1I ∩ C2I; dually, concept disjunction, denoted C1 ⊔ C2, yields the union C1I ∪ C2I. Finally, the existential restriction, denoted ∃R.C, is interpreted as {x ∈ ∆I | ∃y ∈ ∆I ((x, y) ∈ RI ∧ y ∈ CI)}, and the value restriction ∀R.C has as its extension the set {x ∈ ∆I | ∀y ∈ ∆I ((x, y) ∈ RI → y ∈ CI)}.

The main inference is subsumption between concepts based on their semantics: given two descriptions C and D, C subsumes D, denoted by C ⊒ D, iff for every interpretation I it holds that CI ⊇ DI. When C ⊒ D and D ⊒ C, they are equivalent, denoted with C ≡ D.

A knowledge base K = ⟨T, A⟩ contains a TBox T and an ABox A:
• T is the set of definitions C ≡ D, meaning that, for every interpretation I, CI = DI, where C is the concept name and D is its description constructed as above;
• A contains concept and role assertions about the world-state, e.g. C(a) and R(a, b), meaning that, for every I, aI ∈ CI and (aI, bI) ∈ RI.

A related inference used in the following is instance checking, that is, deciding whether an individual is an instance of a concept [18], [15]. Conversely, it may be necessary to find the concepts which an individual belongs to (realization problem), especially the most specific one (see Def. 5).

Semantically equivalent (yet syntactically different) descriptions can be given for the same concept. Nevertheless, equivalent concepts can be reduced to a normal form by means of rewriting rules that preserve their equivalence [19]:

A description D is in ALC normal form iff D ≡ ⊥ (then D := ⊥), or D ≡ ⊤ (then D := ⊤), or D = D1 ⊔ ··· ⊔ Dn (∀i = 1, ..., n, Di ≢ ⊥) with

  Di = ⊓_{A ∈ prim(Di)} A  ⊓  ⊓_{R ∈ NR} [ ∀R.valR(Di) ⊓ ⊓_{E ∈ exR(Di)} ∃R.E ]

where:
• prim(Di) is the set of all (negated) primitive concepts occurring at the top level of Di;
• valR(Di) is the conjunction C1 ⊓ ··· ⊓ Cn in the value restriction of role R, if any (otherwise valR(Di) = ⊤);
• exR(Di) is the set of concepts in the existential restrictions of the role R.

For any R, every sub-description in exR(Di) and valR(Di) is in normal form. In the following, let L = ALC/≡ be the quotient set of ALC descriptions in normal form.

III. DISSIMILARITY MEASURES IN DESCRIPTION LOGICS

We recall two definitions of dissimilarity measures for ALC descriptions expressed in normal form [13], [14]. These measures are based on both the structure and the semantics of the concept descriptions.

A. Overlap Dissimilarity Measure

The first measure is derived from a measure of the overlap between concepts, which can be defined as follows:

Definition 1 (overlap function): Let I be the canonical interpretation of the ABox A. The overlap function f : L × L ↦ R+ is defined¹, for descriptions C, D ∈ L with C = ⊔_{i=1}^n Ci and D = ⊔_{j=1}^m Dj, as follows.

disjunctive level:
  f(C, D) := f⊔(C, D) = ∞ if C ≡ D;  0 if C ⊓ D ≡ ⊥;  max_{i,j} f⊓(Ci, Dj) otherwise

conjunctive level:
  f⊓(Ci, Dj) := fP(prim(Ci), prim(Dj)) + λ(f∀(Ci, Dj) + f∃(Ci, Dj))
with λ ∈ [0, 1].

primitive concepts:
  fP(P1, P2) := |R(P1) ∪ R(P2)| / |(R(P1) ∪ R(P2)) \ (R(P1) ∩ R(P2))|
where R(P) = ∩_{A ∈ P} AI, and fP(P1, P2) = ∞ when R(P1) = R(P2).

value restrictions:
  f∀(Ci, Dj) := Σ_{R ∈ NR} f⊔(valR(Ci), valR(Dj))

existential restrictions:
  f∃(Ci, Dj) := Σ_{R ∈ NR} Σ_{k=1}^N max_{p=1,...,M} f⊔(Cik, Djp)
where Cik ∈ exR(Ci) and Djp ∈ exR(Dj), and we suppose w.l.o.g. that N = |exR(Ci)| ≥ |exR(Dj)| = M; otherwise the indices N and M, as well as Ci and Dj, are to be exchanged in the formula above.

¹ The name A of the ABox is omitted for simplicity.
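To make the primitive-concept overlap fP concrete, here is a small hedged sketch (not the authors' code). The retrieval sets and individual names below are invented; a real implementation would obtain the retrievals by querying the ABox through a reasoner.

```python
# Sketch of fP (Def. 1): a Tversky-style ratio over retrieval sets,
# i.e. |r1 ∪ r2| / |(r1 ∪ r2) \ (r1 ∩ r2)|, with ∞ for identical sets.

INF = float("inf")

def f_prim(r1, r2):
    """fP over the retrieval sets r1, r2 of two primitive conjunctions."""
    if r1 == r2:
        return INF          # fully overlapping retrievals
    union = r1 | r2
    sym_diff = union - (r1 & r2)
    return len(union) / len(sym_diff)

# Hypothetical retrievals of two primitive conjunctions:
father = {"tom", "bob", "jim"}
parent = {"tom", "bob", "jim", "ann"}
print(f_prim(father, parent))   # 4 individuals in the union, 1 differs -> 4.0
print(f_prim(father, father))   # identical retrievals -> inf
```

Note how the overlap grows as the symmetric difference shrinks, which is exactly the behavior the recursive levels of f build upon.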
The function f represents a measure of the overlap between two descriptions expressed in ALC normal form. It measures the similarity of the input concepts based on the similarity between their extensions (approximated with the retrieval) and also on the similarity of the concepts reached by the role restrictions. Namely, it is defined recursively beginning from the top level of the descriptions (a disjunctive level) down to the bottom level, represented by (conjunctions of) primitive concepts.

Overlap at the disjunctive level is treated as the maximum overlap between the disjunctive forms of the input concepts. At conjunctive levels, instead of simply considering the similarity as in Tversky's measure [20] (as we do for the prim's), which could turn out to be null in case of disjoint retrieval sets for the input concepts, we distinguish the overlaps between the prims and those between the concepts in the scope of the various role restrictions²; the contribution of the overlap of concepts reached through role restrictions can be penalized by tweaking the parameter λ. The measure for primitive concepts resembles Tversky's measure and represents a semantic baseline, since it depends on the semantics of the knowledge base, as conveyed by the ABox assertions. This is in line with the ideas in [16], [10], where semantics is elicited as a probability distribution over the domain of the interpretation.

² We also tried other solutions, such as considering minima or products of the three overlap measures, yet experimentally this did not yield a better performance, whereas the computation time increased. For practical reasons, the maximal measure ∞ has been replaced with large numbers depending on the cardinality of the set of individuals in the ABox: |Ind(A)|.

As for the overlap of role restrictions, for the universal restrictions we simply add the overlap measures varying the role, while for the existential restrictions the measure is trickier: borrowing the idea of the existential mappings, we consider, per role, all possible matches between the concepts in the scope of existential restrictions, and then take the maximal sum of overlaps resulting from all the matches.

Now, it is possible to derive a dissimilarity measure based on f as follows:

Definition 2 (overlap dissimilarity measure): The overlap dissimilarity measure is a function d : L × L ↦ [0, 1] such that, given two concept descriptions C, D ∈ L, C = ⊔_{i=1}^n Ci and D = ⊔_{j=1}^m Dj:

  d(C, D) := 1 if f(C, D) = 0;  0 if f(C, D) = ∞;  1/f(C, D) otherwise

Function d simply measures the level of dissimilarity between two concepts as the inverse of the overlap function f. Particularly, if f(C, D) = 0, i.e. there is no overlap between the considered concepts, then d must indicate that the two concepts are totally different: indeed d(C, D) = 1, the maximum value of its range. If f(C, D) = ∞, the two concepts are totally overlapped and consequently d(C, D) = 0, meaning that the two concepts are indistinguishable: d assumes the minimum value of its range. If the considered concepts have a partial overlap, then their dissimilarity is inversely proportional to their overlap, since in this case f(C, D) > 1 and consequently 0 < d(C, D) < 1.

B. A Dissimilarity Measure Based on Information Content

As discussed in [12], a measure of concept (dis)similarity can be derived from the notion of Information Content (IC), which in turn depends on the probability of an individual belonging to a certain concept. Now, differently from other works, which assume that a probability distribution for the concepts in an ontology is known, we derive it from the knowledge base, from the distribution that can be estimated therein [16], [10].

In order to approximate this probability for a certain concept C, we recur to its extension w.r.t. the considered ABox in a fixed interpretation. Namely, we chose the canonical interpretation IA, which is the one adopting the set of individuals mentioned in the ABox as its domain and the identity as its interpretation function [15]. Now, given a concept C, its probability is estimated by:

  pr(C) = |C^IA| / |∆^IA|

Finally, we can compute the information content of a concept employing this probability:

  IC(C) = − log pr(C)

A measure of the concept dissimilarity is now formally defined [14]:

Definition 3 (IC dissimilarity): Let A be an ABox with canonical interpretation I. The information content dissimilarity measure is a function g : L × L ↦ R+ defined recursively for any two normal form concept descriptions C, D ∈ L, with C = ⊔_{i=1}^n Ci and D = ⊔_{j=1}^m Dj:

disjunctive level:
  g(C, D) := g⊔(C, D) = 0 if C ≡ D;  ∞ if C ⊓ D ≡ ⊥;  min_{1≤i≤n, 1≤j≤m} g⊓(Ci, Dj) otherwise

conjunctive level:
  g⊓(Ci, Dj) := gP(prim(Ci), prim(Dj)) + λ(g∀(Ci, Dj) + g∃(Ci, Dj))
with λ ∈ [0, 1].

primitive concepts:
  gP(Pi, Pj) := ∞ if Pi ⊓ Pj ≡ ⊥;  (IC(Pi ⊓ Pj) + 1) / (IC(LCS(Pi, Pj)) + 1) otherwise

value restrictions:
  g∀(Ci, Dj) := Σ_{R ∈ NR} g⊔(valR(Ci), valR(Dj))

existential restrictions:
  g∃(Ci, Dj) := Σ_{R ∈ NR} Σ_{k=1}^N min_{p=1,...,M} g⊔(Cik, Djp)
where Cik ∈ exR(Ci) and Djp ∈ exR(Dj), and we suppose w.l.o.g. that N = |exR(Ci)| ≥ |exR(Dj)| = M; otherwise the indices N and M are to be exchanged in the formula above.

The function g represents a measure of the dissimilarity between two descriptions expressed in ALC normal form. It is defined recursively beginning from the top level of the descriptions (a disjunctive level) down to the bottom level, represented by (conjunctions of) primitive concepts.

Now g has values in [0, ∞]. It may be useful to derive a normalized dissimilarity measure, as shown in the following.

Definition 4 (normalized IC dissimilarity): Let A be an ABox with canonical interpretation I. The normalized information content dissimilarity measure is the function d : L × L ↦ [0, 1] such that, given the concept descriptions in ALC normal form C = ⊔_{i=1}^n Ci and D = ⊔_{j=1}^m Dj:

  d(C, D) := 0 if g(C, D) = 0;  1 if g(C, D) = ∞;  1 − 1/g(C, D) otherwise

C. Measuring the Dissimilarity between Individuals

The notion of Most Specific Concept is commonly exploited for lifting individuals to the concept level.

Definition 5 (most specific concept): Given an ABox A and an individual a, the most specific concept of a w.r.t. A is the concept C, denoted MSCA(a), such that A |= C(a) and, for any other concept D such that A |= D(a), it holds that C ⊑ D.

In case of cyclic ABoxes expressed in a DL with existential restrictions, the MSC may not be expressible by a finite description [15], yet it may often be approximated.

On performing experiments related to another similarity measure exclusively based on concept extensions [21], we noticed that recurring to the MSC for lifting individuals to the concept level falls short: indeed, the MSCs may be too specific and unable to include other (similar) individuals in their extensions. By comparing descriptions reduced to normal form, we have given a more structural definition of dissimilarity. However, since MSCs are computed from the same ABox assertions, reflecting the current knowledge state, this guarantees that structurally similar representations will be obtained for semantically similar concepts.

Let us recall that, given the ABox, it is possible to calculate the most specific concept of an individual a w.r.t. the ABox, MSC(a), or at least its approximation MSCk(a) up to a certain description depth k. In some cases these are equivalent concepts, but in general we have that MSCk(a) ⊒ MSC(a). Given two individuals a and b in the ABox, we consider MSCk(a) and MSCk(b) (supposed in normal form). Now, in order to assess the dissimilarity between the individuals, the measure d can be applied to these descriptions:

  d(a, b) := d(MSCk(a), MSCk(b))

This may turn out to be handy in several tasks, namely both in inductive reasoning (construction, repairing of knowledge bases) and in information retrieval.

D. Discussion

We proved in [13], [14] that these measures really are dissimilarity measures according to the formal definition [22], considering that their input (what is actually compared) consists of equivalence classes in L, i.e. ALC concept descriptions with the same normal form.

As previously mentioned, we have also attempted slightly different measure definitions (e.g. considering minima or products at the conjunctive level, which sound more intuitive), which experimentally did not prove more effective than the simple (additive) definition given above.

The computational complexity of the measures depends on the complexity of the reasoning services required for computing the concept retrievals. Namely, both subsumption and instance checking are P-space for the ALC logic [18], yet such inferences can be computed once and preliminarily, before the measures are computed for the method we will present in the following. Obviously, when computing the dissimilarity measures for cases involving individuals, the extra cost of computing MSCs (or their approximations) has to be added.

Nevertheless, in practical applications, these computations may be efficiently carried out exploiting the statistics that are maintained by the DBMSs' query optimizers. Besides, the counts that are necessary for computing the concept extensions could be estimated by means of the probability distribution over the domain.

IV. A NEAREST NEIGHBOR CLASSIFICATION PROCEDURE IN DESCRIPTION LOGICS

We briefly review the basics of the k-Nearest Neighbor (k-NN) method and propose how to exploit the classification procedure for inductive reasoning. In this lazy approach to learning, a notion of distance (or dissimilarity) measure for the instance space is employed to classify a new instance.

Let xq be the instance that requires a classification. Using a dissimilarity measure, the set of the k nearest pre-classified instances is selected. The objective is to learn a discrete-valued target function h : IS ↦ V from a space of instances IS to a set of values V = {v1, ..., vs}. In its simplest setting, the k-NN algorithm approximates h for new instances xq on the ground of the value that h assumes in the neighborhood of xq, i.e. the k closest instances to the new instance in terms of the dissimilarity measure. Precisely, the value is assigned according to the vote of the majority of instances in the neighborhood. The classification function ĥ can be formally defined as follows:

  ĥ(xq) := argmax_{v ∈ V} Σ_{i=1}^k δ(v, h(xi))

where δ is a function that returns 1 in case of matching arguments and 0 otherwise. Note that the hypothesized function ĥ is defined only extensionally; that is, the k-NN method does not return an intensional classification model (e.g. a function or a
concept definition), it merely gives an answer for new query instances to be classified, employing the procedure above.

This simple formulation does not take into account the similarity among instances, except when selecting the instances to be included in the neighborhood. Therefore, a modified setting is generally adopted, weighting the vote on the grounds of the similarity of the instance:

  ĥ(xq) := argmax_{v ∈ V} Σ_{i=1}^k wi δ(v, h(xi))    (1)

where, usually, wi = 1/d(xi, xq) or wi = 1/d(xi, xq)².

Usually this method is employed to classify vectors of features in some n-dimensional instance space (e.g. often IS = Rⁿ). Let us now turn to adapting the method to the more complex context of DL descriptions. Preliminarily, it should be observed that a strong assumption of this setting is that it can be employed to assign a value (e.g. a class) to a query instance among a set of values which can be regarded as a set of pairwise disjoint concepts/classes. This assumption cannot always be valid: in this case, indeed, an individual could be an instance of more than one concept.

Let us consider a new value set V = {C1, ..., Cs} of concepts Cj that may be assigned to a query instance xq. If they were to be considered as disjoint (like in the standard machine learning setting), the decision procedure would adopt the hypothesis function defined in Eq. (1), with the query instance assigned the single concept voted by the weighted majority of instances in the neighborhood.

In the general case considered in this paper, when the disjointness of the classes cannot be assumed (unless explicitly stated in the TBox), one can adopt another answering procedure, decomposing the multi-class problem into smaller binary classification problems (one per concept). Therefore, a simple binary value set (V = {−1, +1}) is to be employed. Then, for each single concept (say Cj), a hypothesis ĥj is computed on the fly during the classification phase:

  ĥj(xq) := argmax_{v ∈ V} Σ_{i=1}^k δ(v, hj(xi)) / d(xq, xi)²    ∀j ∈ {1, ..., s}    (2)

where each function hj simply indicates the occurrence (+1) or absence (−1) of the corresponding assertion in the ABox for the k training instances xi: Cj(xi) ∈ A. Alternately³, hj may return +1 when Cj(xi) is logically entailed by the knowledge base K, and −1 otherwise.

³ For the sake of simplicity and efficiency, this case will not be considered in the following.

The problem with non-explicitly disjoint concepts is also related to the Closed World Assumption usually made in the context of Information Retrieval and Machine Learning. That is the reason for adapting the standard setting to cope both with the case of non-disjoint concepts and with the OWA, which is commonly made in the Semantic Web context. To deal with the OWA, the absence of information on whether a certain instance x belongs to the extension of concept Cj should not be interpreted negatively; rather, it should count as neutral information. Thus, one can still adopt the decision procedure in Eq. (2), but another value set has to be adopted for the hj's, namely V = {−1, 0, +1}, where the three values denote, respectively, positive occurrence, absence and negative occurrence (positive occurrence for the concept negation). Formally:

  hj(x) = +1 if Cj(x) ∈ A;  0 if Cj(x) ∉ A ∧ ¬Cj(x) ∉ A;  −1 if ¬Cj(x) ∈ A

Again, a more complex procedure may be devised by simply substituting the notion of occurrence (absence) of assertions in (from) the ABox with one based on logical entailment (denoted with ⊢) from the knowledge base, i.e. K ⊢ Cj(x); K ⊬ Cj(x) and K ⊬ ¬Cj(x); and K ⊢ ¬Cj(x), respectively. Although this may help reaching the precision of deductive reasoning, it is also much more computationally expensive, since the simple lookup in the ABox must be replaced with instance checking.

From a computational viewpoint, this procedure could be implemented to provide an answer even more efficiently than with a standard deductive reasoner. Indeed, once the retrieval of the primitive concepts is computed, the dissimilarity measures can be easily computed by means of a dynamic programming algorithm. Besides, the various measures could be maintained in an ad hoc data structure allowing for an efficient retrieval of the nearest neighbors, such as kD-trees or ball trees [23].

V. EXPERIMENTS

A. Experimental Setting

In order to assess the validity of the presented method with the dissimilarity measures proposed in Sect. III, we have applied it to the instance classification problem, with four different ontologies represented in OWL: FSM and SURFACE-WATER-MODEL from the Protégé library⁴, the FINANCIAL ontology⁵ employed as a testbed for the PELLET reasoner, and a small FAMILY ontology written in our lab. Although they are represented in languages that are different from ALC, we simply discarded these details when constructing the MSC approximations, in order to be able to apply the presented measures.

⁴ See the webpage: http://protege.stanford.edu/plugins/owl/owl-library
⁵ See the webpage: http://www.cs.put.poznan.pl/alawrynowicz/financial.owl

FAMILY is an ALCF ontology describing kinship relationships. It is made up of 14 concepts (both primitive and defined), some of them declared to be disjoint, 5 object properties, and 39 distinct individual names. Most of the individuals are asserted to be instances of more than one concept, and are involved in more than one role assertion. This ontology has been written to have a small yet more complex case with respect to the following ones. Indeed, while the other ontologies are more regular, i.e. only some concepts are employed in the assertions (the others are defined only intensionally), in the FAMILY ontology every concept
TABLE I
has at least one instance asserted. The same happens for the
AVERAGE RESULTS OF THE EXPERIMENTS WITH THE METHOD
assertions on roles; particularly, there are some cases where
EMPLOYING THE MEASURE BASED ON OVERLAP.
role assertions constitute a chain from an individual to another
one, by means of other intermediate assertions. Match Commission Omission Induction
The FSM ontology describes the domain of finite state Rate Rate Rate Rate
FAMILY .654±.174 .000±.000 .231±.173 .115±.107
machines using the SOF(D) language. It is made up of FSM .974±.044 .026±.044 .000±.000 .000±.000
20 (both primitive and defined) concepts (some of them are S.-W.-M. .820±.241 .000±.000 .064±.111 .116±.246
explicitly declared to be disjoint), 10 object properties, 7 F INANCIAL .807±.091 .024±.076 .000±.001 .169±.076
datatype properties, 37 distinct individual names. About half
of the individuals are asserted as instances of a single concept
and are not involved in any role (object property). or not) while it was to be classified as an instance of that
S URFACE -WATER -M ODEL is an ALCOF(D) ontology de- concept;
scribing the domain of the surface water and the water quality • commission error rate: amount of individuals (analog-
models. It is based on the Surface-water Models Information ically) labeled as instances of a concept, while they
Clearinghouse (SMIC) of the USGS. Namely, it is an ontology (logically) belong to that concept or vice-versa
of numerical models for surface water flow and water quality • induction rate: amount of individuals that were found to
simulation. The application domain of these models comprises belong to a concept or its negation, while this information
lakes, oceans, estuaries etc.. These models are classified based is not logically derivable from the knowledge base
on their availability, application domain, dimensions, partial We report the average rates obtained over all the concepts in
differential equation solver, and characteristics types. It is each ontology and also their standard deviation.
made up of 19 concepts (both primitive and defined) without
any specification about disjointness, 9 object properties, 115 B. Experiments Employing the Overlap Measure
distinct individual names; each of them is an instance of a By looking at Tab. I reporting the experimental outcomes
single class and only some of them are involved in object with the dissimilarity measure based on the overlap (see Def.
properties. 2), preliminarily it is important to note that, for every ontology,
F INANCIAL is an ALCIF ontology that describes the the commission error was quite low. This means that the
domain of eBanking. It is made up of 60 (both primitive and classifier did not make critical mistakes i.e. cases when an
defined) concepts (some of them are declared to be disjoint), individual is deemed as an instance of a concept while it really
17 object properties, and no datatype property. It contains is an instance of another disjoint concept.
17941 distinct individual names. From the original ABox, we In particular, by looking at the outcomes related to the
randomly extracted assertions for 652 individuals. FAMILY ontology, it can be observed that the match rate
The classification method was applied to all the individuals is the lowest while the highest rate of omission errors was
in each ontology; namely, the individuals were checked to assess if they were instances of the concepts in the ontology through the analogical method. The performance was evaluated by comparing its responses to those returned by a standard reasoner⁶ as a baseline.

⁶ We employed PELLET: http://pellet.owldl.com

Specifically, for each individual in the ontology the MSC is computed and enlisted in the set of training (or test) examples. Each example is classified applying the adapted k-NN method presented in the previous section. As a value of k we chose √|Ind(A)|, as advised in the instance-based learning literature.

The experiment has been repeated twice, adopting a leave-one-out cross validation procedure with both the dissimilarity measures defined in Section III.

For each concept in the ontology, we measured the following parameters for the evaluation:
• match rate: number of cases of individuals that got exactly the same classification by both classifiers with respect to the overall number of individuals;
• omission error rate: amount of unlabeled individuals (our method could not determine whether they were instances of the concept or not) with respect to the overall number of individuals;
• commission error rate: amount of individuals whose analogical classification contradicts the one returned by the reasoner;
• induction rate: amount of individuals assigned to a concept by the analogical method when this membership is not logically derivable from the knowledge base.

reported. This may be due to two facts: 1) very few individuals were available w.r.t. the number of concepts⁷; 2) a sparse data situation: instances are irregularly spread over the concepts, that is, they might be instances of different concepts, which are sometimes disjoint. Hence, the MSC approximations that were computed also resulted very different from one another, which reduces the possibility of significantly matching similar MSCs. This is a known drawback of the Nearest-Neighbor methods. Nevertheless, as mentioned above, it is important to note that the algorithm did not make any commission error and it is able to infer new knowledge (11%).

⁷ Instance-based methods make an intensive use of the information about the individuals and improve their performance with the increase of the number of instances considered.

As regards the FSM ontology, we have observed the maximum match rate with respect to the classification given by the logic reasoner. Moreover, differently from the other ontologies, both the omission error rate and the induction rate were null. A very limited percentage of incorrect classification cases was observed. These outcomes were probably due to the fact that individuals in this ontology are quite regularly divided by the assertions on concepts and roles; hence, computing their MSCs, these are all very similar to each other, and so the amount of information they convey is very low. A choice of a lower number k of neighbors could probably help avoiding those residual errors.
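The classification and evaluation protocol described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names (`knn_classify`, `evaluate`) and the plain unweighted majority vote are assumptions standing in for the paper's adapted k-NN decision rule, and the `dissimilarity` argument stands for one of the measures of Section III, while in the experiments the labels were produced by the PELLET reasoner.

```python
import math
from collections import Counter

def knn_classify(query, training, dissimilarity, k):
    """Return +1 (instance), -1 (non-instance) or 0 (unknown) for `query`
    w.r.t. a fixed concept, by majority vote among its k nearest examples.
    `training` is a list of (individual, label) pairs, labels in {-1, 0, +1}."""
    neighbors = sorted(training, key=lambda ex: dissimilarity(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def evaluate(individuals, reasoner_label, dissimilarity):
    """Leave-one-out comparison of the analogical classifier against the
    deductive reasoner's answers; returns the four rates of the experiments."""
    k = round(math.sqrt(len(individuals)))   # k = sqrt(|Ind(A)|)
    counts = {'match': 0, 'commission': 0, 'omission': 0, 'induction': 0}
    for i, held_out in enumerate(individuals):
        training = [(o, reasoner_label(o))
                    for j, o in enumerate(individuals) if j != i]
        predicted = knn_classify(held_out, training, dissimilarity, k)
        actual = reasoner_label(held_out)
        if predicted == actual:
            counts['match'] += 1        # same answer as the reasoner
        elif predicted == 0:
            counts['omission'] += 1     # the method abstained, the reasoner did not
        elif actual == 0:
            counts['induction'] += 1    # new assertion, not logically derivable
        else:
            counts['commission'] += 1   # prediction contradicts the reasoner
    n = len(individuals)
    return {name: c / n for name, c in counts.items()}
```

On toy data where the dissimilarity cleanly separates the instances, the analogical answers agree with the "reasoner" on every held-out individual, giving a match rate of 1 and null error and induction rates.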
For the same reasons, also for the SURFACE-WATER-MODEL ontology quite a high rate of matching classifications was reported (yet less than with the previous ontology); moreover, some cases of omission error (6%) were observed. The induction rate was about 12%, which means that for this ontology our classifier always assigned individuals to the correct concepts but, in some cases, it could also induce new assertions. Since this rate represents assertions that were not logically deducible from the ontology and yet were inferred inductively by the analogical classifier, these figures would be a positive outcome (provided this knowledge were deemed correct by an expert). Particularly, in this case the increase of the induction rate has been due to the presence of assertions of mutual disjointness for some of the concepts.

Results are not different for the experiments with the FINANCIAL ontology, which largely exceeds the others in terms of number of concepts and individuals. Namely, the observed match rate is again above 80% and the rest of the cases are comprised in the induction rate (17%), leaving a limited margin to residual errors. This corroborates a known fact about NN learners, namely that they reach better and better performance in the limit, as new training instances become available. Actually, performing a 10-fold cross validation we obtained almost the same results.

C. Experiments with the Information Content Measure

The average results obtained by adopting the procedure with the measure based on information content (see Def. 4) are reported in Table II.

TABLE II
AVERAGE RESULTS OF THE EXPERIMENTS WITH THE METHOD EMPLOYING THE MEASURE BASED ON INFORMATION CONTENT

            Match Rate   Commission Rate   Omission Rate   Induction Rate
FAMILY      .608±.230    .000±.000         .330±.216       .062±.217
FSM         .899±.178    .096±.179         .000±.000       .005±.024
S.-W.-M.    .820±.241    .000±.000         .064±.111       .116±.246
FINANCIAL   .807±.091    .024±.076         .000±.001       .169±.046

By analyzing this table it is possible to note that no appreciable variation was observed with respect to the classifications performed using the first dissimilarity measure. Particularly, with both measures the method correctly classified all the individuals without commission errors. The reason is that, in most of the cases, the individuals of these ontologies are instances of one concept only and they are involved in a few roles (object properties). Some of the figures are slightly lower than those observed in the other experiment: this is confirmed by a higher variability. Surprisingly, the results on the larger ontologies (S.-W.-M. and FINANCIAL) perfectly match those obtained with the other measure, which is probably due to the fact that we used a leave-one-out cross-validation mode, which yielded a high value for the number k of training instances for the neighborhood, and it is well known that the NN procedure becomes more and more precise as more instances can be considered. The price to be paid was a longer computation time.

VI. CONCLUSIONS AND FUTURE WORK

In this work we have coupled a method for inductive inference on ABoxes with two composite concept similarity measures. The method is based on classification in analogy with the majority of the neighboring training individuals. It can be naturally exploited for predicting/suggesting missing information about individuals, thus enabling a sort of inductive retrieval; for example, it may be employed when a query to the KB is issued. Specifically, we are also targeting the task of service discovery for semantically annotated web services. Moreover, the outcomes of our method may be decisive for enabling a series of other inductive tasks such as clustering, case-based reasoning, etc.

The proposed method is able to induce new assertions, in addition to the knowledge already derivable by means of a reasoner. Hence, it seems able to enhance the standard instance-based classification. Particularly, an increase in accuracy was observed when the instances increase in number and are homogeneously spread.

The presented measure can be refined by tweaking the weighting factor λ, so as to decrease the impact of the dissimilarity between nested sub-concepts in the descriptions on the determination of the overall value.

Recently, we have defined another measure for ALN [24]. Hence, a natural extension may concern the definition of dissimilarity measures for more expressive DL languages. For example, a normal form for ALCN can be obtained based on those for ALN and ALC. Then, by exploiting the notion of existential mappings [25], the measure presented in this paper may be extended to more expressive DLs. We are currently developing other kinds of semantic similarity measures based on the idea of the hypothesis-driven distance.

Besides, as mentioned, this method could be extended with different (yet still computationally tractable) answering procedures grounded on statistical inference (non-parametric tests based on ranked distances), in order to accept answers as correct with a high degree of confidence. Furthermore, the k-NN method in its classical form is particularly suitable for the automated induction of missing values for (scalar or numeric) datatype properties of an individual, as an estimate derived from the values of the datatypes for the surrounding individuals.

Kernels are another means to express a notion of similarity in some unknown feature space. We are working on the definition of kernel functions on DL representations [26], thus allowing the exploitation of the efficiency of kernel methods (e.g. support vector machines) in a multi-relational setting.

ACKNOWLEDGMENTS

This work was partially supported by the regional interest projects DIPIS (DIstributed Production as Innovative System) and DDTA (Distretto Digitale Tessile Abbigliamento) in the
context of the tasks related to the semantic web services discovery.

REFERENCES

[1] W. Cohen and H. Hirsh, "Learning the CLASSIC description logic," in Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, P. Torasso, J. Doyle, and E. Sandewall, Eds. Morgan Kaufmann, 1994, pp. 121–133.
[2] J.-U. Kietz and K. Morik, "A polynomial approach to the constructive induction of structural knowledge," Machine Learning, vol. 14, no. 2, pp. 193–218, 1994.
[3] S. Staab, A. Maedche, C. Nedellec, and P. Wiemer-Hastings, Eds., ECAI2000 Workshop on Ontology Learning, ser. CEUR WS Proceedings, 2000, vol. 31.
[4] S. Staab, A. Maedche, C. Nedellec, and E. Hovy, Eds., IJCAI2001 Workshop on Ontology Learning, 2001.
[5] F. Esposito, N. Fanizzi, L. Iannone, I. Palmisano, and G. Semeraro, "Knowledge-intensive induction of terminologies from metadata," in ISWC2004, Proceedings of the 3rd International Semantic Web Conference, ser. LNCS, F. van Harmelen, S. McIlraith, and D. Plexousakis, Eds., vol. 3298. Springer, 2004, pp. 441–455.
[6] M. d'Aquin, J. Lieber, and A. Napoli, "Decentralized case-based reasoning for the Semantic Web," in Proceedings of the 4th International Semantic Web Conference, ISWC2005, ser. LNCS, Y. Gil, V. Motta, E. Benjamins, and M. A. Musen, Eds., no. 3279. Springer, 2005, pp. 142–155.
[7] P. Hitzler and D. Vrandečić, "Resolution-based approximate reasoning for OWL DL," in Proceedings of the 4th International Semantic Web Conference, ISWC2005, ser. LNCS, Y. Gil, V. Motta, E. Benjamins, and M. A. Musen, Eds., no. 3279. Springer, 2005, pp. 383–397.
[8] T. Mitchell, Machine Learning. McGraw-Hill, 1997.
[9] M. Rodríguez, "Assessing semantic similarity between spatial entity classes," Ph.D. dissertation, University of Maine, 1997.
[10] A. Borgida, T. Walsh, and H. Hirsh, "Towards measuring similarity in description logics," in Working Notes of the International Description Logics Workshop, ser. CEUR Workshop Proceedings, Edinburgh, UK, 2005.
[11] M. W. Bright, A. R. Hurson, and S. H. Pakzad, "Automated resolution of semantic heterogeneity in multidatabases," ACM Transactions on Database Systems, vol. 19, no. 2, pp. 212–253, 1994.
[12] P. Resnik, "Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language," Journal of Artificial Intelligence Research, vol. 11, pp. 95–130, 1999.
[13] C. d'Amato, N. Fanizzi, and F. Esposito, "A dissimilarity measure for the ALC description logic," in Semantic Web Applications and Perspectives, 2nd Italian Semantic Web Workshop SWAP2005, P. Bouquet and G. Tummarello, Eds., vol. 166. Trento, Italy: CEUR, 2005.
[14] ——, "A dissimilarity measure for ALC concept descriptions," in Proceedings of the 21st Annual ACM Symposium of Applied Computing, SAC2006, H. Haddad, Ed. Dijon, France: ACM, 2006, pp. 1695–1699.
[15] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, Eds., The Description Logic Handbook. Cambridge University Press, 2003.
[16] F. Bacchus, "Lp, a logic for representing and reasoning with statistical knowledge," Computational Intelligence, vol. 6, pp. 209–231, 1990.
[17] M. Schmidt-Schauß and G. Smolka, "Attributive concept descriptions with complements," Artificial Intelligence, vol. 48, no. 1, pp. 1–26, 1991.
[18] F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf, "Deduction in concept languages: From subsumption to instance checking," Journal of Logic and Computation, vol. 4, no. 4, pp. 423–452, 1994.
[19] S. Brandt, R. Küsters, and A.-Y. Turhan, "Approximation and difference in description logics," in Proceedings of the International Conference on Knowledge Representation, D. Fensel, F. Giunchiglia, D. McGuinness, and M.-A. Williams, Eds. Morgan Kaufmann, 2002, pp. 203–214.
[20] A. Tversky, "Features of similarity," Psychological Review, vol. 84, no. 4, pp. 327–352, 1977.
[21] C. d'Amato, N. Fanizzi, and F. Esposito, "A semantic similarity measure for expressive description logics," in Proceedings of Convegno Italiano di Logica Computazionale, CILC05, A. Pettorossi, Ed., Rome, Italy, 2005.
[22] H. Bock, Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. Springer-Verlag, 1999.
[23] I. H. Witten and E. Frank, Data Mining, 2nd ed. Morgan Kaufmann, 2005.
[24] N. Fanizzi and C. d'Amato, "A similarity measure for the ALN description logic," in Proceedings of Convegno Italiano di Logica Computazionale, CILC06, Bari, Italy, 2006, http://cilc2006.di.uniba.it/download/camera/15 Fanizzi CILC06.pdf.
[25] R. Küsters and R. Molitor, "Computing least common subsumers in ALEN," in Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI2001, B. Nebel, Ed., 2001, pp. 219–224.
[26] N. Fanizzi and C. d'Amato, "A declarative kernel for ALC concept descriptions," in Proceedings of the 16th International Symposium on Methodologies for Intelligent Systems, ISMIS 2006, ser. LNAI, F. Esposito, Z. Ras, D. Malerba, and G. Semeraro, Eds., vol. 4203. Springer, 2006, pp. 322–331.