Reasoning by Analogy in Description Logics through Instance-based Learning Claudia d’Amato, Nicola Fanizzi, Floriana Esposito Dipartimento di Informatica, Università degli Studi di Bari Campus Universitario, Via Orabona 4, 70125 Bari, Italy {claudia.damato | fanizzi | esposito}@di.uniba.it Abstract— This work presents a method founded in instance- It has been observed that, adopting richer representations, based learning for inductive (memory-based) reasoning on the structural properties have less and less impact in assessing ABoxes. The method, which exploits a semantic dissimilarity semantic similarity. Hence, the vision of similarity based only measure between concepts and instances, can be employed both to answer class membership queries and to predict new assertions on a structural (graph-based) approach, such as in [11], [12] that may be not logically entailed by the knowledge base. In a may fall short. We have proposed some dissimilarity measures preliminary experimentation, we show that the method is sound for non trivial DL languages, based on the semantics conveyed and it is actually able to induce new assertions that might be by the ABox assertions, which are suitable for being used acquired in the knowledge base. in instance-based methods [13], [14]. These measures elicit the underlying semantics by querying the knowledge base I. I NTRODUCTION for assessing the concept extensions, estimated through their Most of the research on ontology reasoning has been retrieval [15], as also hinted in [16]. Besides, the overall focusing on methods based on deductive reasoning. However, similarity is also (partially) influenced by the concepts which important tasks that are likely to be provided by new genera- are related through role restrictions. Moreover, in many other tion knowledge-based systems, such as construction, revision, typical tasks (e.g. 
conceptual clustering or definition), it is population and evolution are likely to be supported also by necessary to assess the similarity between concepts (resp. inductive methods. This has led to an increasing interest individuals). By recurring to the notion of most specific in machine learning and knowledge discovery methods applied concept of an individual with respect to an ABox [15], as to ontological representations (see [1], [2] and, more recently, representatives of the individuals at the concept level, the [3], [4], [5], [6]). measures for concepts can be extended to such cases. We propose an algorithm which is based on a notion of An analogical reasoning procedure like this may be em- concept similarity for performing a form of lazy learning on ployed to answer class membership queries through analog- typical ontological representations. Namely, by combining an ical rather than deductive reasoning, which may be more robust instance-based (analogical) approach with a notion of semantic with respect to noise and is likely to suggest new knowledge dissimilarity, this paper intends to demonstrate the applicabil- (which was not logically derivable). Specifically, we developed ity of inductive reasoning in this field which can be considered the method also for an application of semantic web service another form of approximate reasoning (see discussion in discovery where services are annotated in DLs. [7]). In particular, we have adapted a general instance- Another application might regard supporting various tasks based learning approach, the k-Nearest Neighbor [8], to for the knowledge engineer, such as the acquisition of can- the specific multi-relational setting for ontology languages. didate assertions for enriching ontologies with partially pop- A couple of technical problems had to be solved for this ulated ABoxes: the outcomes given by the procedure can be adaptation to ontology representations: 1) the Open World utilized as recommendations.
Indeed, as we show in the ex- Assumption (OWA) that is made in this context; 2) in this perimentation, the newly induced assertions are quite accurate multi-class problem setting disjunction cannot be assumed by (commission errors, i.e. predicting a concept erroneously, were default. rarely observed). In turn, the outcomes of the procedure may The standard ontology languages are founded in Description also trigger other related tasks such as induction (revision) of Logics (henceforth DLs) as they borrow the language con- (faulty) knowledge bases. structors for expressing complex concept definitions. Instance- The paper is organized as follows. In the next section, based methods depend on a similarity measure defined on the representation language is briefly presented. Two concept the instance space. In this perspective, a variety of measures dissimilarity measures are recalled and exploited in a modified for concept representations have been proposed (see [9] for version of the k-NN classification procedure. The results of a a survey). As pointed out in [10], most of these measures preliminary experimental evaluation of the method using these focus on the similarity of atomic concepts within hierarchies or two measures are shown and, finally, possible developments simple ontologies, based on a few relations. Thus, it becomes of the method are examined. necessary to investigate similarity in more complex languages. II. ALC K NOWLEDGE BASES • prim(Di ) is the set of all (negated) primitive concepts occurring at the top level of Di ; We recall the basics of ALC [17], a logic which adopts constructors supported by the standard ontology languages • valR (Di ) is the conjunction C1i u · · · u Cni in the value (see the DL handbook [15] for a thorough reference). 
restriction of role R, if any (otherwise valR (Di ) = >); In DLs, descriptions are inductively defined starting with a set NC of primitive concept names and a set NR of primitive • exR (Di ) is the set of concepts in the existential restric- roles. Complex descriptions are built using primitive concepts tions of the role R. and roles and the constructors shown in the following. The For any R, every sub-description in exR (Di ) and valR (Di ) is semantics of the descriptions is defined by an interpretation in normal form. I = (∆I , ·I ), where ∆I is a non-empty set, the domain of In the following, let L = ALC/≡ be the quotient set of the interpretation, and ·I is the interpretation function that ALC descriptions in normal form. maps each A ∈ NC to a set AI ⊆ ∆I and each R ∈ NR to RI ⊆ ∆I × ∆I . III. DISSIMILARITY MEASURES IN DESCRIPTION LOGICS The top concept > is interpreted as the whole domain of We recall two definitions of dissimilarity measures for ALC objects ∆I , while the bottom concept ⊥ corresponds to ∅. descriptions expressed in normal form [13], [14]. These mea- Complex descriptions can be built in ALC using the following sures are based on both the structure and the semantics of the constructors. The language supports full negation: given any concept descriptions. description C, denoted ¬C, it amounts to ∆I \ C I . Concept conjunction, denoted C1 u C2 , yields the extension C1I ∩ C2I ; A. Overlap Dissimilarity Measure dually, concept disjunction, denoted C1 t C2 , yields the union The first measure is derived from a measure of the overlap C1I ∪ C2I . Finally, the existential restriction, denoted ∃R.C, is between concepts, which can be defined as follows: interpreted as {x ∈ ∆I | ∃y ∈ ∆I ((x, y) ∈ RI ∧ y ∈ C I )} Definition 1 (overlap function): Let I be the canonical in- and the value restriction ∀R.C has as its extension the set terpretation of the ABox A. The overlap function f is defined1 : {x ∈ ∆I | ∀y ∈ ∆I ((x, y) ∈ RI → y ∈ C I )}.
f : L × L 7→ R+ , defined for descriptions C, D ∈ L, with C = C1 t · · · t Cn and D = D1 t · · · t Dm , as follows.
disjunctive level:
f (C, D) := ft (C, D) = ∞ if C ≡ D; 0 if C u D ≡ ⊥; maxi,j fu (Ci , Dj ) otherwise
conjunctive level:
fu (Ci , Dj ) := fP (prim(Ci ), prim(Dj )) + λ(f∀ (Ci , Dj ) + f∃ (Ci , Dj ))
with λ ∈ [0, 1].
primitive concepts:
fP (P1 , P2 ) := |R(P1 ) ∪ R(P2 )| / |(R(P1 ) ∪ R(P2 )) \ (R(P1 ) ∩ R(P2 ))|
where R(P ) = ∩A∈P AI and fP (P1 , P2 ) = ∞ when R(P1 ) = R(P2 ).
value restrictions:
f∀ (Ci , Dj ) := ΣR∈NR ft (valR (Ci ), valR (Dj ))
existential restrictions:
f∃ (Ci , Dj ) := ΣR∈NR Σk=1,...,N maxp=1,...,M ft (Cik , Djp )
where Cik ∈ exR (Ci ) and Djp ∈ exR (Dj ) and we suppose w.l.o.g. that N = |exR (Ci )| ≥ |exR (Dj )| = M , otherwise the indices N and M as well as Ci and Dj are to be exchanged in the formula above.
1 The name A of the ABox is omitted for simplicity.
The main inference is subsumption between concepts based on their semantics: given two descriptions C and D, C subsumes D, denoted by C w D, iff for every interpretation I it holds that C I ⊇ DI . When C w D and D w C then they are equivalent, denoted with C ≡ D.
A knowledge base K = hT , Ai contains a TBox T and an ABox A:
• T is the set of definitions C ≡ D, meaning that, for every interpretation I, C I = DI , where C is the concept name and D is its description constructed as above;
• A contains concept and role assertions about the world-state, e.g. C(a) and R(a, b), meaning that, for every I, aI ∈ C I and (aI , bI ) ∈ RI .
A related inference used in the following is instance checking, that is deciding whether an individual is an instance of a concept [18], [15]. Conversely, it may be necessary to find the concepts which an individual belongs to (realization problem), especially the most specific one (see Def. 5).
Semantically equivalent (yet syntactically different) descriptions can be given for the same concept. Nevertheless, equivalent concepts can be reduced to a normal form by means of rewriting rules that preserve their equivalence [19]: a description D is in ALC normal form iff D ≡ ⊥ (then D := ⊥) or if D ≡ > (then D := >) or if D = D1 t · · · t Dn (∀i = 1, . . . , n, Di 6≡ ⊥) with
Di = uA∈prim(Di ) A u uR∈NR (∀R.valR (Di ) u uE∈exR (Di ) ∃R.E)
where the components prim(Di ), valR (Di ) and exR (Di ) are as listed above.
dissimilarity is inversely proportional to their overlap, since in this case f (C, D) > 1 and consequently 0 < d(C, D) < 1.
The function f represents a measure of the overlap between two descriptions expressed in ALC normal form. It measures the similarity of the input concepts based on the similarity between their extensions (approximated with the retrieval) and also on the similarity of the concepts reached by the role restrictions. Namely, it is defined recursively beginning from the top level of the descriptions (a disjunctive level) down to the bottom level represented by (conjunctions of) primitive concepts.
Overlap at the disjunctive level is treated as the maximum overlap between the disjunctive forms of the input concepts. At conjunctive levels, instead of simply considering the similarity as in Tversky’s measure [20] (as we do for the prim’s), which could turn out to be null in case of disjoint retrieval sets for the input concepts, we distinguish the overlaps between the
B. A Dissimilarity Measure Based on Information Content
As discussed in [12], a measure of concept (dis)similarity can be derived from the notion of Information Content (IC) that, in turn, depends on the probability that an individual belongs to a certain concept. Now, differently from other works, which assume that a probability distribution for the concepts in an ontology is known, we derive it from the knowledge base, from the distribution that can be estimated therein [16], [10]. In order to approximate this probability for a certain concept C, we recur to its extension w.r.t. the considered ABox in a fixed interpretation.
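Since f is ultimately grounded in the fP ratio over retrieval sets, a minimal sketch may help. This is our own hedged illustration: the sets stand in for the retrievals R(P), the function names are invented, and the second helper anticipates the inversion of an overlap value into a dissimilarity score (Def. 2 below).

```python
def overlap_primitives(ext1, ext2):
    """f_P: ratio of |union| to |symmetric difference| of the two
    retrieval sets; infinite when the extensions coincide."""
    union = ext1 | ext2
    sym_diff = union - (ext1 & ext2)
    if not sym_diff:
        return float('inf')  # identical extensions: maximal overlap
    return len(union) / len(sym_diff)

def overlap_to_dissimilarity(f_value):
    """Invert an overlap value into a dissimilarity in [0, 1]:
    1 for no overlap, 0 for total overlap, 1/f otherwise."""
    if f_value == 0:
        return 1.0
    if f_value == float('inf'):
        return 0.0
    return 1.0 / f_value
```

For example, retrieval sets {a, b, c} and {b, c, d} give an overlap of 4/2 = 2 and hence a dissimilarity of 0.5, while identical retrievals give an infinite overlap and dissimilarity 0.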
Namely, we chose the canonical interpretation IA , which is the one adopting the set of individuals mentioned in the ABox as its domain and the identity as its interpretation function [15]. Now, given a concept C, its probability is estimated by:
pr(C) = |C IA |/|∆IA |
Finally, we can compute the information content of a concept employing this probability:
IC(C) = − log pr(C)
A measure of the concept dissimilarity is now formally defined [14]:
Definition 3 (IC dissimilarity): Let A be an ABox with canonical interpretation I. The information content dissimilarity measure is a function g : L × L 7→ R+ defined recursively
prims and those between the concepts in the scope of the various role restrictions2 ; the contribution of the overlap of concepts reached through role restrictions can be penalized by tweaking the parameter λ. The measure for primitive concepts resembles Tversky’s measure and it represents a semantic baseline, since it depends on the semantics of the knowledge base, as conveyed by the ABox assertions. This is in line with the ideas in [16], [10], where semantics is elicited as a probability distribution over the domain of the interpretation. As for the role restrictions overlap, for the universal restrictions we simply add the overlap measures varying the role, while for the existential restrictions the measure is trickier: borrowing the idea of the existential mappings, we consider, per role, all possible matches between the concepts in the scope of existential restrictions, then we consider the maximal sum of overlaps resulting from all the matches.
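Under the canonical interpretation these two quantities reduce to simple counts over the ABox individuals. A hedged sketch of ours (extensions and the domain are sets of individual names; the function names are illustrative):

```python
import math

def pr(concept_ext, domain):
    """pr(C): fraction of the ABox individuals in the extension of C
    under the canonical interpretation."""
    return len(concept_ext) / len(domain)

def ic(concept_ext, domain):
    """IC(C) = -log pr(C): rarer concepts carry more information."""
    return -math.log(pr(concept_ext, domain))
```

With a domain of 8 individuals and a concept retrieving 2 of them, pr = 0.25 and IC = log 4 ≈ 1.39.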
for any descriptions C, D ∈ L, with two normal form concepts C = C1 t · · · t Cn and D = D1 t · · · t Dm .
disjunctive level:
g(C, D) := gt (C, D) = 0 if C ≡ D; ∞ if C u D ≡ ⊥; min1≤i≤n, 1≤j≤m gu (Ci , Dj ) otherwise
conjunctive level:
gu (Ci , Dj ) := gP (prim(Ci ), prim(Dj )) + λ(g∀ (Ci , Dj ) + g∃ (Ci , Dj ))
with λ ∈ [0, 1].
primitive concepts:
gP (Pi , Pj ) := ∞ if Pi u Pj ≡ ⊥; (IC(Pi u Pj ) + 1)/(IC(LCS(Pi , Pj )) + 1) otherwise
value restrictions:
g∀ (Ci , Dj ) := ΣR∈NR gt (valR (Ci ), valR (Dj ))
existential restrictions:
g∃ (Ci , Dj ) := ΣR∈NR Σk=1,...,N minp=1,...,M gt (Cik , Djp )
where Cik ∈ exR (Ci ) and Djp ∈ exR (Dj ) and we suppose w.l.o.g. that N = |exR (Ci )| ≥ |exR (Dj )| = M , otherwise the indices N and M are to be exchanged in the formula above.
Now, it is possible to derive a dissimilarity measure based on f as follows:
Definition 2 (overlap dissimilarity measure): The overlap dissimilarity measure is a function d : L × L 7→ [0, 1] such that, given two concept descriptions C, D ∈ L, C = C1 t · · · t Cn and D = D1 t · · · t Dm :
d(C, D) := 1 if f (C, D) = 0; 0 if f (C, D) = ∞; 1/f (C, D) otherwise
Function d simply measures the level of dissimilarity between two concepts as the inverse of the overlap function f . Particularly, if f (C, D) = 0, i.e. there is no overlap between the considered concepts, then d must indicate that the two concepts are totally different, indeed d(C, D) = 1, the maximum value of its range. If f (C, D) = ∞ this means that the two concepts are totally overlapped and consequently d(C, D) = 0, which means that the two concepts are indistinguishable, indeed d assumes the minimum value of its range. If the considered concepts have a partial overlap then their
2 We tried also other solutions, such as considering minima or products of the three overlap measures, yet experimentally this did not yield a better performance whereas the computation time was increased. For practical reasons, the maximal measure ∞ has been replaced with large numbers depending on the cardinality of the set of individuals in the ABox: |Ind(A)|.
The function g represents a measure of the dissimilarity between two descriptions expressed in ALC normal form. It is defined recursively beginning from the top level of the descriptions (a disjunctive level) down to the bottom level represented by (conjunctions of) primitive concepts.
Now g has values in [0, ∞]. It may be useful to derive a normalized dissimilarity measure as shown in the following.
Definition 4 (normalized IC dissimilarity): Let A be an ABox with canonical interpretation I. The normalized information content dissimilarity measure is the function d : L × L 7→ [0, 1], such that, given the concept descriptions in ALC normal form C = C1 t · · · t Cn and D = D1 t · · · t Dm :
d(C, D) := 0 if g(C, D) = 0; 1 if g(C, D) = ∞; 1 − 1/g(C, D) otherwise
D. Discussion
We proved in [13], [14] that these measures are indeed dissimilarity measures according to the formal definition [22], considering that their input (what is actually compared) consists of equivalence classes in L, i.e. ALC concept descriptions with the same normal form.
As previously mentioned, we have also attempted slightly different measure definitions (e.g. considering minima or products at the conjunctive level, which sound more intuitive) which experimentally did not prove more effective than the simple (additive) definition given above.
The computational complexity of the measures depends on the complexity of the required reasoning services for computing the concepts’ retrieval. Namely, both subsumption and instance checking are PSPACE for the ALC logic [18], yet such inferences can be computed once and preliminarily, before the measures are computed for the method we will present in the following.
Obviously, when computing the dissimilarity measures for
C.
Measuring the Dissimilarity between Individuals cases involving individuals, the extra cost of computing MSCs The notion of Most Specific Concept is commonly exploited (or their approximations) has to be added. for lifting individuals to the concept level. Nevertheless, in practical applications, these computations Definition 5 (most specific concept): Given an ABox A may be efficiently carried out exploiting the statistics that and an individual a, the most specific concept of a w.r.t. A is are maintained by the DBMSs query optimizers. Besides, the the concept C, denoted MSCA (a), such that A |= C(a) and counts that are necessary for computing the concept extensions for any other concept D such that A |= D(a), it holds that could be estimated by means of the probability distribution C v D. over the domain. In case of cyclic ABoxes expressed in a DL with existen- tial restrictions the MSC may not be expressed by a finite IV. A N EAREST N EIGHBOR C LASSIFICATION P ROCEDURE description [15], yet it may be often approximated. IN D ESCRIPTION L OGICS On performing experiments related to another similarity We briefly review the basics of the k-Nearest Neighbor measure exclusively based on concept extensions [21], we method (k-NN) and propose how to exploit the classification noticed that, recurring to the MSC for lifting individuals to procedure for inductive reasoning. In this lazy approach to the concept level, just falls short: indeed the MSCs may be learning, a notion of distance (or dissimilarity) measure for too specific and unable to include other (similar) individuals the instance space is employed to classify a new instance. in their extensions. By comparing descriptions reduced to the Let xq be the instance that requires a classification. Using normal form we have given a more structural definition of a dissimilarity measure, the set of k nearest pre-classified dissimilarity. However, since MSCs are computed from the instances is selected. 
same ABox assertions, reflecting the current knowledge state, this guarantees that structurally similar representations will be obtained for semantically similar concepts.
Let us recall that, given the ABox, it is possible to calculate the most specific concept of an individual a w.r.t. the ABox, MSC(a), or at least its approximation MSCk (a) up to a certain description depth k. In some cases these are equivalent concepts but in general we have that MSCk (a) w MSC(a). Given two individuals a and b in the ABox, we consider MSCk (a) and MSCk (b) (supposed in normal form). Now, in order to assess the dissimilarity between the individuals, the d measure can be applied to these descriptions:
d(a, b) := d(MSCk (a), MSCk (b))
This may turn out to be handy in several tasks, namely both in inductive reasoning (construction, repairing of knowledge bases) and in information retrieval.
be interpreted negatively; rather, it should count as neutral
The objective is to learn a discrete-valued target function h : IS 7→ V from a space of instances IS to a set of values V = {v1 , . . . , vs }. In its simplest setting, the k-NN algorithm approximates h for new instances xq on the ground of the value that h assumes in the neighborhood of xq , i.e. the k closest instances to the new instance in terms of a dissimilarity measure. Precisely, it is assigned according to the value which is voted by the majority of instances in the neighborhood. The classification function ĥ can be formally defined as follows:
ĥ(xq ) := argmaxv∈V Σi=1,...,k δ(v, h(xi ))
where δ is a function that returns 1 in case of matching arguments and 0 otherwise. Note that the hypothesized function ĥ is defined only extensionally, that is, the k-NN method does not return an intensional classification model (e.g. a function or a concept definition); it merely gives an answer for new query instances to be classified, employing the procedure above.
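The simple majority-vote scheme just described can be sketched as follows; any dissimilarity, e.g. d applied to MSC approximations, can be plugged in. The toy numeric instances and labels in the usage example are purely illustrative.

```python
from collections import Counter

def knn_classify(query, training, k, dissim, label_of):
    """Plain (unweighted) k-NN: majority label among the k training
    items nearest to `query` under the dissimilarity `dissim`."""
    neighbours = sorted(training, key=lambda x: dissim(query, x))[:k]
    votes = Counter(label_of(x) for x in neighbours)
    return votes.most_common(1)[0][0]
```

For instance, classifying 0.2 against the instances [-3, -2, -1, 1, 2, 3] with k = 3, absolute difference as dissimilarity and the sign as label selects the neighbours 1, -1, 2 and returns the majority label of the positives.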
Thus, one can still adopt the decision procedure in Eq. (2), however another value set has to be adopted for the hj ’s, namely V = {−1, 0, +1}, where the three values denote, respectively, positive occurrence, absence and negative occurrence (positive for the concept negation). Formally:
hj (x) = +1 if Cj (x) ∈ A; 0 if Cj (x) 6∈ A ∧ ¬Cj (x) 6∈ A; −1 if ¬Cj (x) ∈ A
Again, a more complex procedure may be devised by simply substituting the notion of occurrence (absence) of assertions in (from) the ABox with one based on logic entailment (denoted with `) from the knowledge base, i.e. K ` Cj (x), K 6` Cj (x) and K 6` ¬Cj (x), and K ` ¬Cj (x), respectively. Although this may help reach the precision of deductive reasoning, it is also much more computationally expensive, since the simple lookup in the ABox must be replaced with instance checking.
From a computational viewpoint this procedure could be implemented to provide an answer even more efficiently than with a standard deductive reasoner. Indeed, once the retrieval of the primitive concepts is computed, the dissimilarity measures can be easily computed by means of a dynamic programming algorithm. Besides, the various measures could be maintained in an ad hoc data structure which would allow for an efficient retrieval of the nearest neighbors, such as kD-trees or ball trees [23].
V. EXPERIMENTS
A. Experimental Setting
In order to assess the validity of the presented method with the dissimilarity measures proposed in Sect. III, we have applied it to the instance classification problem, with four different ontologies represented in OWL: FSM and SURFACE-WATER-MODEL from the Protégé library4 , the FINANCIAL ontology5 employed as a testbed for the PELLET reasoner, and a small FAMILY ontology written in our lab. Although they are represented in languages that are different from ALC, we simply discarded these details when constructing the MSC approximations to be able to apply the presented measures.
This simple formulation does not take into account the similarity among instances, except when selecting the instances to be included in the neighborhood. Therefore a modified setting is generally adopted, weighting the vote on the grounds of the similarity of the instance:
ĥ(xq ) := argmaxv∈V Σi=1,...,k wi δ(v, h(xi )) (1)
where, usually, wi = 1/d(xi , xq ) or wi = 1/d(xi , xq )2 .
Usually this method is employed to classify vectors of features in some n-dimensional instance space (e.g. often IS = Rn ). Let us now turn to adapt the method to the more complex context of DL descriptions. Preliminarily, it should be observed that a strong assumption of this setting is that it can be employed to assign a value (e.g. a class) to a query instance among a set of values which can be regarded as a set of pairwise disjoint concepts/classes. This is an assumption that cannot always be valid. In this case, indeed, an individual could be an instance of more than one concept.
Let us consider a new value set V = {C1 , . . . , Cs } of concepts Cj that may be assigned to a query instance xq . If they were to be considered as disjoint (like in the standard machine learning setting), the decision procedure would adopt the hypothesis function defined in Eq. (1), with the query instance assigned the single concept voted by the weighted majority of instances in the neighborhood.
In the general case considered in this paper, when the disjointness of the classes cannot be assumed (unless explicitly stated in the TBox), one can adopt another answering procedure, decomposing the multi-class problem into smaller binary classification problems (one per concept). Therefore, a simple binary value set (V = {−1, +1}) is to be employed. Then, for each single concept (say Cj ), a hypothesis ĥj is computed on the fly during the classification phase:
ĥj (xq ) := argmaxv∈V Σi=1,...,k δ(v, hj (xi ))/d(xq , xi )2 ∀j ∈ {1, . . . , s} (2)
where each function hj , simply indicates the occurrence (+1) FAMILY is an ALCF ontology describing kinship rela- or absence (−1) of the corresponding assertion in the ABox for tionships. It is made up of 14 concepts (both primitive the k training instances xi : Cj (xi ) ∈ A. Alternately3 , hj may and defined), some of them are declared to be disjoint, 5 return +1 when Cj (xi ) is logically entailed by the knowledge object properties, 39 distinct individual names. Most of the base K, and −1 otherwise. individuals are asserted to be instances of more than one The problem with non-explicitly disjoint concepts is also concept, and are involved in more than one role assertions. related to the Closed World Assumption usually made in the This ontology has been written to have a small yet more context of Information Retrieval and Machine Learning. That complex case with respect to the following ones. Indeed, is the reason for adapting the standard setting to cope both with while the other ontologies are more regular, i.e. only some the case of non-disjoint concepts and with the OWA which is concepts are employed in the assertions (the others are defined commonly made in the Semantic Web context. To deal with only intensionally), in the FAMILY ontology every concept the OWA, the absence of information on whether a certain instance x belongs to the extension of concept Cj should not 4 See the webpage: http://protege.stanford.edu/plugins/owl/owl-library 3 For the sake of simplicity and efficiency, this case will not be considered 5 See the webpage: http://www.cs.put.poznan.pl/ in the following. alawrynowicz/financial.owl TABLE I has at least one instance asserted. The same happens for the AVERAGE RESULTS OF THE EXPERIMENTS WITH THE METHOD assertions on roles; particularly, there are some cases where EMPLOYING THE MEASURE BASED ON OVERLAP. role assertions constitute a chain from an individual to another one, by means of other intermediate assertions. 
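Eq. (2), combined with the three-valued hj introduced for the OWA, can be rendered as a short sketch. This is our own hedged illustration: the ABox is modelled as two sets of asserted positive/negated membership pairs, and all names and toy distances are invented.

```python
def h_j(abox, concept, individual):
    """Three-valued membership under the OWA: +1 if C_j(x) is asserted,
    -1 if the negation is asserted, 0 (neutral) when the ABox is silent."""
    if (concept, individual) in abox['positive']:
        return +1
    if (concept, individual) in abox['negative']:
        return -1
    return 0

def classify_wrt_concept(query, neighbours, concept, abox, dissim):
    """Distance-weighted vote of Eq. (2) over V = {-1, 0, +1}.
    Assumes strictly positive distances to all neighbours."""
    scores = {-1: 0.0, 0: 0.0, +1: 0.0}
    for x in neighbours:
        scores[h_j(abox, concept, x)] += 1.0 / dissim(query, x) ** 2
    return max(scores, key=scores.get)
```

Note that a single very close neighbour with a negated assertion can outweigh several more distant positive ones, because of the squared-distance weighting.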
              Match      Commission  Omission   Induction
              Rate       Rate        Rate       Rate
FAMILY        .654±.174  .000±.000   .231±.173  .115±.107
FSM           .974±.044  .026±.044   .000±.000  .000±.000
S.-W.-M.      .820±.241  .000±.000   .064±.111  .116±.246
FINANCIAL     .807±.091  .024±.076   .000±.001  .169±.076
or not) while it was to be classified as an instance of that concept;
• commission error rate: amount of individuals (analogically) labeled as instances of a concept, while they (logically) belong to a disjoint concept, or vice-versa;
• induction rate: amount of individuals that were found to belong to a concept or its negation, while this information is not logically derivable from the knowledge base.
We report the average rates obtained over all the concepts in each ontology and also their standard deviation.
The FSM ontology describes the domain of finite state machines using the SOF(D) language. It is made up of 20 (both primitive and defined) concepts (some of them are explicitly declared to be disjoint), 10 object properties, 7 datatype properties, 37 distinct individual names. About half of the individuals are asserted as instances of a single concept and are not involved in any role (object property).
SURFACE-WATER-MODEL is an ALCOF(D) ontology describing the domain of the surface water and the water quality models. It is based on the Surface-water Models Information Clearinghouse (SMIC) of the USGS. Namely, it is an ontology of numerical models for surface water flow and water quality simulation. The application domain of these models comprises lakes, oceans, estuaries, etc. These models are classified based on their availability, application domain, dimensions, partial differential equation solver, and characteristics types. It is made up of 19 concepts (both primitive and defined) without any specification about disjointness, 9 object properties, 115 distinct individual names; each of them is an instance of a
B. Experiments Employing the Overlap Measure
By looking at Tab.
I reporting the experimental outcomes single class and only some of them are involved in object with the dissimilarity measure based on the overlap (see Def. properties. 2), preliminarily it is important to note that, for every ontology, F INANCIAL is an ALCIF ontology that describes the the commission error was quite low. This means that the domain of eBanking. It is made up of 60 (both primitive and classifier did not make critical mistakes i.e. cases when an defined) concepts (some of them are declared to be disjoint), individual is deemed as an instance of a concept while it really 17 object properties, and no datatype property. It contains is an instance of another disjoint concept. 17941 distinct individual names. From the original ABox, we In particular, by looking at the outcomes related to the randomly extracted assertions for 652 individuals. FAMILY ontology, it can be observed that the match rate The classification method was applied to all the individuals is the lowest while the highest rate of omission errors was in each ontology; namely, the individuals were checked to reported. This may be due to two facts: 1) very few individuals assess if they were instances of the concepts in the ontology were available w.r.t. the number of concepts7 ; 2) sparse data through the analogical method. The performance was evalu- situation: instances are irregularly spread over the concepts, ated comparing its responses to those returned by a standard that is they might be instances of different concepts, which reasoner6 as a baseline. are sometimes disjoint. Hence the MSC approximations that Specifically, for each individual in the ontology the MSC were computed also resulted very different one from another, is computed and enlisted in the set of training (or test) which reduces the possibility of significantly matching similar examples. Each example is classified applying the adapted k- MSCs. 
This is a known drawback of the Nearest-Neighbor NN method p presented in the previous section. As a value of k methods. Nevertheless, as mentioned above, it is important to we chose |Ind(A)|, as advised in the instance-based learning note that the algorithm did not make any commission error literature. and it is able to infer new knowledge (11%). The experiment has been repeated twice adopting a leave- As regards the FSM ontology, we have observed the maxi- one-out cross validation procedure with both the dissimilarity mum match rate with respect to the classification given by the measures defined in Section III. logic reasoner. Moreover, differently from the other ontologies, For each concept in the ontology, we measured the follow- both the omission error rate and induction rate were null. A ing parameters for the evaluation: very limited percentage of incorrect classification cases was • match rate: number of cases of individuals that got observed. These outcomes were probably due to the fact that exactly the same classification by both classifiers with individuals in this ontology are quite regularly divided by the respect to the overall number of individuals; assertions on concepts and roles, so computing their MSCs, • omission error rate: amount of unlabeled individuals (our these are all very similar to each other and so the amount method could not determine whether it was an instance 7 Instance-based methods make an intensive use of the information about the individuals and improve their performance with the increase of the number 6 We employed P ELLET : http://pellet.owldl.com of instances considered. TABLE II it is well known that the NN procedure becomes more and AVERAGE RESULTS OF THE EXPERIMENTS WITH THE METHOD more precise as more instances can be considered. The price EMPLOYING THE MEASURE BASED ON INFORMATION CONTENT. to be paid was a longer computation time. Match Commission Omission Induction Rate Rate Rate Rate VI. 
In reporting the experimental outcomes with the dissimilarity measure based on the overlap (see Def. 2), it is preliminarily important to note that, for every ontology, the commission error was quite low. This means that the classifier did not make critical mistakes, i.e. cases in which an individual is deemed an instance of a concept while it really is an instance of another, disjoint concept.

In particular, looking at the outcomes related to the FAMILY ontology, it can be observed that the match rate is the lowest, while the highest rate of omission errors was reported. This may be due to two facts: 1) very few individuals were available w.r.t. the number of concepts7; 2) a sparse-data situation: instances are irregularly spread over the concepts, that is, they may be instances of different concepts, which are sometimes disjoint. Hence the MSC approximations that were computed also turned out very different from one another, which reduces the possibility of significantly matching similar MSCs. This is a known drawback of the NN methods. Nevertheless, as mentioned above, it is important to note that the algorithm did not make any commission error and it was able to infer new knowledge (11%).

As regards the FSM ontology, we observed the maximum match rate with respect to the classification given by the logic reasoner. Moreover, differently from the other ontologies, both the omission error rate and the induction rate were null. A very limited percentage of incorrect classification cases was observed. These outcomes were probably due to the fact that the individuals in this ontology are quite regularly divided by the assertions on concepts and roles, so that their computed MSCs are all very similar to each other and the amount of information they convey is very low. Choosing a lower number k of neighbors could probably help avoiding those residual errors.

For the same reasons, also for the SURFACE-WATER-MODEL ontology quite a high rate of matching classifications was reported (yet less than with the previous ontology); moreover, some cases of omission error (6%) were observed. The induction rate was about 12%, which means that for this ontology our classifier always assigned individuals to the correct concepts but, in some cases, it could also induce new assertions. Since this rate represents assertions that were not logically deducible from the ontology and yet were inferred inductively by the analogical classifier, these figures would be a positive outcome (provided this knowledge were deemed correct by an expert). Particularly, in this case the increase of the induction rate was due to the presence of assertions of mutual disjointness for some of the concepts.

7 Instance-based methods make an intensive use of the information about the individuals and improve their performance as the number of instances considered increases.
Results are not different for the experiments with the FINANCIAL ontology, which largely exceeds the others in terms of number of concepts and individuals. Namely, the observed match rate is again above 80% and the rest of the cases fall within the induction rate (17%), leaving a limited margin for residual errors. This corroborates a known fact about NN learners, namely that they reach better and better performance in the limit, as more training instances become available. Actually, performing a 10-fold cross validation we obtained almost the same results.

C. Experiments with the Information Content Measure

The average results obtained by adopting the procedure with the measure based on information content (see Def. 4) are reported in Table II.

TABLE II
AVERAGE RESULTS OF THE EXPERIMENTS WITH THE METHOD EMPLOYING THE MEASURE BASED ON INFORMATION CONTENT.

             Match rate   Commission rate   Omission rate   Induction rate
FAMILY       .608±.230    .000±.000         .330±.216       .062±.217
FSM          .899±.178    .096±.179         .000±.000       .005±.024
S.-W.-M.     .820±.241    .000±.000         .064±.111       .116±.246
FINANCIAL    .807±.091    .024±.076         .000±.001       .169±.046

By analyzing this table, it is possible to note that no sensible variation was observed with respect to the classifications performed using the first dissimilarity measure. Particularly, with both measures the method correctly classified nearly all the individuals, with very few commission errors. The reason is that, in most of the cases, the individuals of these ontologies are instances of one concept only and they are involved in a few roles (object properties). Some of the figures are slightly lower than those observed in the other experiment: this is confirmed by a higher variability. Surprisingly, the results on the larger ontologies (S.-W.-M. and FINANCIAL) perfectly match those obtained with the other measure, which is probably due to the fact that we used a leave-one-out cross-validation mode, which yielded a high value for the number k of training instances in the neighborhood; indeed, it is well known that the NN procedure becomes more and more precise as more instances can be considered. The price to be paid was a longer computation time.
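The leave-one-out mode mentioned above, where each individual is classified against all the remaining ones and k is resized with the training set, amounts to the following loop; this is a sketch under our own naming, with the `classify` callable a placeholder for the analogical procedure:

```python
import math
from typing import Callable, List

def leave_one_out(individuals: List, classify: Callable) -> List:
    """Leave-one-out evaluation: each individual is classified against
    all the remaining ones; as in the experiments, k is set to the
    square root of the current training-set cardinality."""
    answers = []
    for i, x in enumerate(individuals):
        training = individuals[:i] + individuals[i + 1:]  # hold x out
        k = round(math.sqrt(len(training)))
        answers.append(classify(x, training, k))
    return answers
```

The collected answers can then be compared with the reasoner's to obtain the rates discussed above.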
VI. CONCLUSIONS AND FUTURE WORK

In this work we have coupled a method for inductive inference on ABoxes with two composite concept similarity measures. The method is based on classification by analogy with the majority of the neighboring training individuals. It can be naturally exploited for predicting or suggesting missing information about individuals, thus enabling a sort of inductive retrieval: for example, a query may be issued to the KB and answered analogically. Specifically, we are also targeting the task of discovery of semantically annotated web services. Even more so, the outcomes of our method may be decisive for enabling a series of other inductive tasks such as clustering, case-based reasoning, etc.

The proposed method is able to induce new assertions, in addition to the knowledge already derivable by means of a reasoner. Hence it seems able to enhance standard instance-based classification. Particularly, an increase in accuracy was observed when the instances increase in number and are homogeneously spread.

The presented measure can be refined by tweaking the weighting factor λ so as to decrease the impact of the dissimilarity between nested sub-concepts in the descriptions on the determination of the overall value. Lately, we have defined another measure for ALN [24]. Hence, a natural extension may concern the definition of dissimilarity measures for more expressive DL languages. For example, a normal form for ALCN can be obtained based on those for ALC and ALN. Then, by exploiting the notion of existential mappings [25], the measure presented in this paper may be extended to more expressive DLs. We are currently developing other kinds of semantic similarities based on the idea of hypothesis-driven distance.

Besides, as mentioned, this method could be extended with different (yet still computationally tractable) answering procedures grounded on statistical inference (non-parametric tests based on ranked distances), in order to accept answers as correct with a high degree of confidence. Furthermore, the k-NN method in its classical form is particularly suitable for the automated induction of missing values for (scalar or numeric) datatype properties of an individual, as an estimate derived from the values of the datatypes of the surrounding individuals.

Kernels are another means to express a notion of similarity in some unknown feature space. We are working at the definition of kernel functions on DL representations [26], thus allowing the exploitation of the efficiency of kernel methods (e.g. support vector machines) in a multi-relational setting.

ACKNOWLEDGMENTS

This work was partially supported by the regional interest projects DIPIS (DIstributed Production as Innovative System) and DDTA (Distretto Digitale Tessile Abbigliamento), in the context of the tasks related to semantic web services discovery.
REFERENCES

[1] W. Cohen and H. Hirsh, "Learning the CLASSIC description logic," in Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, P. Torasso, J. Doyle, and E. Sandewall, Eds. Morgan Kaufmann, 1994, pp. 121–133.
[2] J.-U. Kietz and K. Morik, "A polynomial approach to the constructive induction of structural knowledge," Machine Learning, vol. 14, no. 2, pp. 193–218, 1994.
[3] S. Staab, A. Maedche, C. Nedellec, and P. Wiemer-Hastings, Eds., ECAI2000 Workshop on Ontology Learning, ser. CEUR WS Proceedings, 2000, vol. 31.
[4] S. Staab, A. Maedche, C. Nedellec, and E. Hovy, Eds., IJCAI2001 Workshop on Ontology Learning, 2001.
[5] F. Esposito, N. Fanizzi, L. Iannone, I. Palmisano, and G. Semeraro, "Knowledge-intensive induction of terminologies from metadata," in ISWC2004, Proceedings of the 3rd International Semantic Web Conference, ser. LNCS, F. van Harmelen, S. McIlraith, and D. Plexousakis, Eds., vol. 3298. Springer, 2004, pp. 441–455.
[6] M. d'Aquin, J. Lieber, and A. Napoli, "Decentralized case-based reasoning for the Semantic Web," in Proceedings of the 4th International Semantic Web Conference, ISWC2005, ser. LNCS, Y. Gil, V. Motta, E. Benjamins, and M. A. Musen, Eds., no. 3279. Springer, 2005, pp. 142–155.
[7] P. Hitzler and D. Vrandečić, "Resolution-based approximate reasoning for OWL DL," in Proceedings of the 4th International Semantic Web Conference, ISWC2005, ser. LNCS, Y. Gil, V. Motta, E. Benjamins, and M. A. Musen, Eds., no. 3279. Springer, 2005, pp. 383–397.
[8] T. Mitchell, Machine Learning. McGraw-Hill, 1997.
[9] M. Rodríguez, "Assessing semantic similarity between spatial entity classes," Ph.D. dissertation, University of Maine, 1997.
[10] A. Borgida, T. Walsh, and H. Hirsh, "Towards measuring similarity in description logics," in Working Notes of the International Description Logics Workshop, ser. CEUR Workshop Proceedings, Edinburgh, UK, 2005.
[11] M. W. Bright, A. R. Hurson, and S. H. Pakzad, "Automated resolution of semantic heterogeneity in multidatabases," ACM Transactions on Database Systems, vol. 19, no. 2, pp. 212–253, 1994.
[12] P. Resnik, "Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language," Journal of Artificial Intelligence Research, vol. 11, pp. 95–130, 1999.
[13] C. d'Amato, N. Fanizzi, and F. Esposito, "A dissimilarity measure for the ALC description logic," in Semantic Web Applications and Perspectives, 2nd Italian Semantic Web Workshop SWAP2005, P. Bouquet and G. Tummarello, Eds., vol. 166. Trento, Italy: CEUR, 2005.
[14] ——, "A dissimilarity measure for ALC concept descriptions," in Proceedings of the 21st Annual ACM Symposium of Applied Computing, SAC2006, H. Haddad, Ed. Dijon, France: ACM, 2006, pp. 1695–1699.
[15] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, Eds., The Description Logic Handbook. Cambridge University Press, 2003.
[16] F. Bacchus, "Lp, a logic for representing and reasoning with statistical knowledge," Computational Intelligence, vol. 6, pp. 209–231, 1990.
[17] M. Schmidt-Schauß and G. Smolka, "Attributive concept descriptions with complements," Artificial Intelligence, vol. 48, no. 1, pp. 1–26, 1991.
[18] F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf, "Deduction in concept languages: From subsumption to instance checking," Journal of Logic and Computation, vol. 4, no. 4, pp. 423–452, 1994.
[19] S. Brandt, R. Küsters, and A.-Y. Turhan, "Approximation and difference in description logics," in Proceedings of the International Conference on Knowledge Representation, D. Fensel, F. Giunchiglia, D. McGuinness, and M.-A. Williams, Eds. Morgan Kaufmann, 2002, pp. 203–214.
[20] A. Tversky, "Features of similarity," Psychological Review, vol. 84, no. 4, pp. 327–352, 1977.
[21] C. d'Amato, N. Fanizzi, and F. Esposito, "A semantic similarity measure for expressive description logics," in Proceedings of Convegno Italiano di Logica Computazionale, CILC05, A. Pettorossi, Ed., Rome, Italy, 2005.
[22] H. Bock, Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. Springer-Verlag, 1999.
[23] I. H. Witten and E. Frank, Data Mining, 2nd ed. Morgan Kaufmann, 2005.
[24] N. Fanizzi and C. d'Amato, "A similarity measure for the ALN description logic," in Proceedings of Convegno Italiano di Logica Computazionale, CILC06, Bari, Italy, 2006, http://cilc2006.di.uniba.it/download/camera/15_Fanizzi_CILC06.pdf.
[25] R. Küsters and R. Molitor, "Computing least common subsumers in ALEN," in Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI2001, B. Nebel, Ed., 2001, pp. 219–224.
[26] N. Fanizzi and C. d'Amato, "A declarative kernel for ALC concept descriptions," in Proceedings of the 16th International Symposium on Methodologies for Intelligent Systems, ISMIS 2006, ser. LNAI, F. Esposito, Z. Ras, D. Malerba, and G. Semeraro, Eds., vol. 4203. Springer, 2006, pp. 322–331.