Reasoning by Analogy in Description Logics through Instance-based Learning Claudia d’Amato, Nicola Fanizzi, Floriana Esposito Dipartimento di Informatica, Università degli Studi di Bari Campus Universitario, Via Orabona 4, 70125 Bari, Italy {claudia.damato | fanizzi | esposito}@di.uniba.it Abstract— This work presents a method founded in instance- It has been observed that, adopting richer representations, based learning for inductive (memory-based) reasoning on the structural properties have less and less impact in assessing ABoxes. The method, which exploits a semantic dissimilarity semantic similarity. Hence, the vision of similarity based only measure between concepts and instances, can be employed both to answer class membership queries and to predict new assertions on a structural (graph-based) approach, such as in [11], [12] that may be not logically entailed by the knowledge base. In a may fall short. We have proposed some dissimilarity measures preliminary experimentation, we show that the method is sound for non trivial DL languages, based on the semantics conveyed and it is actually able to induce new assertions that might be by the ABox assertions, which are suitable for being used acquired in the knowledge base. in instance-based methods [13], [14]. These measures elicit the underlying semantics by querying the knowledge base I. I NTRODUCTION for assessing the concept extensions, estimated through their Most of the research on ontology reasoning has been retrieval [15], as also hinted in [16]. Besides, the overall focusing on methods based on deductive reasoning. However, similarity is also (partially) influenced by the concepts which important tasks that are likely to be provided by new genera- are related through role restrictions. Moreover, in many other tion knowledge-based systems, such as construction, revision, typical tasks (e.g. 
conceptual clustering or definition), it is population and evolution are likely to be supported also by necessary to assess the similarity between concepts (resp. inductive methods. This has led to an increasing interest individuals). By recurring to the notion of most specific in machine learning and knowledge discovery methods applied concept of an individual with respect to an ABox [15], as to ontological representations (see [1], [2] and, more recently, representatives of the individuals at the concept level, the [3], [4], [5], [6]). measures for concepts can be extended to such cases. We propose an algorithm which is based on a notion of An analogical reasoning procedure like this may be em- concept similarity for performing a form of lazy learning on ployed to answer class membership queries through analog- typical ontological representations. Namely, by combining an ical rather than deductive reasoning, which may be more robust instance-based (analogical) approach with a notion of semantic with respect to noise and is likely to suggest new knowledge dissimilarity, this paper intends to demonstrate the applicabil- (which was not logically derivable). Specifically, we developed ity of inductive reasoning in this field which can be considered the method also for an application of semantic web service another form of approximate reasoning (see discussion in discovery where services are annotated in DLs. [7]). In particular, we have adapted a general instance- Another application might regard supporting various tasks based learning approach, the k-Nearest Neighbor [8], to for the knowledge engineer, such as the acquisition of can- the specific multi-relational setting for ontology languages. didate assertions for enriching ontologies with partially pop- A couple of technical problems had to be solved for this ulated ABoxes: the outcomes given by the procedure can be adaptation to ontology representations: 1) the Open World utilized as recommendations.
Indeed, as we show in the ex- Assumption (OWA) that is made in this context; 2) in this perimentation, the newly induced assertions are quite accurate multi-class problem setting disjunction cannot be assumed by (commission errors, i.e. predicting a concept erroneously, were default. rarely observed). In turn, the outcomes of the procedure may The standard ontology languages are founded in Description also trigger other related tasks such as induction (revision) of Logics (henceforth DLs) as they borrow the language con- (faulty) knowledge bases. structors for expressing complex concept definitions. Instance- The paper is organized as follows. In the next section, based methods depend on a similarity measure defined on the representation language is briefly presented. Two concept the instance space. In this perspective, a variety of measures dissimilarity measures are recalled and exploited in a modified for concept representations have been proposed (see [9] for version of the k-NN classification procedure. The results of a a survey). As pointed out in [10], most of these measures preliminary experimental evaluation of the method using these focus on the similarity of atomic concepts within hierarchies or two measures are shown and, finally, possible developments simple ontologies, based on a few relations. Thus, it becomes of the method are examined. necessary to investigate similarity in more complex languages. II. ALC K NOWLEDGE BASES • prim(Di ) is the set of all (negated) primitive concepts occurring at the top level of Di ; We recall the basics of ALC [17], a logic which adopts constructors supported by the standard ontology languages • valR (Di ) is the conjunction C1i u · · · u Cni in the value (see the DL handbook [15] for a thorough reference). 
restriction of role R, if any (otherwise valR (Di ) = >); In DLs, descriptions are inductively defined starting with a set NC of primitive concept names and a set NR of primitive • exR (Di ) is the set of concepts in the existential restric- roles. Complex descriptions are built using primitive concepts tions of the role R. and roles and the constructors shown in the following. The For any R, every sub-description in exR (Di ) and valR (Di ) is semantics of the descriptions is defined by an interpretation in normal form. I = (∆I , ·I ), where ∆I is a non-empty set, the domain of In the following, let L = ALC/≡ be the quotient set of the interpretation, and ·I is the interpretation function that ALC descriptions in normal form. maps each A ∈ NC to a set AI ⊆ ∆I and each R ∈ NR to RI ⊆ ∆I × ∆I . III. DISSIMILARITY MEASURES IN DESCRIPTION LOGICS The top concept > is interpreted as the whole domain of We recall two definitions of dissimilarity measures for ALC objects ∆I , while the bottom concept ⊥ corresponds to ∅. descriptions expressed in normal form [13], [14]. These mea- Complex descriptions can be built in ALC using the following sures are based on both the structure and the semantics of the constructors. The language supports full negation: given any concept descriptions. description C, denoted ¬C, it amounts to ∆I \ C I . Concept conjunction, denoted C1 u C2 , yields the extension C1I ∩ C2I ; A. Overlap Dissimilarity Measure dually, concept disjunction, denoted C1 t C2 , yields the union The first measure is derived from a measure of the overlap C1I ∪ C2I . Finally, the existential restriction, denoted ∃R.C, is between concepts, which can be defined as follows: interpreted as {x ∈ ∆I | ∃y ∈ ∆I ((x, y) ∈ RI ∧ y ∈ C I )} Definition 1 (overlap function): Let I be the canonical in- and the value restriction ∀R.C has as its extension the set terpretation of the ABox A. The overlap function f is defined1 : {x ∈ ∆I | ∀y ∈ ∆I ((x, y) ∈ RI → y ∈ C I )}.
f : L × L 7→ R+ , defined for descriptions C, D ∈ L, with C = C1 t · · · t Cn and D = D1 t · · · t Dm , as follows.
disjunctive level:
f (C, D) := ft (C, D) = ∞ if C ≡ D; 0 if C u D ≡ ⊥; maxi,j fu (Ci , Dj ) otherwise
conjunctive level:
fu (Ci , Dj ) := fP (prim(Ci ), prim(Dj )) + λ(f∀ (Ci , Dj ) + f∃ (Ci , Dj ))
with λ ∈ [0, 1].
primitive concepts:
fP (P1 , P2 ) := |R(P1 ) ∪ R(P2 )| / |(R(P1 ) ∪ R(P2 )) \ (R(P1 ) ∩ R(P2 ))|
where R(P ) = ∩A∈P AI and fP (P1 , P2 ) = ∞ when R(P1 ) = R(P2 ).
value restrictions:
f∀ (Ci , Dj ) := ΣR∈NR ft (valR (Ci ), valR (Dj ))
existential restrictions:
f∃ (Ci , Dj ) := ΣR∈NR Σk=1,...,N maxp=1,...,M ft (Cik , Djp )
where Cik ∈ exR (Ci ) and Djp ∈ exR (Dj ) and we suppose w.l.o.g. that N = |exR (Ci )| ≥ |exR (Dj )| = M , otherwise the indices N and M as well as Ci and Dj are to be exchanged in the formula above.
1 The name A of the ABox is omitted for simplicity.
The main inference is subsumption between concepts based on their semantics: given two descriptions C and D, C subsumes D, denoted by C w D, iff for every interpretation I it holds that C I ⊇ DI . When C w D and D w C then they are equivalent, denoted with C ≡ D.
A knowledge base K = hT , Ai contains a TBox T and an ABox A:
• T is the set of definitions C ≡ D, meaning that, for every interpretation I, C I = DI , where C is the concept name and D is its description constructed as above;
• A contains concept and role assertions about the world-state, e.g. C(a) and R(a, b), meaning that, for every I, aI ∈ C I and (aI , bI ) ∈ RI .
A related inference used in the following is instance checking, that is deciding whether an individual is an instance of a concept [18], [15]. Conversely, it may be necessary to find the concepts which an individual belongs to (realization problem), especially the most specific one (see Def. 5).
Semantically equivalent (yet syntactically different) descriptions can be given for the same concept. Nevertheless, equivalent concepts can be reduced to a normal form by means of rewriting rules that preserve their equivalence [19]: a description D is in ALC normal form iff D ≡ ⊥ (then D := ⊥) or if D ≡ > (then D := >) or if D = D1 t · · · t Dn (∀i = 1, . . . , n, Di 6≡ ⊥) with
Di = uA∈prim(Di ) A u uR∈NR (∀R.valR (Di ) u uE∈exR (Di ) ∃R.E)
where the components prim(Di ), valR (Di ) and exR (Di ) are as listed above.
dissimilarity is inversely proportional to their overlap, since in this case f (C, D) > 1 and consequently 0 < d(C, D) < 1.
The function f represents a measure of the overlap between two descriptions expressed in ALC normal form. It measures the similarity of the input concepts based on the similarity between their extensions (approximated with the retrieval) and also on the similarity of the concepts reached by the role restrictions. Namely, it is defined recursively beginning from the top level of the descriptions (a disjunctive level) down to the bottom level represented by (conjunctions of) primitive concepts.
Overlap at the disjunctive level is treated as the maximum overlap between the disjunctive forms of the input concepts. At conjunctive levels, instead of simply considering the similarity as in Tversky’s measure [20] (as we do for the prim’s), which could turn out to be null in case of disjoint retrieval sets for the input concepts, we distinguish the overlaps between the
B. A Dissimilarity Measure Based on Information Content
As discussed in [12], a measure of concept (dis)similarity can be derived from the notion of Information Content (IC) that, in turn, depends on the probability that an individual belongs to a certain concept. Now, differently from other works, which assume that a probability distribution for the concepts in an ontology is known, we derive it from the knowledge base, from the distribution that can be estimated therein [16], [10]. In order to approximate this probability for a certain concept C, we recur to its extension w.r.t. the considered ABox in a fixed interpretation.
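Since f is ultimately grounded in the fP ratio over retrieval sets, a minimal sketch may help. This is our own hedged illustration: the sets stand in for the retrievals R(P), the function names are invented, and the second helper anticipates the inversion of an overlap value into a dissimilarity score (Def. 2 below).

```python
def overlap_primitives(ext1, ext2):
    """f_P: ratio of |union| to |symmetric difference| of the two
    retrieval sets; infinite when the extensions coincide."""
    union = ext1 | ext2
    sym_diff = union - (ext1 & ext2)
    if not sym_diff:
        return float('inf')  # identical extensions: maximal overlap
    return len(union) / len(sym_diff)

def overlap_to_dissimilarity(f_value):
    """Invert an overlap value into a dissimilarity in [0, 1]:
    1 for no overlap, 0 for total overlap, 1/f otherwise."""
    if f_value == 0:
        return 1.0
    if f_value == float('inf'):
        return 0.0
    return 1.0 / f_value
```

For example, retrieval sets {a, b, c} and {b, c, d} give an overlap of 4/2 = 2 and hence a dissimilarity of 0.5, while identical retrievals give an infinite overlap and dissimilarity 0.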
Namely, we chose the canonical interpretation IA , which is the one adopting the set of individuals mentioned in the ABox as its domain and the identity as its interpretation function [15]. Now, given a concept C, its probability is estimated by:
pr(C) = |C IA |/|∆IA |
Finally, we can compute the information content of a concept employing this probability:
IC(C) = − log pr(C)
A measure of the concept dissimilarity is now formally defined [14]:
Definition 3 (IC dissimilarity): Let A be an ABox with canonical interpretation I. The information content dissimilarity measure is a function g : L × L 7→ R+ defined recursively
prims and those between the concepts in the scope of the various role restrictions2 ; the contribution of the overlap of concepts reached through role restrictions can be penalized by tweaking the parameter λ. The measure for primitive concepts resembles Tversky’s measure and it represents a semantic baseline, since it depends on the semantics of the knowledge base, as conveyed by the ABox assertions. This is in line with the ideas in [16], [10], where semantics is elicited as a probability distribution over the domain of the interpretation. As for the role restrictions overlap, for the universal restrictions we simply add the overlap measures varying the role, while for the existential restrictions the measure is trickier: borrowing the idea of the existential mappings, we consider, per role, all possible matches between the concepts in the scope of existential restrictions, then we consider the maximal sum of overlaps resulting from all the matches.
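Under the canonical interpretation these two quantities reduce to simple counts over the ABox individuals. A hedged sketch of ours (extensions and the domain are sets of individual names; the function names are illustrative):

```python
import math

def pr(concept_ext, domain):
    """pr(C): fraction of the ABox individuals in the extension of C
    under the canonical interpretation."""
    return len(concept_ext) / len(domain)

def ic(concept_ext, domain):
    """IC(C) = -log pr(C): rarer concepts carry more information."""
    return -math.log(pr(concept_ext, domain))
```

With a domain of 8 individuals and a concept retrieving 2 of them, pr = 0.25 and IC = log 4 ≈ 1.39.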
for any descriptions C, D ∈ L, with two normal form concepts C = C1 t · · · t Cn and D = D1 t · · · t Dm .
disjunctive level:
g(C, D) := gt (C, D) = 0 if C ≡ D; ∞ if C u D ≡ ⊥; min1≤i≤n, 1≤j≤m gu (Ci , Dj ) otherwise
conjunctive level:
gu (Ci , Dj ) := gP (prim(Ci ), prim(Dj )) + λ(g∀ (Ci , Dj ) + g∃ (Ci , Dj ))
with λ ∈ [0, 1].
primitive concepts:
gP (Pi , Pj ) := ∞ if Pi u Pj ≡ ⊥; (IC(Pi u Pj ) + 1)/(IC(LCS(Pi , Pj )) + 1) otherwise
value restrictions:
g∀ (Ci , Dj ) := ΣR∈NR gt (valR (Ci ), valR (Dj ))
existential restrictions:
g∃ (Ci , Dj ) := ΣR∈NR Σk=1,...,N minp=1,...,M gt (Cik , Djp )
where Cik ∈ exR (Ci ) and Djp ∈ exR (Dj ) and we suppose w.l.o.g. that N = |exR (Ci )| ≥ |exR (Dj )| = M , otherwise the indices N and M are to be exchanged in the formula above.
Now, it is possible to derive a dissimilarity measure based on f as follows:
Definition 2 (overlap dissimilarity measure): The overlap dissimilarity measure is a function d : L × L 7→ [0, 1] such that, given two concept descriptions C, D ∈ L, C = C1 t · · · t Cn and D = D1 t · · · t Dm :
d(C, D) := 1 if f (C, D) = 0; 0 if f (C, D) = ∞; 1/f (C, D) otherwise
Function d simply measures the level of dissimilarity between two concepts as the inverse of the overlap function f . Particularly, if f (C, D) = 0, i.e. there is no overlap between the considered concepts, then d must indicate that the two concepts are totally different, indeed d(C, D) = 1, the maximum value of its range. If f (C, D) = ∞ this means that the two concepts are totally overlapped and consequently d(C, D) = 0, which means that the two concepts are indistinguishable, indeed d assumes the minimum value of its range. If the considered concepts have a partial overlap then their
2 We tried also other solutions, such as considering minima or products of the three overlap measures, yet experimentally this did not yield a better performance whereas the computation time was increased. For practical reasons, the maximal measure ∞ has been replaced with large numbers depending on the cardinality of the set of individuals in the ABox: |Ind(A)|.
The function g represents a measure of the dissimilarity between two descriptions expressed in ALC normal form. It is defined recursively beginning from the top level of the descriptions (a disjunctive level) down to the bottom level represented by (conjunctions of) primitive concepts.
Now g has values in [0, ∞]. It may be useful to derive a normalized dissimilarity measure as shown in the following.
Definition 4 (normalized IC dissimilarity): Let A be an ABox with canonical interpretation I. The normalized information content dissimilarity measure is the function d : L × L 7→ [0, 1], such that, given the concept descriptions in ALC normal form C = C1 t · · · t Cn and D = D1 t · · · t Dm :
d(C, D) := 0 if g(C, D) = 0; 1 if g(C, D) = ∞; 1 − 1/g(C, D) otherwise
D. Discussion
We proved in [13], [14] that these measures are indeed dissimilarity measures according to the formal definition [22], considering that their input (what is actually compared) consists of equivalence classes in L, i.e. ALC concept descriptions with the same normal form.
As previously mentioned, we have also attempted slightly different measure definitions (e.g. considering minima or products at the conjunctive level, which sound more intuitive) which experimentally did not prove more effective than the simple (additive) definition given above.
The computational complexity of the measures depends on the complexity of the required reasoning services for computing the concepts’ retrieval. Namely, both subsumption and instance checking are PSPACE for the ALC logic [18], yet such inferences can be computed once and preliminarily, before the measures are computed for the method we will present in the following.
Obviously, when computing the dissimilarity measures for
C.
Measuring the Dissimilarity between Individuals cases involving individuals, the extra cost of computing MSCs The notion of Most Specific Concept is commonly exploited (or their approximations) has to be added. for lifting individuals to the concept level. Nevertheless, in practical applications, these computations Definition 5 (most specific concept): Given an ABox A may be efficiently carried out exploiting the statistics that and an individual a, the most specific concept of a w.r.t. A is are maintained by the DBMSs query optimizers. Besides, the the concept C, denoted MSCA (a), such that A |= C(a) and counts that are necessary for computing the concept extensions for any other concept D such that A |= D(a), it holds that could be estimated by means of the probability distribution C v D. over the domain. In case of cyclic ABoxes expressed in a DL with existen- tial restrictions the MSC may not be expressed by a finite IV. A N EAREST N EIGHBOR C LASSIFICATION P ROCEDURE description [15], yet it may be often approximated. IN D ESCRIPTION L OGICS On performing experiments related to another similarity We briefly review the basics of the k-Nearest Neighbor measure exclusively based on concept extensions [21], we method (k-NN) and propose how to exploit the classification noticed that, recurring to the MSC for lifting individuals to procedure for inductive reasoning. In this lazy approach to the concept level, just falls short: indeed the MSCs may be learning, a notion of distance (or dissimilarity) measure for too specific and unable to include other (similar) individuals the instance space is employed to classify a new instance. in their extensions. By comparing descriptions reduced to the Let xq be the instance that requires a classification. Using normal form we have given a more structural definition of a dissimilarity measure, the set of k nearest pre-classified dissimilarity. However, since MSCs are computed from the instances is selected. 
same ABox assertions, reflecting the current knowledge state, this guarantees that structurally similar representations will be obtained for semantically similar concepts.
Let us recall that, given the ABox, it is possible to calculate the most specific concept of an individual a w.r.t. the ABox, MSC(a), or at least its approximation MSCk (a) up to a certain description depth k. In some cases these are equivalent concepts but in general we have that MSCk (a) w MSC(a). Given two individuals a and b in the ABox, we consider MSCk (a) and MSCk (b) (supposed in normal form). Now, in order to assess the dissimilarity between the individuals, the d measure can be applied to these descriptions:
d(a, b) := d(MSCk (a), MSCk (b))
This may turn out to be handy in several tasks, namely both in inductive reasoning (construction, repairing of knowledge bases) and in information retrieval.
be interpreted negatively; rather, it should count as neutral
The objective is to learn a discrete-valued target function h : IS 7→ V from a space of instances IS to a set of values V = {v1 , . . . , vs }. In its simplest setting, the k-NN algorithm approximates h for new instances xq on the ground of the value that h assumes in the neighborhood of xq , i.e. the k closest instances to the new instance in terms of a dissimilarity measure. Precisely, it is assigned according to the value which is voted by the majority of instances in the neighborhood. The classification function ĥ can be formally defined as follows:
ĥ(xq ) := argmaxv∈V Σi=1,...,k δ(v, h(xi ))
where δ is a function that returns 1 in case of matching arguments and 0 otherwise. Note that the hypothesized function ĥ is defined only extensionally, that is, the k-NN method does not return an intensional classification model (e.g. a function or a concept definition); it merely gives an answer for new query instances to be classified, employing the procedure above.
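The simple majority-vote scheme just described can be sketched as follows; any dissimilarity, e.g. d applied to MSC approximations, can be plugged in. The toy numeric instances and labels in the usage example are purely illustrative.

```python
from collections import Counter

def knn_classify(query, training, k, dissim, label_of):
    """Plain (unweighted) k-NN: majority label among the k training
    items nearest to `query` under the dissimilarity `dissim`."""
    neighbours = sorted(training, key=lambda x: dissim(query, x))[:k]
    votes = Counter(label_of(x) for x in neighbours)
    return votes.most_common(1)[0][0]
```

For instance, classifying 0.2 against the instances [-3, -2, -1, 1, 2, 3] with k = 3, absolute difference as dissimilarity and the sign as label selects the neighbours 1, -1, 2 and returns the majority label of the positives.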
Thus, one can still adopt the decision procedure in Eq. (2), however another value set has to be adopted for the hj ’s, namely V = {−1, 0, +1}, where the three values denote, respectively, positive occurrence, absence and negative occurrence (positive for the concept negation). Formally:
hj (x) = +1 if Cj (x) ∈ A; 0 if Cj (x) 6∈ A ∧ ¬Cj (x) 6∈ A; −1 if ¬Cj (x) ∈ A
Again, a more complex procedure may be devised by simply substituting the notion of occurrence (absence) of assertions in (from) the ABox with one based on logic entailment (denoted with `) from the knowledge base, i.e. K ` Cj (x), K 6` Cj (x) and K 6` ¬Cj (x), and K ` ¬Cj (x), respectively. Although this may help reach the precision of deductive reasoning, it is also much more computationally expensive, since the simple lookup in the ABox must be replaced with instance checking.
From a computational viewpoint this procedure could be implemented to provide an answer even more efficiently than with a standard deductive reasoner. Indeed, once the retrieval of the primitive concepts is computed, the dissimilarity measures can be easily computed by means of a dynamic programming algorithm. Besides, the various measures could be maintained in an ad hoc data structure which would allow for an efficient retrieval of the nearest neighbors, such as kD-trees or ball trees [23].
V. EXPERIMENTS
A. Experimental Setting
In order to assess the validity of the presented method with the dissimilarity measures proposed in Sect. III, we have applied it to the instance classification problem, with four different ontologies represented in OWL: FSM and SURFACE-WATER-MODEL from the Protégé library4 , the FINANCIAL ontology5 employed as a testbed for the PELLET reasoner, and a small FAMILY ontology written in our lab. Although they are represented in languages that are different from ALC, we simply discarded these details when constructing the MSC approximations to be able to apply the presented measures.
This simple formulation does not take into account the similarity among instances, except when selecting the instances to be included in the neighborhood. Therefore a modified setting is generally adopted, weighting the vote on the grounds of the similarity of the instance:
ĥ(xq ) := argmaxv∈V Σi=1,...,k wi δ(v, h(xi )) (1)
where, usually, wi = 1/d(xi , xq ) or wi = 1/d(xi , xq )2 .
Usually this method is employed to classify vectors of features in some n-dimensional instance space (e.g. often IS = Rn ). Let us now turn to adapt the method to the more complex context of DL descriptions. Preliminarily, it should be observed that a strong assumption of this setting is that it can be employed to assign a value (e.g. a class) to a query instance among a set of values which can be regarded as a set of pairwise disjoint concepts/classes. This is an assumption that cannot always be valid. In this case, indeed, an individual could be an instance of more than one concept.
Let us consider a new value set V = {C1 , . . . , Cs } of concepts Cj that may be assigned to a query instance xq . If they were to be considered as disjoint (like in the standard machine learning setting), the decision procedure would adopt the hypothesis function defined in Eq. (1), with the query instance assigned the single concept voted by the weighted majority of instances in the neighborhood.
In the general case considered in this paper, when the disjointness of the classes cannot be assumed (unless explicitly stated in the TBox), one can adopt another answering procedure, decomposing the multi-class problem into smaller binary classification problems (one per concept). Therefore, a simple binary value set (V = {−1, +1}) is to be employed. Then, for each single concept (say Cj ), a hypothesis ĥj is computed on the fly during the classification phase:
ĥj (xq ) := argmaxv∈V Σi=1,...,k δ(v, hj (xi ))/d(xq , xi )2 ∀j ∈ {1, . . . , s} (2)
where each function hj , simply indicates the occurrence (+1) FAMILY is an ALCF ontology describing kinship rela- or absence (−1) of the corresponding assertion in the ABox for tionships. It is made up of 14 concepts (both primitive the k training instances xi : Cj (xi ) ∈ A. Alternately3 , hj may and defined), some of them are declared to be disjoint, 5 return +1 when Cj (xi ) is logically entailed by the knowledge object properties, 39 distinct individual names. Most of the base K, and −1 otherwise. individuals are asserted to be instances of more than one The problem with non-explicitly disjoint concepts is also concept, and are involved in more than one role assertions. related to the Closed World Assumption usually made in the This ontology has been written to have a small yet more context of Information Retrieval and Machine Learning. That complex case with respect to the following ones. Indeed, is the reason for adapting the standard setting to cope both with while the other ontologies are more regular, i.e. only some the case of non-disjoint concepts and with the OWA which is concepts are employed in the assertions (the others are defined commonly made in the Semantic Web context. To deal with only intensionally), in the FAMILY ontology every concept the OWA, the absence of information on whether a certain instance x belongs to the extension of concept Cj should not 4 See the webpage: http://protege.stanford.edu/plugins/owl/owl-library 3 For the sake of simplicity and efficiency, this case will not be considered 5 See the webpage: http://www.cs.put.poznan.pl/ in the following. alawrynowicz/financial.owl TABLE I has at least one instance asserted. The same happens for the AVERAGE RESULTS OF THE EXPERIMENTS WITH THE METHOD assertions on roles; particularly, there are some cases where EMPLOYING THE MEASURE BASED ON OVERLAP. role assertions constitute a chain from an individual to another one, by means of other intermediate assertions. 
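Eq. (2), combined with the three-valued hj introduced for the OWA, can be rendered as a short sketch. This is our own hedged illustration: the ABox is modelled as two sets of asserted positive/negated membership pairs, and all names and toy distances are invented.

```python
def h_j(abox, concept, individual):
    """Three-valued membership under the OWA: +1 if C_j(x) is asserted,
    -1 if the negation is asserted, 0 (neutral) when the ABox is silent."""
    if (concept, individual) in abox['positive']:
        return +1
    if (concept, individual) in abox['negative']:
        return -1
    return 0

def classify_wrt_concept(query, neighbours, concept, abox, dissim):
    """Distance-weighted vote of Eq. (2) over V = {-1, 0, +1}.
    Assumes strictly positive distances to all neighbours."""
    scores = {-1: 0.0, 0: 0.0, +1: 0.0}
    for x in neighbours:
        scores[h_j(abox, concept, x)] += 1.0 / dissim(query, x) ** 2
    return max(scores, key=scores.get)
```

Note that a single very close neighbour with a negated assertion can outweigh several more distant positive ones, because of the squared-distance weighting.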
              Match      Commission  Omission   Induction
              Rate       Rate        Rate       Rate
FAMILY        .654±.174  .000±.000   .231±.173  .115±.107
FSM           .974±.044  .026±.044   .000±.000  .000±.000
S.-W.-M.      .820±.241  .000±.000   .064±.111  .116±.246
FINANCIAL     .807±.091  .024±.076   .000±.001  .169±.076
or not) while it was to be classified as an instance of that concept;
• commission error rate: amount of individuals (analogically) labeled as instances of a concept, while they (logically) belong to a disjoint concept, or vice-versa;
• induction rate: amount of individuals that were found to belong to a concept or its negation, while this information is not logically derivable from the knowledge base.
We report the average rates obtained over all the concepts in each ontology and also their standard deviation.
The FSM ontology describes the domain of finite state machines using the SOF(D) language. It is made up of 20 (both primitive and defined) concepts (some of them are explicitly declared to be disjoint), 10 object properties, 7 datatype properties, 37 distinct individual names. About half of the individuals are asserted as instances of a single concept and are not involved in any role (object property).
SURFACE-WATER-MODEL is an ALCOF(D) ontology describing the domain of the surface water and the water quality models. It is based on the Surface-water Models Information Clearinghouse (SMIC) of the USGS. Namely, it is an ontology of numerical models for surface water flow and water quality simulation. The application domain of these models comprises lakes, oceans, estuaries, etc. These models are classified based on their availability, application domain, dimensions, partial differential equation solver, and characteristics types. It is made up of 19 concepts (both primitive and defined) without any specification about disjointness, 9 object properties, 115 distinct individual names; each of them is an instance of a
B. Experiments Employing the Overlap Measure
By looking at Tab.
I reporting the experimental outcomes single class and only some of them are involved in object with the dissimilarity measure based on the overlap (see Def. properties. 2), preliminarily it is important to note that, for every ontology, F INANCIAL is an ALCIF ontology that describes the the commission error was quite low. This means that the domain of eBanking. It is made up of 60 (both primitive and classifier did not make critical mistakes i.e. cases when an defined) concepts (some of them are declared to be disjoint), individual is deemed as an instance of a concept while it really 17 object properties, and no datatype property. It contains is an instance of another disjoint concept. 17941 distinct individual names. From the original ABox, we In particular, by looking at the outcomes related to the randomly extracted assertions for 652 individuals. FAMILY ontology, it can be observed that the match rate The classification method was applied to all the individuals is the lowest while the highest rate of omission errors was in each ontology; namely, the individuals were checked to reported. This may be due to two facts: 1) very few individuals assess if they were instances of the concepts in the ontology were available w.r.t. the number of concepts7 ; 2) sparse data through the analogical method. The performance was evalu- situation: instances are irregularly spread over the concepts, ated comparing its responses to those returned by a standard that is they might be instances of different concepts, which reasoner6 as a baseline. are sometimes disjoint. Hence the MSC approximations that Specifically, for each individual in the ontology the MSC were computed also resulted very different one from another, is computed and enlisted in the set of training (or test) which reduces the possibility of significantly matching similar examples. Each example is classified applying the adapted k- MSCs. 
This is a known drawback of the Nearest-Neighbor NN method p presented in the previous section. As a value of k methods. Nevertheless, as mentioned above, it is important to we chose |Ind(A)|, as advised in the instance-based learning note that the algorithm did not make any commission error literature. and it is able to infer new knowledge (11%). The experiment has been repeated twice adopting a leave- As regards the FSM ontology, we have observed the maxi- one-out cross validation procedure with both the dissimilarity mum match rate with respect to the classification given by the measures defined in Section III. logic reasoner. Moreover, differently from the other ontologies, For each concept in the ontology, we measured the follow- both the omission error rate and induction rate were null. A ing parameters for the evaluation: very limited percentage of incorrect classification cases was • match rate: number of cases of individuals that got observed. These outcomes were probably due to the fact that exactly the same classification by both classifiers with individuals in this ontology are quite regularly divided by the respect to the overall number of individuals; assertions on concepts and roles, so computing their MSCs, • omission error rate: amount of unlabeled individuals (our these are all very similar to each other and so the amount method could not determine whether it was an instance 7 Instance-based methods make an intensive use of the information about the individuals and improve their performance with the increase of the number 6 We employed P ELLET : http://pellet.owldl.com of instances considered. TABLE II it is well known that the NN procedure becomes more and AVERAGE RESULTS OF THE EXPERIMENTS WITH THE METHOD more precise as more instances can be considered. The price EMPLOYING THE MEASURE BASED ON INFORMATION CONTENT. to be paid was a longer computation time. Match Commission Omission Induction Rate Rate Rate Rate VI. 
In reporting the experimental outcomes with the dissimilarity measure based on the overlap (see Def. 2), it is preliminarily important to note that, for every ontology, the commission error was quite low. This means that the classifier did not make critical mistakes, i.e. cases in which an individual is deemed an instance of a concept while it really is an instance of another, disjoint concept.

In particular, looking at the outcomes related to the FAMILY ontology, it can be observed that the match rate is the lowest, while the highest rate of omission errors was reported. This may be due to two facts: 1) very few individuals were available w.r.t. the number of concepts7; 2) a sparse-data situation: instances are irregularly spread over the concepts, that is, they may be instances of different concepts, which are sometimes disjoint. Hence the MSC approximations that were computed also turned out very different from one another, which reduces the possibility of significantly matching similar MSCs. This is a known drawback of the NN methods. Nevertheless, as mentioned above, it is important to note that the algorithm did not make any commission error and it was able to infer new knowledge (11%).

As regards the FSM ontology, we observed the maximum match rate with respect to the classification given by the logic reasoner. Moreover, differently from the other ontologies, both the omission error rate and the induction rate were null. A very limited percentage of incorrect classification cases was observed. These outcomes were probably due to the fact that the individuals in this ontology are quite regularly divided by the assertions on concepts and roles, so that their computed MSCs are all very similar to each other and the amount of information they convey is very low. Choosing a lower number k of neighbors could probably help avoiding those residual errors.

For the same reasons, also for the SURFACE-WATER-MODEL ontology quite a high rate of matching classifications was reported (yet less than with the previous ontology); moreover, some cases of omission error (6%) were observed. The induction rate was about 12%, which means that for this ontology our classifier always assigned individuals to the correct concepts but, in some cases, it could also induce new assertions. Since this rate represents assertions that were not logically deducible from the ontology and yet were inferred inductively by the analogical classifier, these figures would be a positive outcome (provided this knowledge were deemed correct by an expert). Particularly, in this case the increase of the induction rate was due to the presence of assertions of mutual disjointness for some of the concepts.

7 Instance-based methods make an intensive use of the information about the individuals and improve their performance as the number of instances considered increases.
Results are not different for the experiments with the FINANCIAL ontology, which largely exceeds the others in terms of number of concepts and individuals. Namely, the observed match rate is again above 80% and the rest of the cases fall within the induction rate (17%), leaving a limited margin for residual errors. This corroborates a known fact about NN learners, namely that they reach better and better performance in the limit, as more training instances become available. Actually, performing a 10-fold cross validation we obtained almost the same results.

C. Experiments with the Information Content Measure

The average results obtained by adopting the procedure with the measure based on information content (see Def. 4) are reported in Table II.

TABLE II
AVERAGE RESULTS OF THE EXPERIMENTS WITH THE METHOD EMPLOYING THE MEASURE BASED ON INFORMATION CONTENT.

             Match rate   Commission rate   Omission rate   Induction rate
FAMILY       .608±.230    .000±.000         .330±.216       .062±.217
FSM          .899±.178    .096±.179         .000±.000       .005±.024
S.-W.-M.     .820±.241    .000±.000         .064±.111       .116±.246
FINANCIAL    .807±.091    .024±.076         .000±.001       .169±.046

By analyzing this table, it is possible to note that no sensible variation was observed with respect to the classifications performed using the first dissimilarity measure. Particularly, with both measures the method correctly classified nearly all the individuals, with very few commission errors. The reason is that, in most of the cases, the individuals of these ontologies are instances of one concept only and they are involved in a few roles (object properties). Some of the figures are slightly lower than those observed in the other experiment: this is confirmed by a higher variability. Surprisingly, the results on the larger ontologies (S.-W.-M. and FINANCIAL) perfectly match those obtained with the other measure, which is probably due to the fact that we used a leave-one-out cross-validation mode, which yielded a high value for the number k of training instances in the neighborhood; indeed, it is well known that the NN procedure becomes more and more precise as more instances can be considered. The price to be paid was a longer computation time.
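The leave-one-out mode mentioned above, where each individual is classified against all the remaining ones and k is resized with the training set, amounts to the following loop; this is a sketch under our own naming, with the `classify` callable a placeholder for the analogical procedure:

```python
import math
from typing import Callable, List

def leave_one_out(individuals: List, classify: Callable) -> List:
    """Leave-one-out evaluation: each individual is classified against
    all the remaining ones; as in the experiments, k is set to the
    square root of the current training-set cardinality."""
    answers = []
    for i, x in enumerate(individuals):
        training = individuals[:i] + individuals[i + 1:]  # hold x out
        k = round(math.sqrt(len(training)))
        answers.append(classify(x, training, k))
    return answers
```

The collected answers can then be compared with the reasoner's to obtain the rates discussed above.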
VI. CONCLUSIONS AND FUTURE WORK

In this work we have coupled a method for inductive inference on ABoxes with two composite concept similarity measures. The method is based on classification by analogy with the majority of the neighboring training individuals. It can be naturally exploited for predicting or suggesting missing information about individuals, thus enabling a sort of inductive retrieval: for example, a query may be issued to the KB and answered analogically. Specifically, we are also targeting the task of discovery of semantically annotated web services. Even more so, the outcomes of our method may be decisive for enabling a series of other inductive tasks such as clustering, case-based reasoning, etc.

The proposed method is able to induce new assertions, in addition to the knowledge already derivable by means of a reasoner. Hence it seems able to enhance standard instance-based classification. Particularly, an increase in accuracy was observed when the instances increase in number and are homogeneously spread.

The presented measure can be refined by tweaking the weighting factor λ so as to decrease the impact of the dissimilarity between nested sub-concepts in the descriptions on the determination of the overall value. Lately, we have defined another measure for ALN [24]. Hence, a natural extension may concern the definition of dissimilarity measures for more expressive DL languages. For example, a normal form for ALCN can be obtained based on those for ALC and ALN. Then, by exploiting the notion of existential mappings [25], the measure presented in this paper may be extended to more expressive DLs. We are currently developing other kinds of semantic similarities based on the idea of hypothesis-driven distance.

Besides, as mentioned, this method could be extended with different (yet still computationally tractable) answering procedures grounded on statistical inference (non-parametric tests based on ranked distances), in order to accept answers as correct with a high degree of confidence. Furthermore, the k-NN method in its classical form is particularly suitable for the automated induction of missing values for (scalar or numeric) datatype properties of an individual, as an estimate derived from the values of the datatypes of the surrounding individuals.

Kernels are another means to express a notion of similarity in some unknown feature space. We are working at the definition of kernel functions on DL representations [26], thus allowing the exploitation of the efficiency of kernel methods (e.g. support vector machines) in a multi-relational setting.

ACKNOWLEDGMENTS

This work was partially supported by the regional interest projects DIPIS (DIstributed Production as Innovative System) and DDTA (Distretto Digitale Tessile Abbigliamento), in the context of the tasks related to semantic web services discovery.
REFERENCES

[1] W. Cohen and H. Hirsh, "Learning the CLASSIC description logic," in Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, P. Torasso, J. Doyle, and E. Sandewall, Eds. Morgan Kaufmann, 1994, pp. 121–133.
[2] J.-U. Kietz and K. Morik, "A polynomial approach to the constructive induction of structural knowledge," Machine Learning, vol. 14, no. 2, pp. 193–218, 1994.
[3] S. Staab, A. Maedche, C. Nedellec, and P. Wiemer-Hastings, Eds., ECAI2000 Workshop on Ontology Learning, ser. CEUR WS Proceedings, 2000, vol. 31.
[4] S. Staab, A. Maedche, C. Nedellec, and E. Hovy, Eds., IJCAI2001 Workshop on Ontology Learning, 2001.
[5] F. Esposito, N. Fanizzi, L. Iannone, I. Palmisano, and G. Semeraro, "Knowledge-intensive induction of terminologies from metadata," in ISWC2004, Proceedings of the 3rd International Semantic Web Conference, ser. LNCS, F. van Harmelen, S. McIlraith, and D. Plexousakis, Eds., vol. 3298. Springer, 2004, pp. 441–455.
[6] M. d'Aquin, J. Lieber, and A. Napoli, "Decentralized case-based reasoning for the Semantic Web," in Proceedings of the 4th International Semantic Web Conference, ISWC2005, ser. LNCS, Y. Gil, V. Motta, E. Benjamins, and M. A. Musen, Eds., no. 3279. Springer, 2005, pp. 142–155.
[7] P. Hitzler and D. Vrandečić, "Resolution-based approximate reasoning for OWL DL," in Proceedings of the 4th International Semantic Web Conference, ISWC2005, ser. LNCS, Y. Gil, V. Motta, E. Benjamins, and M. A. Musen, Eds., no. 3279. Springer, 2005, pp. 383–397.
[8] T. Mitchell, Machine Learning. McGraw-Hill, 1997.
[9] M. Rodríguez, "Assessing semantic similarity between spatial entity classes," Ph.D. dissertation, University of Maine, 1997.
[10] A. Borgida, T. Walsh, and H. Hirsh, "Towards measuring similarity in description logics," in Working Notes of the International Description Logics Workshop, ser. CEUR Workshop Proceedings, Edinburgh, UK, 2005.
[11] M. W. Bright, A. R. Hurson, and S. H. Pakzad, "Automated resolution of semantic heterogeneity in multidatabases," ACM Transactions on Database Systems, vol. 19, no. 2, pp. 212–253, 1994.
[12] P. Resnik, "Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language," Journal of Artificial Intelligence Research, vol. 11, pp. 95–130, 1999.
[13] C. d'Amato, N. Fanizzi, and F. Esposito, "A dissimilarity measure for the ALC description logic," in Semantic Web Applications and Perspectives, 2nd Italian Semantic Web Workshop SWAP2005, P. Bouquet and G. Tummarello, Eds., vol. 166. Trento, Italy: CEUR, 2005.
[14] ——, "A dissimilarity measure for ALC concept descriptions," in Proceedings of the 21st Annual ACM Symposium of Applied Computing, SAC2006, H. Haddad, Ed. Dijon, France: ACM, 2006, pp. 1695–1699.
[15] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, Eds., The Description Logic Handbook. Cambridge University Press, 2003.
[16] F. Bacchus, "Lp, a logic for representing and reasoning with statistical knowledge," Computational Intelligence, vol. 6, pp. 209–231, 1990.
[17] M. Schmidt-Schauß and G. Smolka, "Attributive concept descriptions with complements," Artificial Intelligence, vol. 48, no. 1, pp. 1–26, 1991.
[18] F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf, "Deduction in concept languages: From subsumption to instance checking," Journal of Logic and Computation, vol. 4, no. 4, pp. 423–452, 1994.
[19] S. Brandt, R. Küsters, and A.-Y. Turhan, "Approximation and difference in description logics," in Proceedings of the International Conference on Knowledge Representation, D. Fensel, F. Giunchiglia, D. McGuinness, and M.-A. Williams, Eds. Morgan Kaufmann, 2002, pp. 203–214.
[20] A. Tversky, "Features of similarity," Psychological Review, vol. 84, no. 4, pp. 327–352, 1977.
[21] C. d'Amato, N. Fanizzi, and F. Esposito, "A semantic similarity measure for expressive description logics," in Proceedings of Convegno Italiano di Logica Computazionale, CILC05, A. Pettorossi, Ed., Rome, Italy, 2005.
[22] H. Bock, Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. Springer-Verlag, 1999.
[23] I. H. Witten and E. Frank, Data Mining, 2nd ed. Morgan Kaufmann, 2005.
[24] N. Fanizzi and C. d'Amato, "A similarity measure for the ALN description logic," in Proceedings of Convegno Italiano di Logica Computazionale, CILC06, Bari, Italy, 2006, http://cilc2006.di.uniba.it/download/camera/15_Fanizzi_CILC06.pdf.
[25] R. Küsters and R. Molitor, "Computing least common subsumers in ALEN," in Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI2001, B. Nebel, Ed., 2001, pp. 219–224.
[26] N. Fanizzi and C. d'Amato, "A declarative kernel for ALC concept descriptions," in Proceedings of the 16th International Symposium on Methodologies for Intelligent Systems, ISMIS 2006, ser. LNAI, F. Esposito, Z. Ras, D. Malerba, and G. Semeraro, Eds., vol. 4203. Springer, 2006, pp. 322–331.