A Note on the Evaluation of Inductive Concept Classification Procedures

Claudia d'Amato, Nicola Fanizzi, and Floriana Esposito
LACAM, Dipartimento di Informatica, Università degli studi di Bari
Campus Universitario, Via Orabona 4, 70125 Bari, Italy
{claudia.damato, fanizzi, esposito}@di.uniba.it

Abstract. The limitations of deductive logic-based approaches at deriving operational knowledge from ontologies may be overcome by inductive (instance-based) methods, which are usually efficient and noise-tolerant. However, the evaluation of such methods is made particularly difficult by the open-world semantics, which may often cause individuals not to be deductively classified by the reasoner. In this paper an evaluation method is proposed that is suitable for comparing inductive classification methods to standard reasoners. Experimentally, we show that the behavior of a nearest-neighbor classifier is comparable with that of a standard reasoner in terms of the proposed indices.

1 Motivation

Classification for retrieving resources from a knowledge base (KB) in the context of the Semantic Web (SW) is an important task that is typically performed by means of logical methods. These may fail due to the inherent incompleteness and incoherence of the KBs caused by their distributed nature. This has given rise to alternative methods for approximate reasoning (see the discussion in [1]) or to inductive methods [2, 3], which are known to be both efficient and noise-tolerant. Extending inductive methods to the SW representations, ultimately founded on Description Logics (DL) [4], was not straightforward. In particular, a theoretical problem is posed by the open-world semantics of the ontologies, as opposed to the closed-world semantics typically assumed in database applications.

The evaluation of an inductive classification procedure would essentially require comparing the answers provided by the inductive method to the correct ones, i.e. those deriving from the intended semantics of the considered KBs. However, this setting is often infeasible, as it would require querying the experts and knowledge engineers that built the KB. Alternatively, one may compare the inductive answers to those provided by a deductive reasoner; the inductive method often obtains its answers more efficiently and in a way that is more robust w.r.t. noise.

In previous works [2, 3] we adopted four indices (match, induction, omission error and commission error rates) to evaluate inductive methods against deductive ones, as zero-one loss functions [5] where each misclassification is charged a single unit. Here we extend this notion to the standard indices employed in Information Retrieval (IR), namely precision, recall and F-measure, originally defined in terms of a notion of relevance. Furthermore, we extend the mentioned indices to take into account the likelihood of the inductively derived answer.

In order to experiment with the evaluation of an inductive classification method, an extension of the Nearest Neighbor classification procedure (henceforth, NN) [5] was applied to the standard SW representations. The procedure classifies individuals w.r.t. query concepts by analogy with the classification of the nearest (w.r.t. some similarity criterion) training individuals. This method is quite efficient because it requires checking class-membership only for a limited set of training instances.
Although a number of dissimilarity measures for concepts expressed in various concept languages have been proposed [6, 2], we resort to language-independent pseudo-metrics for individuals [3]. The dissimilarity of two individuals is measured by comparing them w.r.t. a given context, i.e. a committee of features (concepts), namely those defined in the KB or generated for this purpose.

The paper is organized as follows. The basics of the NN procedure applied to SW representations and the similarity measures adopted are recalled in §2. The new indices for measuring the performance of inductive classifiers are presented in §3, and §4 reports the outcomes of experiments measuring the performance of the inductive procedure in terms of the new indices. Concluding remarks are reported in §5.

2 Classification by Analogy

In the following, OWL-DL knowledge bases will be considered with their standard semantics borrowed from DL languages [4]. Specifically, a knowledge base K = ⟨T, A⟩ is assumed to be made up of a TBox T and an ABox A containing, respectively, a set of axioms that define concepts and a set of assertions concerning the individuals.

2.1 The Nearest Neighbor Classification Procedure

Classification boils down to determining whether an individual belongs to a concept extension (instance checking). An inductive classification method should be able to provide an answer even when this cannot be logically inferred. Moreover, it may also provide a measure of the likelihood of its answer.

In instance-based learning [5] the basic idea is to find the most similar object(s) to the one that is to be classified w.r.t. a dissimilarity measure. The objective is to induce a classifier as an approximation of a discrete-valued function (a hypothesis) h_C : IS → V from a space of instances IS to a set of values V = {v_1, ..., v_s} standing for the classifications that have to be predicted. Normally |IS| ≪ |Ind(A)|, i.e. only a limited number of training instances is needed, especially if they are prototypical for the regions of the search space.

Let x be the instance whose classification is to be determined. Using a dissimilarity measure, the set of the k nearest (pre-classified) training instances w.r.t. x is selected: N_k(x) = {x_i}, i = 1, ..., k. The k-NN algorithm approximates h_C for classifying x on the grounds of the values that h_C is known to assume for the training instances in N_k(x), i.e. the k closest instances to x in terms of the dissimilarity measure. The value is decided by a weighted majority voting procedure: it is simply the value most voted by the instances in N_k(x), weighted by the similarity of each neighbor. The estimate of the hypothesis function for the query individual is:

\[
\hat{h}_C(x) := \operatorname{argmax}_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, h_C(x_i)) \qquad (1)
\]

where δ returns 1 in case of matching arguments and 0 otherwise and, given a dissimilarity measure d, the weights are determined by w_i = 1/d(x, x_i).

Note that ĥ_C is defined extensionally: the method need not provide an analytically defined function, as other inductive methods do [5]. Being based on a majority vote among the individuals in the neighborhood, this procedure is less error-prone than a purely deductive one in case of noise caused by incorrect assertions: it may be able to give a correct classification even in case of (partially) inconsistent KBs.
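To make the voting scheme concrete, the following is a minimal Python sketch of the weighted vote in Eq. 1; the names and data structures (a dictionary of training labels, a user-supplied dissimilarity function, and the small eps guard against zero distances) are illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict

def nn_classify(x, training, h_C, d, k=5, eps=1e-12):
    """Weighted k-NN vote as in Eq. 1 (illustrative sketch).

    x        -- the query individual
    training -- iterable of pre-classified training individuals
    h_C      -- dict mapping each training individual to its known value in V
    d        -- dissimilarity function, d(a, b) >= 0
    """
    # select the k nearest training individuals w.r.t. x
    neighbors = sorted(training, key=lambda xi: d(x, xi))[:k]

    # weighted majority vote: each neighbor votes for its own label,
    # with weight w_i = 1 / d(x, x_i)  (eps avoids division by zero)
    votes = defaultdict(float)
    for xi in neighbors:
        votes[h_C[xi]] += 1.0 / (d(x, xi) + eps)

    return max(votes, key=votes.get), votes, neighbors
```

Returning the raw vote totals alongside the decision makes it easy to compute the normalized likelihood introduced below (Eq. 2).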
To deal with the open-world semantics, the absence of information on whether a training instance x belongs to the extension of the query concept C should count as neutral (uncertain) information. Thus, adopting this alternate viewpoint, the multi-class problem is transformed into a ternary one and another value set has to be adopted, namely V = {+1, −1, 0}, where the values denote, respectively, membership, non-membership, and uncertainty.

The task can be cast as follows: given a query concept C, determine the membership of an instance x through the NN procedure (see Eq. 1), where V = {−1, 0, +1} and the hypothesis function values for the training instances are determined as follows: h_C(x) = +1 if K |= C(x), h_C(x) = −1 if K |= ¬C(x), and h_C(x) = 0 otherwise.

It should be noted that the inductive inference made by the procedure shown above is not guaranteed to be deductively valid. Indeed, inductive inference naturally yields a certain degree of uncertainty. In order to measure the likelihood of the decision made by the procedure (x has a classification corresponding to the value v maximizing the argmax argument in Eq. 1), given the nearest training individuals in N_k(x) = {x_1, ..., x_k}, the quantity that determined the decision should be normalized by dividing it by the sum of such arguments over the (three) possible values:

\[
\ell[class(x) = v \mid N_k(x)] = \frac{\sum_{i=1}^{k} w_i \cdot \delta(v, h_C(x_i))}{\sum_{u \in V} \sum_{i=1}^{k} w_i \cdot \delta(u, h_C(x_i))} \qquad (2)
\]

Hence the likelihood of the assertion C(x) corresponds to the case when v = +1.

2.2 Semantic Pseudo-Metrics for Individuals

Various definitions of semantic similarity (or dissimilarity) measures for concept languages have been proposed [6, 2]. For our purposes, we need a function for measuring the similarity of individuals rather than concepts.

The new dissimilarity measures are based on the idea of comparing the semantics of the input individuals along a number of dimensions represented by a committee of concept descriptions. Indeed, on a semantic level, similar individuals should behave similarly with respect to the same concepts. Totally semantic distance measures for individuals can thus be defined in the context of a knowledge base. More formally, the rationale is to compare individuals on the grounds of their semantics w.r.t. a collection of concept descriptions, say F = {F_1, F_2, ..., F_m}, which stands as a group of discriminating features expressed in the OWL-DL sublanguage taken into account.

In its simplest formulation, a family of distance functions for individuals inspired by Minkowski's norms L_p can be defined as follows [3]:

Definition 2.1 (family of measures). Let K = ⟨T, A⟩ be a knowledge base. Given a set of concept descriptions F = {F_1, F_2, ..., F_m} and a weight vector w, a family of dissimilarity functions d_p^F : Ind(A) × Ind(A) → [0, 1] with p > 0 is defined as follows:

\[
\forall a, b \in Ind(A) \qquad d_p^F(a, b) := \frac{1}{|F|} \left[ \sum_{i=1}^{|F|} w_i \, |\delta_i(a, b)|^p \right]^{1/p}
\]

where, ∀i ∈ {1, ..., m}, the dissimilarity function δ_i is defined by:

\[
\delta_i(a, b) =
\begin{cases}
0 & \text{if } (K \models F_i(a) \wedge K \models F_i(b)) \vee (K \models \neg F_i(a) \wedge K \models \neg F_i(b)) \\
1 & \text{if } (K \models F_i(a) \wedge K \models \neg F_i(b)) \vee (K \models \neg F_i(a) \wedge K \models F_i(b)) \\
\tfrac{1}{2} & \text{otherwise}
\end{cases}
\]

The weight vector w may be determined by the amount of information conveyed by each feature, which can be measured as its estimated entropy: w_i = H(F_i) (as in [3]).
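Putting Definition 2.1 and Eq. 2 together, here is a minimal Python sketch, under the assumption of a membership oracle instance_check(F, x) that wraps the reasoner and returns +1 (K |= F(x)), −1 (K |= ¬F(x)) or 0 (unknown); the oracle name, the entropy estimate and the data structures are illustrative, not an actual API.

```python
import math

def delta_i(instance_check, F_i, a, b):
    """Feature-wise comparison of Definition 2.1."""
    va, vb = instance_check(F_i, a), instance_check(F_i, b)
    if va == vb and va != 0:
        return 0.0      # both known members, or both known non-members
    if va == -vb and va != 0:
        return 1.0      # definitely opposite behaviors w.r.t. F_i
    return 0.5          # at least one membership is unknown (open world)

def dissimilarity(instance_check, features, weights, a, b, p=1):
    """The measure d_p^F over the committee of features F."""
    s = sum(w * abs(delta_i(instance_check, F, a, b)) ** p
            for F, w in zip(features, weights))
    return (s ** (1.0 / p)) / len(features)

def entropy_weight(instance_check, F, individuals):
    """One possible choice for w_i: the estimated entropy H(F_i) of the
    feature over the known individuals, as suggested in [3]."""
    counts = {+1: 0, -1: 0, 0: 0}
    for x in individuals:
        counts[instance_check(F, x)] += 1
    n = float(sum(counts.values()))
    return -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)

def likelihood(votes):
    """Normalize the vote totals of Eq. 1 into the likelihoods of Eq. 2."""
    total = sum(votes.values())
    return {v: votes.get(v, 0.0) / total for v in (+1, -1, 0)}
```

With p = 1 this gives the simplest member of the family, d_1^F, the version later employed in the experiments of §4.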
3 Performance Indices

Machine learning typically focuses on binary classification problems, or on multi-class problems that can be reduced to the binary case, as classes are normally assumed to be mutually disjoint. In a representation that adopts an open-world semantics, a ternary response function requires a different treatment: adopting the response set V = {−1, 0, +1}, cases of uncertain classification may occur. In the following we assume that a set of inductive classifications I_C is evaluated against a set of deductive classifications D_C for a concept C, where C(a) ∈ I_C iff ĥ_C(a) = +1 and ¬C(a) ∈ I_C iff ĥ_C(a) = −1; the same applies to D_C with ĥ_C replaced by h_C, i.e. the membership deductively computed by the reasoner.

3.1 Generalized IR Measures

Since classification can be employed to retrieve the individuals that are related to a given query concept, it is quite straightforward to adopt the standard measures used in IR: precision (P), recall (R) and F-measure. However, these suit settings with binary responses determined by a notion of relevance w.r.t. the query. In the SW context, a different purpose is pursued by the semantic precision and recall measures [7], introduced for assessing the quality of alignments in an ontology matching task.

As a first step, the definition of precision and recall may be generalized by adopting different measures ω of the overlap [8]:

\[
P_\omega(S_1, S_2) = \frac{\omega(S_1, S_2)}{|S_1|} \qquad\qquad R_\omega(S_1, S_2) = \frac{\omega(S_1, S_2)}{|S_2|} \qquad (3)
\]

where originally these sets are made up of alignments, but here we may consider them as sets of membership axioms. The overlap measure should be chosen so that the following properties are fulfilled, ∀S_1, S_2:

– ω(S_1, S_2) ≥ 0 (positiveness)
– ω(S_1, S_2) ≤ min(|S_1|, |S_2|) (maximality)
– ω(S_1, S_2) ≥ |S_1 ∩ S_2| (boundedness)

Now, considering S_1 = I_C and S_2 = D_C, the basic definition of the measures corresponds to ω_0 := |I_C ∩ D_C|, which trivially fulfills all the properties above.

In our specific setting, since the answers of the inductive classifier are to be compared to those of the reasoner, one may also compute the precision and recall of the single responses v ∈ V separately and then consider the (weighted) average of these precision (or recall) measures as an overall index:

\[
P(I_C, D_C) := \sum_{v \in V} w_v \frac{|I_C^v \cap D_C^v|}{|I_C^v|} \qquad \text{(average precision)} \qquad (4)
\]
\[
R(I_C, D_C) := \sum_{v \in V} w_v \frac{|I_C^v \cap D_C^v|}{|D_C^v|} \qquad \text{(average recall)} \qquad (5)
\]

where I_C^v (resp. D_C^v) denotes the subset of the individuals with classification v: {a ∈ I_C | ĥ_C(a) = v} (resp. {a ∈ D_C | h_C(a) = v}). The case of uniform weights w_v = 1/|V|, ∀v ∈ V, corresponds to macro-averaging over the possible values in V. Alternatively, one may consider the choice w_v = |D_C^v|/|TS|, ∀v ∈ V, where TS represents an independent set of individuals employed for testing.
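As an illustration of the averaged measures in Eq. 4 and 5 (with the usual F-measure computed from them), the following Python sketch assumes that both classifications are given as dictionaries mapping each test individual to a value in V = {+1, −1, 0}; this input format and the convention of skipping empty classes are choices made here for illustration.

```python
def averaged_pr(inductive, deductive, weights=None):
    """Average precision and recall (Eq. 4-5) of the inductive classifications
    w.r.t. the deductive ones, both given as dicts: individual -> value in {+1, -1, 0}."""
    V = (+1, -1, 0)
    if weights is None:
        weights = {v: 1.0 / len(V) for v in V}    # uniform weights: macro-averaging

    precision = recall = 0.0
    for v in V:
        I_v = {a for a, u in inductive.items() if u == v}
        D_v = {a for a, u in deductive.items() if u == v}
        overlap = len(I_v & D_v)
        if I_v:                                   # empty classes are skipped here
            precision += weights[v] * overlap / len(I_v)
        if D_v:
            recall += weights[v] * overlap / len(D_v)

    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall > 0 else 0.0)
    return precision, recall, f_measure
```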
In [7] some properties are introduced for semantic precision and recall measures:

1. D_C |= I_C ⇒ P(I_C, D_C) = 1 (max-correctness)
2. I_C |= D_C ⇒ R(I_C, D_C) = 1 (max-completeness)
3. Cn(I_C) = Cn(D_C) iff P(I_C, D_C) = 1 and R(I_C, D_C) = 1 (definiteness)
4. P(I_C, D_C) ≥ 0 and R(I_C, D_C) ≥ 0 (positiveness)
5. P(I_C, D_C) ≤ 1 and R(I_C, D_C) ≤ 1 (maximality)
6. P'(I_C, D_C) ≥ P(I_C, D_C) and R'(I_C, D_C) ≥ R(I_C, D_C), for all alternative precision and recall measures P' and R' (boundedness)

where |= is a shortcut for the entailment relation between the assertions in each set and Cn(·) returns the set of assertions entailed by the input set of assertions. Of course, entailment w.r.t. the models of the underlying KB is considered.

Proposition 3.1. Measures P and R fulfill properties 1–6.

Proof. We consider the uniform weight case w_v = 1/|V|.
1.
\[
P(I_C, D_C) = \frac{1}{|V|} \sum_{v \in V} \frac{|I_C^v \cap D_C^v|}{|I_C^v|} = \frac{1}{|V|} \sum_{v \in V} \frac{|I_C^v|}{|I_C^v|} = 1;
\]
2. analogously;
3. Cn(I_C) = Cn(D_C) iff ∀v ∈ V: (I_C^v ∩ D_C^v) = I_C^v = D_C^v iff P(I_C, D_C) = 1 and R(I_C, D_C) = 1;
4. trivial;
5. trivial;
6.
\[
P(I_C, D_C) = \frac{1}{|V|} \sum_{v \in V} \frac{|I_C^v \cap D_C^v|}{|I_C^v|} \geq \frac{1}{|V|} \sum_{v \in V} \frac{|I_C^v \cap D_C^v|}{|I_C|} = \frac{|I_C \cap D_C|}{|I_C|} = P_{\omega_0}(I_C, D_C);
\]
analogously R(I_C, D_C) ≥ R_{ω_0}(I_C, D_C). □

[Fig. 1. Representation of the sets I_C, D_C and their closures Cn(I_C), Cn(D_C), involved in the computation of precision and recall.]

An ideal semantic generalization of the precision and recall measures in Eq. 3 exploits the derivation closures (see Fig. 1) in the computation of the overlaps:

\[
P_{ideal}(I_C, D_C) := P_{\omega_0}(Cn(I_C), Cn(D_C)) = \frac{|Cn(I_C) \cap Cn(D_C)|}{|Cn(I_C)|} \qquad (6)
\]
\[
R_{ideal}(I_C, D_C) := R_{\omega_0}(Cn(I_C), Cn(D_C)) = \frac{|Cn(I_C) \cap Cn(D_C)|}{|Cn(D_C)|} \qquad (7)
\]

However, as noted in [7], these measures would be undefined when the number of consequences is infinite. Better semantic definitions are then:

\[
P_{sem}(I_C, D_C) := P_{\omega_0}(I_C, Cn(D_C)) = \frac{|I_C \cap Cn(D_C)|}{|I_C|} \qquad (8)
\]
\[
R_{sem}(I_C, D_C) := R_{\omega_0}(Cn(I_C), D_C) = \frac{|Cn(I_C) \cap D_C|}{|D_C|} \qquad (9)
\]

where the problem is solved since |I_C| ≤ |TS| and |D_C| ≤ |TS|. It is easy to prove that:

Proposition 3.2. Measures P_sem and R_sem fulfill properties 1–6.

Proof.
1. D_C |= I_C ⇒ I_C ⊆ Cn(D_C) ⇒ P_sem(I_C, D_C) = |I_C ∩ Cn(D_C)|/|I_C| = |I_C|/|I_C| = 1;
2. analogously;
3. Cn(I_C) = Cn(D_C) iff I_C ⊆ Cn(I_C) = Cn(D_C) ⊇ D_C iff P_sem(I_C, D_C) = |I_C|/|I_C| = 1 and R_sem(I_C, D_C) = |D_C|/|D_C| = 1;
4. trivial;
5. trivial;
6. trivial, since I_C ⊆ Cn(I_C) and D_C ⊆ Cn(D_C). □

3.2 Other Measures

Alternative evaluation indices have been used [2, 3] that, unlike the previous ones, do not have a direct mapping to the sets of true/false positives/negatives:

– match: case of an individual that received the same classification from the reasoner and from the inductive classifier;
– omission error: case of an individual for which the inductive method could not determine whether it was relevant to the query or not (response 0), while it was found relevant by the reasoner (response ±1);
– commission error: case of an individual found relevant to the query concept (response +1) or to its negation (response −1) by the inductive classifier, while it logically belongs to the negation or to the concept, respectively;
– induction rate: case of an individual found relevant to the query concept or to its negation (response ±1) by the inductive classifier, while neither case is logically derivable from the knowledge base (response 0).

Each case increments the corresponding index by a single unit (zero-one loss). This can be generalized by exploiting the likelihood measure provided by the inductive procedure in order to distribute fractions of the unit over the three possible responses, which in turn affects the indices, normally presented as rates. Specifically, when comparing the inductive answers to those of a reasoner, for each inductive classification one may increase several indices at the same time, instead of incrementing a single count, using the estimated likelihoods of Eq. 2 (∀v ∈ V: ℓ_v = ℓ[class(x) = v]). Depending on the response of the reasoner:

−1: increase the match count by ℓ_{−1}, the omission error count by ℓ_0, and the commission error count by ℓ_{+1};
0: increase the match count by ℓ_0 and the induction count by ℓ_{−1} + ℓ_{+1};
+1: increase the match count by ℓ_{+1}, the omission error count by ℓ_0, and the commission error count by ℓ_{−1}.

The counts above can be represented in a contingency matrix M = (m_{uv})_{u,v∈V}, which gives an idea of the performance in multi-class problems. An even better evaluation can be performed by comparing M to another matrix R = (r_{uv})_{u,v∈V} representing the outcomes of the random classifier that exploits the row and column totals: r_{uv} = (Σ_{w∈V} m_{uw} · Σ_{w∈V} m_{wv}) / N, with N = Σ_{w,t∈V} m_{wt}. The kappa statistic

\[
\kappa := \frac{\sum_{v \in V} m_{vv} - \sum_{v \in V} r_{vv}}{N - \sum_{v \in V} r_{vv}}
\]

can be employed to measure the relative improvement over the random classifier [5].
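A minimal Python sketch of the likelihood-weighted bookkeeping described above, together with the κ statistic computed from the resulting contingency matrix, may look as follows; the plain dictionaries used for the counts and for M are illustrative choices, and the zero-one (set-theoretic) version is recovered whenever the likelihood vector is a 0/1 indicator of the inductive decision.

```python
V = (+1, -1, 0)

def update_counts(counts, M, reasoner_value, likelihoods):
    """Distribute one classification over the indices using the likelihoods of Eq. 2.

    counts -- dict with keys 'match', 'omission', 'commission', 'induction'
    M      -- contingency matrix as a dict of dicts: M[reasoner_value][inductive_value]
    """
    for v in V:
        M[reasoner_value][v] += likelihoods[v]
    counts['match'] += likelihoods[reasoner_value]
    if reasoner_value == 0:
        counts['induction'] += likelihoods[+1] + likelihoods[-1]
    else:
        counts['omission'] += likelihoods[0]
        counts['commission'] += likelihoods[-reasoner_value]

def kappa(M):
    """Kappa statistic comparing M with the random classifier built from its marginals."""
    N = sum(M[u][v] for u in V for v in V)
    r = {u: {v: sum(M[u][w] for w in V) * sum(M[w][v] for w in V) / N
             for v in V} for u in V}
    observed = sum(M[v][v] for v in V)
    expected = sum(r[v][v] for v in V)
    return (observed - expected) / (N - expected)
```

Dividing the accumulated counts by the number of test classifications yields the rates reported in §4.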
4 Some Experiments

In order to test the NN procedure integrated with the pseudo-metric proposed in the previous sections, it was applied to a number of classification problems. To this purpose, we selected some OWL ontologies from different domains, namely: SURFACE-WATER-MODEL (SWM) and NEW TESTAMENT NAMES (NTN) from the Protégé library (http://protege.stanford.edu/plugins/owl/owl-library), our Semantic Web Service Discovery dataset (SWSD, https://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Projects/xmedia/dl-tree.htm), an ontology generated by the Lehigh University Benchmark (LUBM), the BioPax glycolysis ontology (BIOPAX, http://www.biopax.org/Downloads/Level1v1.4/) and the FINANCIAL ontology (http://www.cs.put.poznan.pl/alawrynowicz/). Tab. 1 summarizes important details concerning these ontologies.

Table 1. Facts concerning the ontologies employed in the experiments.

ontology     DL language   #concepts   #object prop.   #data prop.   #individuals
SWM          ALCOF(D)           19            9              1            115
BIOPAX       ALCHF(D)           28           19             30            323
LUBM         ALR+HI(D)          43            7             25            555
NTN          SHIF(D)            47           27              8            676
SWSD         ALCH              258           25              0            732
FINANCIAL    ALCIF              60           17              0           1000

A 10-fold cross validation was performed. The simplest version of the distance (d_1^F) was employed, using all the concepts in the knowledge base to determine the set F. The parameter k was set to √|Ind(A)|, thus depending on the number of individuals in the ontology. The performance was evaluated by comparing the inductive responses to those returned by a standard reasoner (PELLET v. 1.5.2, see http://pellet.owldl.com) used as a baseline.
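Tying the previous sketches together, the evaluation protocol just described can be outlined as below; the helpers (nn_classify, dissimilarity, likelihood, averaged_pr, update_counts, kappa and the value set V) are the illustrative functions sketched in the earlier sections, and reasoner_check(C, x) stands for an instance check delegated to a DL reasoner such as Pellet. None of these names correspond to an actual API; this is only a sketch of the procedure under those assumptions.

```python
import math
import random

def evaluate_concept(C, individuals, features, weights, reasoner_check, folds=10):
    """Sketch of the 10-fold evaluation of the NN classifier against the reasoner
    for a single query concept C."""
    d = lambda a, b: dissimilarity(reasoner_check, features, weights, a, b, p=1)
    h_C = {x: reasoner_check(C, x) for x in individuals}   # deductive labels in {+1, -1, 0}
    k = max(1, round(math.sqrt(len(individuals))))         # k set to sqrt(|Ind(A)|)

    individuals = list(individuals)
    random.shuffle(individuals)
    fold_size = math.ceil(len(individuals) / folds)

    counts = {'match': 0.0, 'omission': 0.0, 'commission': 0.0, 'induction': 0.0}
    M = {u: {v: 0.0 for v in V} for u in V}
    inductive, deductive = {}, {}

    for f in range(folds):
        test = set(individuals[f * fold_size:(f + 1) * fold_size])
        training = [x for x in individuals if x not in test]
        for x in test:
            value, votes, _ = nn_classify(x, training, h_C, d, k=k)
            inductive[x], deductive[x] = value, h_C[x]
            update_counts(counts, M, h_C[x], likelihood(votes))

    return averaged_pr(inductive, deductive), counts, kappa(M)
```

Averaging the returned values over all the concepts of a knowledge base yields figures of the kind reported in Tab. 2 and Tab. 3.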
4.1 Generalized IR Measures

The outcomes are reported in Tab. 2. For each knowledge base, we report the average values (and the standard deviation) obtained by classifying each individual against each concept in the KB using both the reasoner and the NN classifier.

Table 2. Experimental results in terms of the new IR measures.

ontology     precision       recall          F-measure
SWM          97.46±03.27     96.87±04.23     97.16±03.69
BIOPAX       92.21±13.00     80.23±14.55     85.80±13.73
LUBM         99.14±04.64     93.85±12.09     96.42±06.71
NTN          82.57±17.22     63.13±17.74     71.55±17.48
SWSD         80.82±08.88     75.66±04.24     78.15±05.74
FINANCIAL    97.72±07.36     57.86±13.45     72.68±09.52

It is possible to note that the results are generally good, especially for the smaller ontologies (in terms of number of individuals). In particular, precision was good (> 90%) for all ontologies except SWSD and NTN, for which it drops to around 80%. Namely, SWSD turned out to be more difficult (also in terms of recall) for two reasons: a very limited number of individuals per concept was available, and the number of different concepts is larger than in the other knowledge bases. For the other ontologies the scores are quite high, as testified also by the F-measure values. The results in terms of precision are also more stable than those for recall, as shown by the limited variance observed, whereas some concepts turned out to be quite difficult in terms of recall.

The gap between precision and recall is probably due to the open-world assumption. Indeed, in many cases it was observed that the NN procedure deemed some individuals as relevant for the target concept while the DL reasoner was not able to assess this relevance; this was counted as a mistake, although it may well turn out to be a correct inference when judged by a human expert. Thus, different indices are needed that make explicit both the rate of inductively classified individuals and the nature of the mistakes.

4.2 Other Measures

Tab. 3 reports the outcomes in terms of the set-theoretic loss function and of the new indices exploiting the likelihood value provided by the NN classifier.

Table 3. Results with alternative indices: set-theoretic (s.) and likelihood version (l.).

ontology     type   match           commission      omission        induction
SWM          s.     97.89±03.04     00.00±00.00     00.92±01.25     01.20±01.82
             l.     96.96±04.16     00.00±00.00     01.31±01.61     01.73±02.68
BIOPAX       s.     90.90±13.28     08.56±13.37     00.00±00.00     00.54±02.64
             l.     87.75±15.38     11.38±15.35     02.29±01.25     00.48±02.28
LUBM         s.     98.48±06.30     00.00±00.00     00.83±02.64     00.69±05.19
             l.     97.69±07.51     00.00±00.00     01.15±04.10     01.15±03.68
NTN          s.     88.39±16.69     00.24±01.08     05.99±07.27     05.37±09.33
             l.     86.53±17.51     00.38±01.66     06.34±08.05     06.55±08.48
SWSD         s.     98.12±05.01     00.00±00.00     01.15±02.86     00.74±02.17
             l.     97.80±05.47     00.00±00.00     01.12±02.73     01.11±02.73
FINANCIAL    s.     97.02±07.76     02.64±07.77     00.02±00.07     00.32±00.15
             l.     93.86±09.21     03.55±09.26     02.17±01.83     00.42±00.36

Preliminarily, it is important to note that, in each experiment, the commission error was low or absent (except for the BIOPAX ontology). This means that the procedure is generally quite accurate: it did not make critical mistakes, i.e. cases when an individual is deemed to be an instance of a concept while it really is an instance of a disjoint one. Omission error and induction rates are also quite low, yet they were more typically observed in the experiments with the considered ontologies.

The usage of all the concepts in the KB for the set F of d_1^F made the measure quite accurate, which is the reason why the procedure turned out to be quite conservative as regards inducing new assertions: in many cases it matched the reasoner decisions rather faithfully. From the IR point of view, the cases of induction are interesting because they suggest new assertions which cannot be logically derived by a deductive reasoner, yet they might be used to complete a knowledge base [3], e.g. after being validated by an ontology engineer.

5 Concluding Remarks

In this paper new evaluation measures were proposed that are suitable for comparing inductive classification methods to standard reasoners. Experimentally, we showed that the behavior of a NN classifier is comparable with that of a standard reasoner in terms of the proposed indices.

Other extensions of the current measures may be made by exploiting the probabilistic output and different loss functions. Further measures such as specificity, sparsity and fallout could also be generalized. Moreover, the same criteria may be adopted in the evaluation of approximate reasoning methods [1]. Similar measures may also be employed for evaluating other learning algorithms, such as (un)supervised conceptual clustering [9]. We are currently investigating the possibility of devising classifiers that provide binary responses [10], which would require suitable performance indices.
References

[1] Hitzler, P., Vrandečić, D.: Resolution-based approximate reasoning for OWL DL. In Gil, Y., Motta, E., Benjamins, V., Musen, M.A., eds.: Proceedings of the 4th International Semantic Web Conference, ISWC2005. Volume 3729 of LNCS, Galway, Ireland, Springer (2005) 383–397
[2] d'Amato, C., Fanizzi, N., Esposito, F.: Reasoning by analogy in description logics through instance-based learning. In Tummarello, G., Bouquet, P., Signore, O., eds.: Proceedings of Semantic Web Applications and Perspectives, 3rd Italian Semantic Web Workshop, SWAP2006. Volume 201 of CEUR Workshop Proceedings, Pisa, Italy, CEUR-WS.org (2006)
[3] d'Amato, C., Fanizzi, N., Esposito, F.: Query answering and ontology population: An inductive approach. In Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M., eds.: Proceedings of the 5th European Semantic Web Conference, ESWC2008. Volume 5021 of LNCS, Springer (2008) 288–302
[4] Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P., eds.: The Description Logic Handbook. Cambridge University Press (2003)
[5] Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. 2nd edn. Morgan Kaufmann (2005)
[6] Borgida, A., Walsh, T., Hirsh, H.: Towards measuring similarity in description logics. In Horrocks, I., Sattler, U., Wolter, F., eds.: Working Notes of the International Description Logics Workshop. Volume 147 of CEUR Workshop Proceedings, Edinburgh, UK (2005)
[7] Euzenat, J.: Semantic precision and recall for ontology alignment evaluation. In Veloso, M., ed.: Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI2007, Hyderabad, India (2007) 348–353
[8] Ehrig, M., Euzenat, J.: Relaxed precision and recall for ontology matching. In Ashpole, B., Euzenat, J., Ehrig, M., Stuckenschmidt, H., eds.: Proceedings of the K-Cap2005 Workshop on Integrating Ontology, Banff, Canada (2005) 25–32
[9] Fanizzi, N., d'Amato, C., Esposito, F.: Conceptual clustering for concept drift and novelty detection. In Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M., eds.: Proceedings of the 5th European Semantic Web Conference, ESWC2008. Volume 5021 of LNCS, Springer (2008) 318–332
[10] d'Amato, C., Fanizzi, N., Esposito, F.: Distance-based classification in OWL ontologies. In Lovrek, I., Howlett, R., Jain, L., eds.: Proceedings of the 12th International Conference, KES2008. Volume 5178 of LNAI, Zagreb, Croatia, Springer (2008) 656–661