=Paper=
{{Paper
|id=None
|storemode=property
|title=Measuring Similarity in Ontologies: A New Family of Measures
|pdfUrl=https://ceur-ws.org/Vol-1272/paper_23.pdf
|volume=Vol-1272
|dblpUrl=https://dblp.org/rec/conf/semweb/AlsubaitPS14
}}
==Measuring Similarity in Ontologies: A New Family of Measures==
Tahani Alsubait, Bijan Parsia, and Uli Sattler

School of Computer Science, The University of Manchester, United Kingdom

{alsubait,bparsia,sattler}@cs.man.ac.uk

===1 Introduction===

Similarity measurement is important for numerous applications, be it classical information retrieval, clustering, ontology matching or various other applications. It is also known that similarity measurement is difficult, as can easily be seen from the several attempts that have been made to develop similarity measures, see for example [2, 4]. The problem is also well-founded in psychology, and a number of psychological models of similarity have already been developed, see for example [3].

We noticed that, rather than adopting a psychological model of similarity as a foundation, some existing similarity measures for ontologies are ad hoc and unprincipled. In addition, there is still a need for similarity measures which are applicable to expressive Description Logics (DLs), i.e., beyond EL, and which are terminological, i.e., do not require an ABox. To address these requirements, we have developed a new family of similarity measures founded on the feature-based psychological model [3]. The individual measures vary in their accuracy and computational cost depending on which features they consider.

To date, there has been no thorough empirical investigation of similarity measures. This has motivated us to carry out two separate empirical studies. First, we compare the new measures, along with some existing measures, against a gold standard. Second, we examine the practicality of using the new measures over an independently motivated corpus of ontologies (the BioPortal library), which contains over 300 ontologies. We also examine whether cheap measures can approximate more computationally expensive ones, and we explore what could possibly go wrong when using a cheap similarity measure.

===2 A new family of similarity measures===

The new measures are based on Jaccard's similarity coefficient, which has been proved to be a proper metric (i.e., it satisfies equivalence closure, symmetry and the triangle inequality). Jaccard's coefficient, which maps similarity to a value in the range [0, 1], is defined as follows, for sets of "features" <math>A', B'</math> of <math>A, B</math>, i.e., the subsumers of <math>A</math> and <math>B</math>:

: <math>J(A, B) = \frac{|A' \cap B'|}{|A' \cup B'|}</math>

We aim at similarity measures for general OWL ontologies, where a naive implementation of this approach would be trivialised because a concept has infinitely many subsumers. To overcome this, we present refinements of the similarity function in which we do not count all subsumers but only those subsumers that come from a set of (possibly complex) concepts of a concept language <math>\mathcal{L}</math>. Let <math>C</math> and <math>D</math> be concepts, let <math>\mathcal{O}</math> be an ontology and let <math>\mathcal{L}</math> be a concept language. We set:

: <math>S(C, \mathcal{O}, \mathcal{L}) = \{ E \in \mathcal{L}(\widetilde{\mathcal{O}}) \mid \mathcal{O} \models C \sqsubseteq E \}</math>
: <math>\mathit{Com}(C, D, \mathcal{O}, \mathcal{L}) = S(C, \mathcal{O}, \mathcal{L}) \cap S(D, \mathcal{O}, \mathcal{L})</math>
: <math>\mathit{Union}(C, D, \mathcal{O}, \mathcal{L}) = S(C, \mathcal{O}, \mathcal{L}) \cup S(D, \mathcal{O}, \mathcal{L})</math>
: <math>\mathit{Sim}(C, D, \mathcal{O}, \mathcal{L}) = \frac{|\mathit{Com}(C, D, \mathcal{O}, \mathcal{L})|}{|\mathit{Union}(C, D, \mathcal{O}, \mathcal{L})|}</math>

To design a new measure, it remains to specify the set <math>\mathcal{L}</math>. For example:

* <math>\mathit{AtomicSim}(C, D) = \mathit{Sim}(C, D, \mathcal{O}, \mathcal{L}_{\mathit{Atomic}}(\widetilde{\mathcal{O}}))</math>, where <math>\mathcal{L}_{\mathit{Atomic}}(\widetilde{\mathcal{O}}) = \widetilde{\mathcal{O}} \cap N_C</math>.
* <math>\mathit{SubSim}(C, D) = \mathit{Sim}(C, D, \mathcal{O}, \mathcal{L}_{\mathit{Sub}}(\widetilde{\mathcal{O}}))</math>, where <math>\mathcal{L}_{\mathit{Sub}}(\widetilde{\mathcal{O}}) = \mathit{Sub}(\mathcal{O})</math>.
* <math>\mathit{GrSim}(C, D) = \mathit{Sim}(C, D, \mathcal{O}, \mathcal{L}_{G}(\widetilde{\mathcal{O}}))</math>, where <math>\mathcal{L}_{G}(\widetilde{\mathcal{O}}) = \{ E \mid E \in \mathit{Sub}(\mathcal{O}) \text{ or } E = \exists r.F \text{ for some } r \in \widetilde{\mathcal{O}} \cap N_R \text{ and } F \in \mathit{Sub}(\mathcal{O}) \}</math>,

where <math>\widetilde{\mathcal{O}}</math> is the signature of <math>\mathcal{O}</math>, <math>N_C</math> is the set of concept names, <math>N_R</math> is the set of role names, and <math>\mathit{Sub}(\mathcal{O})</math> is the set of concept expressions occurring in <math>\mathcal{O}</math>.
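To make the computation concrete, the generic measure can be sketched in a few lines of Python (our own illustration, not code from the paper; the names are hypothetical, and the <code>subsumers</code> callback stands in for a DL reasoner call that computes <math>S(C, \mathcal{O}, \mathcal{L})</math>):

<syntaxhighlight lang="python">
from typing import Callable, Set

def sim(c: str, d: str, subsumers: Callable[[str], Set[str]]) -> float:
    """Generic Sim(C, D, O, L) as a Jaccard coefficient.

    `subsumers(X)` must return S(X, O, L): the members of the candidate
    set L that the ontology entails to subsume X.  Computing this set
    requires a DL reasoner and is not shown here.
    """
    s_c, s_d = subsumers(c), subsumers(d)
    common = s_c & s_d                     # Com(C, D, O, L)
    union = s_c | s_d                      # Union(C, D, O, L)
    # Convention (our assumption): if neither concept has any candidate
    # subsumer, treat the pair as maximally similar instead of dividing by 0.
    return len(common) / len(union) if union else 1.0
</syntaxhighlight>

Instantiating <code>subsumers</code> with candidates drawn from <math>\widetilde{\mathcal{O}} \cap N_C</math>, from <math>\mathit{Sub}(\mathcal{O})</math>, or from the grammar-extended set then yields AtomicSim(·), SubSim(·) and GrSim(·) respectively.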
The rationale of SubSim(·) is that it provides similarity measurements that are sensitive to the modeller's focus. To capture more possible subsumers, one can use GrSim(·), whose grammar can easily be extended.

===3 Approximations of similarity measures===

Some measures might be practically inefficient due to the large number of candidate subsumers. For this reason, it is worth examining whether a "cheap" measure can be a good approximation of a more expensive one.

'''Definition 1.''' Given two similarity functions Sim(·) and Sim′(·), we say that:
* Sim′(·) ''preserves the order'' of Sim(·) if for all <math>A_1, B_1, A_2, B_2 \in \widetilde{\mathcal{O}}</math>: <math>\mathit{Sim}(A_1, B_1) \leq \mathit{Sim}(A_2, B_2) \implies \mathit{Sim}'(A_1, B_1) \leq \mathit{Sim}'(A_2, B_2)</math>.
* Sim′(·) ''approximates'' Sim(·) ''from above'' if for all <math>A, B \in \widetilde{\mathcal{O}}</math>: <math>\mathit{Sim}(A, B) \leq \mathit{Sim}'(A, B)</math>.
* Sim′(·) ''approximates'' Sim(·) ''from below'' if for all <math>A, B \in \widetilde{\mathcal{O}}</math>: <math>\mathit{Sim}(A, B) \geq \mathit{Sim}'(A, B)</math>.

Consider AtomicSim(·) and SubSim(·). The first thing to notice is that the set of candidate subsumers of the first measure is a subset of the set of candidate subsumers of the second (<math>\widetilde{\mathcal{O}} \cap N_C \subseteq \mathit{Sub}(\mathcal{O})</math>). However, the numbers of entailed subsumers in the two cases need not be proportionally related. Hence, the above similarity measures are, theoretically, not approximations of each other; whether they behave as approximations on a concrete ontology can still be checked empirically, as sketched below.
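The following minimal Python sketch (again our own illustration, with hypothetical names) shows how the properties of Definition 1 can be checked over a finite set of concept pairs whose similarity values have already been computed:

<syntaxhighlight lang="python">
from itertools import permutations
from typing import Dict, Tuple

Pair = Tuple[str, str]  # a pair (A, B) of concept names

def preserves_order(sim: Dict[Pair, float], sim2: Dict[Pair, float]) -> bool:
    """Does sim2 preserve the order of sim on the sampled pairs?"""
    return all(
        sim2[p] <= sim2[q]
        for p, q in permutations(sim, 2)   # all ordered pairs of keys
        if sim[p] <= sim[q]
    )

def approximates_from_above(sim: Dict[Pair, float], sim2: Dict[Pair, float]) -> bool:
    """sim(A, B) <= sim2(A, B) for all sampled pairs."""
    return all(sim[p] <= sim2[p] for p in sim)

def approximates_from_below(sim: Dict[Pair, float], sim2: Dict[Pair, float]) -> bool:
    """sim(A, B) >= sim2(A, B) for all sampled pairs."""
    return all(sim[p] >= sim2[p] for p in sim)
</syntaxhighlight>

Note that the order-preservation check quantifies over all ordered pairs of concept pairs, so it is quadratic in the number of sampled pairs.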
===4 Empirical evaluation===

We carry out a comparison of the three measures GrSim(·), SubSim(·) and AtomicSim(·) against human similarity judgements. We also include two existing similarity measures in this comparison (Rada [2] and Wu & Palmer [4]), and we study in detail the behaviour of our new family of measures in practice. GrSim(·) is considered the most expensive and most precise measure in this study. To study the relation between the different measures in practice, we examine the following properties: order preservation, approximation from above/below, and correlation (using Pearson's coefficient).

====4.1 Experimental set-up====

'''Part 1: Comparison against a gold standard.''' The similarity of 19 SNOMED CT concept pairs was calculated using the three new measures along with the Rada [2] and Wu & Palmer [4] measures. We compare these similarities to the human judgements taken from the Pedersen et al. [1] test set.

'''Part 2: Cheap vs. expensive measures.''' A snapshot of BioPortal from November 2012 was used as a corpus; it contains a total of 293 ontologies. We excluded 86 ontologies which have only atomic subsumptions since, for such ontologies, the behaviour of the considered measures is identical, i.e., we already know that AtomicSim(·) is good and cheap. Due to the large number of classes and the difficulty of spotting interesting patterns by eye, we calculated the pairwise similarity for a sample of concepts from the corpus. The sample comprises 1,843 concepts, giving a 99% confidence level. To ensure that the sample encompasses concepts with different characteristics, we picked 14 concepts from each ontology. The selection was not purely random: we picked 2 random concepts and, for each random concept, some neighbouring concepts.

====4.2 Results====

'''How good is the expensive measure?''' Not surprisingly, GrSim and SubSim had the highest correlation with the experts' similarity judgements (Pearson's correlation coefficient r = 0.87, p < 0.001). Next comes AtomicSim with r = 0.86, followed by Wu & Palmer and then Rada with r = 0.81 and r = 0.64 respectively. Figure 1 shows the similarity curves for the six measures used in this comparison. The new measures, along with the Wu & Palmer measure, preserve the order of human similarity more often than the Rada measure. They mostly underestimated similarity, whereas the Rada measure mostly overestimated human similarity.

Fig. 1: Six curves of similarity for 19 SNOMED clinical terms

'''Cost of the expensive measure.''' The average time per ontology taken to calculate the grammar-based similarities was 2.3 minutes (standard deviation σ = 10.6 minutes, median m = 0.9 seconds), and the maximum time was 93 minutes, for the Neglected Tropical Disease Ontology, a SRIQ ontology with 1,237 logical axioms, 252 concepts and 99 object properties. For this ontology, AtomicSim(·) took only 15.545 seconds and SubSim(·) 15.549 seconds. 9 out of 196 ontologies took over 1 hour to process; what these ontologies have in common is a high number of logical axioms and object properties. Clearly, GrSim(·) is far more costly than the other two measures, which is why we want to know how good or bad a cheaper measure can be.

'''How good is a cheap measure?''' Although we had excluded all ontologies with only atomic subsumptions from the study, in 12% of the ontologies the three measures were perfectly correlated (r = 1, p < 0.001). These perfect correlations indicate that, in some cases, the benefit of using an expensive measure is negligible. AtomicSim(·) and SubSim(·) did not preserve the order of GrSim(·) in 80% and 73% of the ontologies respectively. Moreover, they were approximations neither from above nor from below in 72% and 64% of the ontologies respectively. Consider the African Traditional Medicine ontology in Figure 2: SubSim(·) is 100% order-preserving while AtomicSim(·) is only 99% order-preserving. Note also the Platynereis Stage Ontology in Figure 3, in which both AtomicSim(·) and SubSim(·) are 75% order-preserving; however, AtomicSim(·) was 100% approximating from above while SubSim(·) was 85% approximating from below.

Fig. 2: African Traditional Medicine

Fig. 3: Platynereis Stage

===References===

# T. Pedersen, S. Pakhomov, S. Patwardhan, and C. Chute. Measures of semantic similarity and relatedness in the biomedical domain. ''Journal of Biomedical Informatics'', 40(3):288–299, 2007.
# R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. ''IEEE Transactions on Systems, Man, and Cybernetics'', 19(1):17–30, 1989.
# A. Tversky. Features of similarity. ''Psychological Review'', 84(4):327–352, 1977.
# Z. Wu and M. Palmer. Verb semantics and lexical selection. In ''Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 1994)'', pages 133–138, 1994.