=Paper=
{{Paper
|id=None
|storemode=property
|title=Measuring Similarity in Ontologies: A New Family of Measures
|pdfUrl=https://ceur-ws.org/Vol-1272/paper_23.pdf
|volume=Vol-1272
|dblpUrl=https://dblp.org/rec/conf/semweb/AlsubaitPS14
}}
==Measuring Similarity in Ontologies: A New Family of Measures==
Tahani Alsubait, Bijan Parsia, and Uli Sattler

School of Computer Science, The University of Manchester, United Kingdom

{alsubait,bparsia,sattler}@cs.man.ac.uk

===1 Introduction===

Similarity measurement is important for numerous applications, be it classical information retrieval, clustering, ontology matching or various other applications. It is also known that similarity measurement is difficult, as can easily be seen from the several attempts that have been made to develop similarity measures, see for example [2, 4]. The problem is also well-founded in psychology, and a number of psychological models of similarity have already been developed, see for example [3].

We noticed that, rather than adopting a psychological model of similarity as a foundation, some existing similarity measures for ontologies are ad hoc and unprincipled. In addition, there is still a need for similarity measures which are applicable to expressive Description Logics (DLs), i.e., beyond EL, and which are terminological, i.e., do not require an ABox. To address these requirements, we have developed a new family of similarity measures founded on the feature-based psychological model [3]. The individual measures vary in their accuracy and computational cost depending on which features they consider.

To date, there has been no thorough empirical investigation of similarity measures. This has motivated us to carry out two separate empirical studies. First, we compare the new measures, along with some existing measures, against a gold standard. Second, we examine the practicality of using the new measures over an independently motivated corpus of ontologies (the BioPortal library), which contains over 300 ontologies. We also examine whether cheap measures can approximate more computationally expensive ones, and we explore what could possibly go wrong when using a cheap similarity measure.

===2 A new family of similarity measures===

The new measures are based on Jaccard's similarity coefficient, which has been proved to be a proper metric (i.e., it satisfies equivalence closure, symmetry and the triangle inequality). Jaccard's coefficient, which maps similarity to a value in the range [0, 1], is defined as follows, for sets of "features" <math>A', B'</math> of <math>A, B</math>, i.e., the subsumers of <math>A</math> and <math>B</math>:

: <math>J(A, B) = \frac{|A' \cap B'|}{|A' \cup B'|}</math>

We aim at similarity measures for general OWL ontologies, where a naive implementation of this approach would be trivialised because a concept has infinitely many subsumers. To overcome this, we present refinements of the similarity function in which we do not count all subsumers but only those subsumers that come from a set of (possibly complex) concepts of a concept language <math>\mathcal{L}</math>. Let <math>C</math> and <math>D</math> be concepts, let <math>\mathcal{O}</math> be an ontology and let <math>\mathcal{L}</math> be a concept language. We set:

: <math>S(C, \mathcal{O}, \mathcal{L}) = \{ E \in \mathcal{L}(\widetilde{\mathcal{O}}) \mid \mathcal{O} \models C \sqsubseteq E \}</math>
: <math>\mathit{Com}(C, D, \mathcal{O}, \mathcal{L}) = S(C, \mathcal{O}, \mathcal{L}) \cap S(D, \mathcal{O}, \mathcal{L})</math>
: <math>\mathit{Union}(C, D, \mathcal{O}, \mathcal{L}) = S(C, \mathcal{O}, \mathcal{L}) \cup S(D, \mathcal{O}, \mathcal{L})</math>
: <math>\mathit{Sim}(C, D, \mathcal{O}, \mathcal{L}) = \frac{|\mathit{Com}(C, D, \mathcal{O}, \mathcal{L})|}{|\mathit{Union}(C, D, \mathcal{O}, \mathcal{L})|}</math>

To design a new measure, it remains to specify the set <math>\mathcal{L}</math>. For example:

* <math>\mathit{AtomicSim}(C, D) = \mathit{Sim}(C, D, \mathcal{O}, \mathcal{L}_{\mathit{Atomic}}(\widetilde{\mathcal{O}}))</math>, where <math>\mathcal{L}_{\mathit{Atomic}}(\widetilde{\mathcal{O}}) = \widetilde{\mathcal{O}} \cap N_C</math>.
* <math>\mathit{SubSim}(C, D) = \mathit{Sim}(C, D, \mathcal{O}, \mathcal{L}_{\mathit{Sub}}(\widetilde{\mathcal{O}}))</math>, where <math>\mathcal{L}_{\mathit{Sub}}(\widetilde{\mathcal{O}}) = \mathit{Sub}(\mathcal{O})</math>.
* <math>\mathit{GrSim}(C, D) = \mathit{Sim}(C, D, \mathcal{O}, \mathcal{L}_{G}(\widetilde{\mathcal{O}}))</math>, where <math>\mathcal{L}_{G}(\widetilde{\mathcal{O}}) = \{ E \mid E \in \mathit{Sub}(\mathcal{O}) \text{ or } E = \exists r.F \text{ for some } r \in \widetilde{\mathcal{O}} \cap N_R \text{ and } F \in \mathit{Sub}(\mathcal{O}) \}</math>,

where <math>\widetilde{\mathcal{O}}</math> is the signature of <math>\mathcal{O}</math>, <math>N_C</math> is the set of concept names, <math>N_R</math> is the set of role names, and <math>\mathit{Sub}(\mathcal{O})</math> is the set of concept expressions occurring in <math>\mathcal{O}</math>.
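To make the computation concrete, the generic measure can be sketched in a few lines of Python (our own illustration, not code from the paper; the names are hypothetical, and the <code>subsumers</code> callback stands in for a DL reasoner call that computes <math>S(C, \mathcal{O}, \mathcal{L})</math>):

<syntaxhighlight lang="python">
from typing import Callable, Set

def sim(c: str, d: str, subsumers: Callable[[str], Set[str]]) -> float:
    """Generic Sim(C, D, O, L) as a Jaccard coefficient.

    `subsumers(X)` must return S(X, O, L): the members of the candidate
    set L that the ontology entails to subsume X.  Computing this set
    requires a DL reasoner and is not shown here.
    """
    s_c, s_d = subsumers(c), subsumers(d)
    common = s_c & s_d                     # Com(C, D, O, L)
    union = s_c | s_d                      # Union(C, D, O, L)
    # Convention (our assumption): if neither concept has any candidate
    # subsumer, treat the pair as maximally similar instead of dividing by 0.
    return len(common) / len(union) if union else 1.0
</syntaxhighlight>

Instantiating <code>subsumers</code> with candidates drawn from <math>\widetilde{\mathcal{O}} \cap N_C</math>, from <math>\mathit{Sub}(\mathcal{O})</math>, or from the grammar-extended set then yields AtomicSim(·), SubSim(·) and GrSim(·) respectively.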
The rationale of SubSim(·) is that it provides similarity measurements that are sensitive to the modeller's focus. To capture more possible subsumers, one can use GrSim(·), whose grammar can easily be extended.

===3 Approximations of similarity measures===

Some measures might be practically inefficient due to the large number of candidate subsumers. For this reason, it is worth examining whether a "cheap" measure can be a good approximation of a more expensive one.

'''Definition 1.''' Given two similarity functions Sim(·) and Sim′(·), we say that:
* Sim′(·) ''preserves the order'' of Sim(·) if for all <math>A_1, B_1, A_2, B_2 \in \widetilde{\mathcal{O}}</math>: <math>\mathit{Sim}(A_1, B_1) \leq \mathit{Sim}(A_2, B_2) \implies \mathit{Sim}'(A_1, B_1) \leq \mathit{Sim}'(A_2, B_2)</math>.
* Sim′(·) ''approximates'' Sim(·) ''from above'' if for all <math>A, B \in \widetilde{\mathcal{O}}</math>: <math>\mathit{Sim}(A, B) \leq \mathit{Sim}'(A, B)</math>.
* Sim′(·) ''approximates'' Sim(·) ''from below'' if for all <math>A, B \in \widetilde{\mathcal{O}}</math>: <math>\mathit{Sim}(A, B) \geq \mathit{Sim}'(A, B)</math>.

Consider AtomicSim(·) and SubSim(·). The first thing to notice is that the set of candidate subsumers of the first measure is a subset of the set of candidate subsumers of the second (<math>\widetilde{\mathcal{O}} \cap N_C \subseteq \mathit{Sub}(\mathcal{O})</math>). However, the numbers of entailed subsumers in the two cases need not be proportionally related. Hence, the above similarity measures are, theoretically, not approximations of each other; whether they behave as approximations on a concrete ontology can still be checked empirically, as sketched below.
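The following minimal Python sketch (again our own illustration, with hypothetical names) shows how the properties of Definition 1 can be checked over a finite set of concept pairs whose similarity values have already been computed:

<syntaxhighlight lang="python">
from itertools import permutations
from typing import Dict, Tuple

Pair = Tuple[str, str]  # a pair (A, B) of concept names

def preserves_order(sim: Dict[Pair, float], sim2: Dict[Pair, float]) -> bool:
    """Does sim2 preserve the order of sim on the sampled pairs?"""
    return all(
        sim2[p] <= sim2[q]
        for p, q in permutations(sim, 2)   # all ordered pairs of keys
        if sim[p] <= sim[q]
    )

def approximates_from_above(sim: Dict[Pair, float], sim2: Dict[Pair, float]) -> bool:
    """sim(A, B) <= sim2(A, B) for all sampled pairs."""
    return all(sim[p] <= sim2[p] for p in sim)

def approximates_from_below(sim: Dict[Pair, float], sim2: Dict[Pair, float]) -> bool:
    """sim(A, B) >= sim2(A, B) for all sampled pairs."""
    return all(sim[p] >= sim2[p] for p in sim)
</syntaxhighlight>

Note that the order-preservation check quantifies over all ordered pairs of concept pairs, so it is quadratic in the number of sampled pairs.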
===4 Empirical evaluation===

We carry out a comparison of the three measures GrSim(·), SubSim(·) and AtomicSim(·) against human similarity judgements. We also include two existing similarity measures in this comparison (Rada [2] and Wu & Palmer [4]), and we study in detail the behaviour of our new family of measures in practice. GrSim(·) is considered the most expensive and most precise measure in this study. To study the relation between the different measures in practice, we examine the following properties: order preservation, approximation from above/below, and correlation (using Pearson's coefficient).

====4.1 Experimental set-up====

'''Part 1: Comparison against a gold standard.''' The similarity of 19 SNOMED CT concept pairs was calculated using the three new measures along with the Rada [2] and Wu & Palmer [4] measures. We compare these similarities to the human judgements taken from the Pedersen et al. [1] test set.

'''Part 2: Cheap vs. expensive measures.''' A snapshot of BioPortal from November 2012 was used as a corpus; it contains a total of 293 ontologies. We excluded 86 ontologies which have only atomic subsumptions since, for such ontologies, the behaviour of the considered measures is identical, i.e., we already know that AtomicSim(·) is good and cheap. Due to the large number of classes and the difficulty of spotting interesting patterns by eye, we calculated the pairwise similarity for a sample of concepts from the corpus. The sample comprises 1,843 concepts, giving a 99% confidence level. To ensure that the sample encompasses concepts with different characteristics, we picked 14 concepts from each ontology. The selection was not purely random: we picked 2 random concepts and, for each random concept, some neighbouring concepts.

====4.2 Results====

'''How good is the expensive measure?''' Not surprisingly, GrSim and SubSim had the highest correlation with the experts' similarity judgements (Pearson's correlation coefficient r = 0.87, p < 0.001). Next comes AtomicSim with r = 0.86, followed by Wu & Palmer and then Rada with r = 0.81 and r = 0.64 respectively. Figure 1 shows the similarity curves for the six measures used in this comparison. The new measures, along with the Wu & Palmer measure, preserve the order of human similarity more often than the Rada measure. They mostly underestimated similarity, whereas the Rada measure mostly overestimated human similarity.

Fig. 1: Six curves of similarity for 19 SNOMED clinical terms

'''Cost of the expensive measure.''' The average time per ontology taken to calculate the grammar-based similarities was 2.3 minutes (standard deviation σ = 10.6 minutes, median m = 0.9 seconds), and the maximum time was 93 minutes, for the Neglected Tropical Disease Ontology, a SRIQ ontology with 1,237 logical axioms, 252 concepts and 99 object properties. For this ontology, AtomicSim(·) took only 15.545 seconds and SubSim(·) 15.549 seconds. 9 out of 196 ontologies took over 1 hour to process; what these ontologies have in common is a high number of logical axioms and object properties. Clearly, GrSim(·) is far more costly than the other two measures, which is why we want to know how good or bad a cheaper measure can be.

'''How good is a cheap measure?''' Although we had excluded all ontologies with only atomic subsumptions from the study, in 12% of the ontologies the three measures were perfectly correlated (r = 1, p < 0.001). These perfect correlations indicate that, in some cases, the benefit of using an expensive measure is negligible. AtomicSim(·) and SubSim(·) did not preserve the order of GrSim(·) in 80% and 73% of the ontologies respectively. Moreover, they were approximations neither from above nor from below in 72% and 64% of the ontologies respectively. Consider the African Traditional Medicine ontology in Figure 2: SubSim(·) is 100% order-preserving while AtomicSim(·) is only 99% order-preserving. Note also the Platynereis Stage Ontology in Figure 3, in which both AtomicSim(·) and SubSim(·) are 75% order-preserving; however, AtomicSim(·) was 100% approximating from above while SubSim(·) was 85% approximating from below.

Fig. 2: African Traditional Medicine

Fig. 3: Platynereis Stage

===References===

# T. Pedersen, S. Pakhomov, S. Patwardhan, and C. Chute. Measures of semantic similarity and relatedness in the biomedical domain. ''Journal of Biomedical Informatics'', 40(3):288–299, 2007.
# R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. ''IEEE Transactions on Systems, Man, and Cybernetics'', 19(1):17–30, 1989.
# A. Tversky. Features of similarity. ''Psychological Review'', 84(4):327–352, 1977.
# Z. Wu and M. Palmer. Verb semantics and lexical selection. In ''Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 1994)'', pages 133–138, 1994.