<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Measuring Similarity in Ontologies: A new family of measures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tahani Alsubait</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijan Parsia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Uli Sattler</string-name>
          <email>sattlerg@cs.man.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, The University of Manchester</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>5</lpage>
      <abstract>
<p>Similarity measurement is important for numerous applications, be it classical information retrieval, clustering, ontology matching or various other applications. It is also known that similarity measurement is difficult. This can easily be seen by looking at the several attempts that have been made to develop similarity measures, see for example [2, 4]. The problem is also well-founded in psychology, and a number of psychological models of similarity have already been developed, see for example [3]. Rather than adopting a psychological model of similarity as a foundation, we noticed that some existing similarity measures for ontologies are ad-hoc and unprincipled. In addition, there is still a need for similarity measures which are applicable to expressive Description Logics (DLs) (i.e., beyond EL) and which are terminological (i.e., do not require an ABox). To address these requirements, we have developed a new family of similarity measures which are founded on the feature-based psychological model [3]. The individual measures vary in their accuracy/computational cost based on which features they consider. To date, there has been no thorough empirical investigation of similarity measures. This has motivated us to carry out two separate empirical studies. First, we compare the new measures along with some existing measures against a gold standard. Second, we examine the practicality of using the new measures over an independently motivated corpus of ontologies (the BioPortal library) which contains over 300 ontologies. We also examine whether cheap measures can be an approximation of some more computationally expensive measures. In addition, we explore what could possibly go wrong when using a cheap similarity measure.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We aim at similarity measures for general OWL ontologies, and thus a naive
implementation of this approach would be trivialised because a concept has
infinitely many subsumers. To overcome this, we present refinements of the
similarity function in which we do not count all subsumers but consider subsumers
from a set of (possibly complex) concepts of a concept language L. Let C and
D be concepts, let O be an ontology and let L be a concept language. We set:</p>
      <p>S(C, O, L) = {D ∈ L(Õ) | O ⊨ C ⊑ D}</p>
      <p>Com(C, D, O, L) = S(C, O, L) ∩ S(D, O, L)</p>
      <p>Union(C, D, O, L) = S(C, O, L) ∪ S(D, O, L)</p>
      <p>Sim(C, D, O, L) = |Com(C, D, O, L)| / |Union(C, D, O, L)|</p>
      <p>To design a new measure, it remains to specify the set L. For example:</p>
      <p>AtomicSim(C, D) = Sim(C, D, O, L_Atomic(Õ)), where L_Atomic(Õ) = Õ ∩ N_C;</p>
      <p>SubSim(C, D) = Sim(C, D, O, L_Sub(Õ)), where L_Sub(Õ) = Sub(O);</p>
      <p>GrSim(C, D) = Sim(C, D, O, L_G(Õ)), where L_G(Õ) = {E | E ∈ Sub(O) or E = ∃r.F for some r ∈ Õ ∩ N_R and F ∈ Sub(O)},</p>
      <p>where Õ is the signature of O, N_C is the set of concept names, N_R is the set of role names, and Sub(O) is the set of concept expressions in O. The rationale of SubSim(·) is that it provides similarity measurements that are sensitive to the modeller's focus. To capture more possible subsumers, one can use GrSim(·), for which the grammar can be extended easily.</p>
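      <p>To make the set-based construction above concrete, the following sketch evaluates Sim as the ratio of common to all subsumers. The concept names and subsumer sets are invented for illustration; in practice S(C, O, L) would be obtained from a DL reasoner, not listed by hand.</p>

```python
# Illustrative sketch of Sim(C, D, O, L): the ratio of common to all
# subsumers, taken over a fixed concept language L. The subsumer sets
# below are hand-made stand-ins for S(C, O, L); a real implementation
# would ask a DL reasoner for the entailed subsumers.

def sim(subsumers_c, subsumers_d):
    """Jaccard ratio |Com| / |Union| of two subsumer sets."""
    common = subsumers_c & subsumers_d
    union = subsumers_c | subsumers_d
    return len(common) / len(union) if union else 1.0

# Toy atomic subsumers (as in AtomicSim) for two concepts.
s_cat = {"Cat", "Felid", "Mammal", "Animal"}
s_dog = {"Dog", "Canid", "Mammal", "Animal"}

print(sim(s_cat, s_dog))  # 2 common subsumers out of 6 in total
```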
    </sec>
    <sec id="sec-2">
      <title>Approximations of similarity measures</title>
      <p>Some measures might be practically inefficient due to the large number of
candidate subsumers. For this reason, it would be useful to examine whether
a "cheap" measure can be a good approximation of a more expensive one.</p>
      <p>Definition 1. Given two similarity functions Sim(·), Sim′(·), we say that:</p>
      <p>– Sim′(·) preserves the order of Sim(·) if ∀A₁, B₁, A₂, B₂ ∈ Õ: Sim(A₁, B₁) ≤ Sim(A₂, B₂) ⟹ Sim′(A₁, B₁) ≤ Sim′(A₂, B₂).</p>
      <p>– Sim′(·) approximates Sim(·) from above if ∀A, B ∈ Õ: Sim(A, B) ≤ Sim′(A, B).</p>
      <p>– Sim′(·) approximates Sim(·) from below if ∀A, B ∈ Õ: Sim(A, B) ≥ Sim′(A, B).</p>
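      <p>The three properties of Definition 1 can be checked mechanically for any two measures. The sketch below assumes the measures are plain Python callables over concept-name pairs; the similarity values are invented for illustration.</p>

```python
# Sketch of the three checks in Definition 1. Real measures would wrap
# a DL reasoner; here they are lookups into invented score tables.

def preserves_order(sim, sim2, names):
    # sim2 preserves the order of sim: whenever sim ranks pair p2 at
    # least as similar as p1, sim2 must agree.
    pairs = [(a, b) for a in names for b in names]
    return all(
        sim2(*p2) >= sim2(*p1)
        for p1 in pairs for p2 in pairs
        if sim(*p2) >= sim(*p1)
    )

def approximates_from_above(sim, sim2, names):
    # sim2 never underestimates sim on any pair.
    return all(sim2(a, b) >= sim(a, b) for a in names for b in names)

def approximates_from_below(sim, sim2, names):
    # sim2 never overestimates sim on any pair.
    return all(sim(a, b) >= sim2(a, b) for a in names for b in names)

# Hypothetical similarity values over two concept names.
cheap = {("A", "A"): 1.0, ("A", "B"): 0.4, ("B", "A"): 0.4, ("B", "B"): 1.0}
rich = {("A", "A"): 1.0, ("A", "B"): 0.6, ("B", "A"): 0.6, ("B", "B"): 1.0}
sim = lambda a, b: cheap[(a, b)]
sim2 = lambda a, b: rich[(a, b)]

print(preserves_order(sim, sim2, ["A", "B"]))         # True
print(approximates_from_above(sim, sim2, ["A", "B"]))  # True
```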
      <p>Consider AtomicSim(·) and SubSim(·). The first thing to notice is that the
set of candidate subsumers for the first measure is actually a subset of the set
of candidate subsumers for the second measure (Õ ∩ N_C ⊆ Sub(O)). However,
we should also note that the numbers of entailed subsumers in the two cases
need not be proportionally related. Hence, the above examples of similarity
measures are, theoretically, non-approximations of each other.</p>
    </sec>
    <sec id="sec-3">
      <title>Empirical evaluation</title>
      <p>
        We carry out a comparison of the three measures GrSim(·), SubSim(·)
and AtomicSim(·) against human similarity judgements. We also include two
existing similarity measures in this comparison (Rada [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Wu &amp; Palmer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]).
We also study in detail the behaviour of our new family of measures in practice.
GrSim(·) is considered the most expensive and most precise measure in this study.
      </p>
      <p>To study the relation between the different measures in practice, we examine
the following properties: order-preservation, approximation from above/below
and correlation (using Pearson's coefficient).</p>
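      <p>Pearson's coefficient over paired scores can be sketched as follows. The two score vectors are hypothetical examples, not data from the studies reported here.</p>

```python
# Sketch of Pearson's correlation coefficient as used to compare a
# measure's scores with human judgements.

def pearson(xs, ys):
    """Pearson's r for two equal-length score vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

human = [0.1, 0.3, 0.5, 0.9]      # hypothetical expert ratings
measure = [0.2, 0.35, 0.55, 0.8]  # hypothetical measure output

print(round(pearson(human, measure), 3))  # strong positive correlation
```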
      <sec id="sec-3-1">
        <title>Experimental set-up</title>
        <p>
          Part 1: Comparison against a gold-standard. The similarity of 19 SNOMED CT concept pairs was calculated using the three methods along with the Rada [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and Wu &amp; Palmer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] measures. We compare these similarities to human judgements taken from the Pedersen et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] test set.
        </p>
        <p>Part 2: Cheap vs. expensive measures. A snapshot of BioPortal from
November 2012 was used as a corpus. It contains a total of 293 ontologies.
We excluded 86 ontologies which have only atomic subsumptions since for such
ontologies the behaviour of the considered measures is identical, i.e., we
already know that AtomicSim(·) is good and cheap. Due to the large number
of classes and the difficulty of spotting interesting patterns by eye, we calculated
the pairwise similarity for a sample of concepts from the corpus. The size of the
sample is 1,843 concepts, with a 99% confidence level. To ensure that the sample
encompasses concepts with different characteristics, we picked 14 concepts from
each ontology. The selection was not purely random. Instead, we picked 2 random
concepts and, for each random concept, we picked some neighbouring concepts.</p>
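        <p>A minimal sketch of this sampling scheme, under the assumption that each ontology exposes a neighbours function over its class hierarchy; the ring-shaped toy hierarchy and the `nbrs` lookup below are hypothetical stand-ins.</p>

```python
# Hypothetical sketch of the sampling described above: from each
# ontology, pick 2 random seed concepts and pad each seed with nearby
# concepts until 14 concepts are collected. `neighbours` is a stand-in
# for a real lookup of adjacent classes in the ontology's hierarchy.
import random

def sample_concepts(concepts, neighbours, n_seeds=2, total=14, rng=None):
    rng = rng or random.Random(0)  # fixed seed for a repeatable sketch
    seeds = rng.sample(concepts, n_seeds)
    chosen = list(seeds)
    for seed in seeds:
        for c in neighbours(seed):
            if len(chosen) >= total:
                break
            if c not in chosen:
                chosen.append(c)
    return chosen[:total]

# Toy "ontology": 40 concept names arranged in a ring, each with the
# next 12 names as its neighbourhood.
concepts = [f"C{i}" for i in range(40)]
nbrs = lambda c: [f"C{(int(c[1:]) + d) % 40}" for d in range(1, 13)]

print(len(sample_concepts(concepts, nbrs)))  # 14
```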
      </sec>
      <sec id="sec-3-2">
        <title>Results</title>
        <p>How good is the expensive measure? Not surprisingly, GrSim and SubSim
had the highest correlation with the experts' similarity (Pearson's correlation
coefficient r = 0.87, p &lt; 0.001). Next comes AtomicSim with r = 0.86.
Finally come Wu &amp; Palmer and then Rada, with r = 0.81 and r = 0.64 respectively.
Figure 1 shows the similarity curves for the 6 measures used in this comparison.
The new measures, along with the Wu &amp; Palmer measure, preserve the order of human
similarity more often than the Rada measure. They mostly underestimated human
similarity, whereas the Rada measure mostly overestimated it.</p>
        <p>Cost of the expensive measure. The average time per ontology taken to
calculate grammar-based similarities was 2.3 minutes (standard deviation
10.6 minutes, median 0.9 seconds) and the maximum time was 93 minutes
for the Neglected Tropical Disease Ontology, a SRIQ ontology with
1237 logical axioms, 252 concepts and 99 object properties. For this ontology,
the cost of AtomicSim(·) was only 15.545 seconds, and that of SubSim(·) 15.549
seconds. 9 out of 196 ontologies took over 1 hour to process. One thing to note
about these ontologies is their high number of logical axioms and object properties.
Clearly, GrSim(·) is far more costly than the other two measures. This is why we
want to know how good or bad a cheaper measure can be.</p>
        <p>How good is a cheap measure? Although we excluded all ontologies
with only atomic subsumptions from the study, in 12% of the ontologies the three
measures were perfectly correlated (r = 1, p &lt; 0.001). These perfect correlations
indicate that, in some cases, the benefit of using an expensive measure is
negligible.</p>
        <p>AtomicSim(·) and SubSim(·) did not preserve the order of GrSim(·) in 80%
and 73% of the ontologies respectively. Also, they were approximations neither
from above nor from below in 72% and 64% of the ontologies respectively.</p>
        <p>Consider the African Traditional Medicine ontology in Figure 2: SubSim(·)
is 100% order-preserving while AtomicSim(·) is only 99% order-preserving.</p>
        <p>Note also the Platynereis Stage Ontology in Figure 3, in which both AtomicSim(·)
and SubSim(·) are 75% order-preserving. However, AtomicSim(·) was 100%
approximating from above while SubSim(·) was 85% approximating from below.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pakhomov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Patwardhan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Chute</surname>
          </string-name>
          .
          <article-title>Measures of semantic similarity and relatedness in the biomedical domain</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <volume>40</volume>
          (
          <issue>3</issue>
          ):
          <fpage>288</fpage>
          –
          <lpage>299</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R.</given-names>
            <surname>Rada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bicknell</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Blettner</surname>
          </string-name>
          .
          <article-title>Development and application of a metric on semantic nets</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          , volume
          <volume>19</volume>
          , pages
          <fpage>17</fpage>
          –
          <lpage>30</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Tversky</surname>
          </string-name>
          .
          <article-title>Features of similarity</article-title>
          .
          <source>Psychological Review</source>
          ,
          <volume>84</volume>
          (
          <issue>4</issue>
          ),
          <year>July 1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          .
          <article-title>Verb semantics and lexical selection</article-title>
          .
          <source>In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 1994)</source>
          , pages
          <fpage>133</fpage>
          –
          <lpage>138</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>