<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CompositeMatch: Detecting N-Ary Matches in Ontology Alignment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kelly Moran</string-name>
          <email>kmoran@ll.mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kajal Claypool</string-name>
          <email>claypool@ll.mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Hescott</string-name>
          <email>hescott@cs.tufts.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MIT Lincoln Laboratory</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tufts University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The field of ontology alignment still contains numerous unresolved problems, one of which is the accurate identification of composite matches. In this work, we present a context-sensitive ontology alignment algorithm, CompositeMatch, that identifies these matches, along with the typical one-to-one matches, by looking more broadly at the information that a concept's relationships confer. We show that our algorithm can identify composite matches with greater confidence than current tools.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>CompositeMatch is a three-phase algorithm that operates on two input ontologies
and produces an alignment file containing the matches between them. The first phase
performs a linguistic match between the ontologies’ concepts, assigning each pair of
concepts a normalized similarity score between 0 and 1.</p>
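      <p>A minimal sketch of a first-phase scorer follows. The text does not specify which linguistic measure CompositeMatch uses, so the character-based ratio from Python's standard difflib module stands in for it here. The names linguistic_score and linguistic_match, and the lower-casing step, are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from difflib import SequenceMatcher


def linguistic_score(label_a, label_b):
    """Return a similarity score in [0, 1] for two concept labels."""
    # Lower-casing is an assumed normalization step, not taken from the paper.
    a, b = label_a.lower(), label_b.lower()
    # SequenceMatcher.ratio() is a stand-in measure; it always lies in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def linguistic_match(concepts_o, concepts_o_prime):
    """Score every pair of concepts drawn from the two ontologies."""
    return [
        (c, d, linguistic_score(c, d))
        for c in concepts_o
        for d in concepts_o_prime
    ]


# Illustrative labels only.
pairs = linguistic_match(
    ["MastersThesis", "PhDThesis"], ["Masters", "PhD", "Thesis"]
)
for source, target, score in pairs:
    print(f"{source} ~ {target}: {score:.2f}")
```
]]></preformat>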
      <p>In the second phase, the most uncertain pairs, collectively referred to as the
grey zone, are judged on contextual criteria to determine whether they should
be accepted as viable matches. The grey zone consists of all conflicting matches
(matches that share a concept, leaving it unclear which one is the true
match for that concept) plus all matches whose similarity score lies between the upper and
lower thresholds set prior to execution. The second phase applies two rules that
act as a filter, increasing the scores of contextually similar matches.</p>
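      <p>The grey-zone definition can be sketched as a simple selection over scored matches. This is an illustration under stated assumptions rather than the authors' implementation: the function name grey_zone, the tuple layout, the inclusive threshold bounds, and the sample scores and thresholds are all hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def grey_zone(matches, lower, upper):
    """Select matches that are conflicting or scored between the thresholds.

    matches is a list of (source_concept, target_concept, score) tuples.
    """
    # Count each concept per ontology, so that equal names appearing in
    # both ontologies are not confused with one another.
    source_counts = Counter(src for src, _, _ in matches)
    target_counts = Counter(tgt for _, tgt, _ in matches)

    selected = []
    for src, tgt, score in matches:
        # A match conflicts when it shares a concept with another match.
        conflicting = source_counts[src] > 1 or target_counts[tgt] > 1
        # Inclusive bounds are an assumption; the text says only "between".
        uncertain = lower <= score <= upper
        if conflicting or uncertain:
            selected.append((src, tgt, score))
    return selected


# Illustrative scores and thresholds, not values from the paper.
scored = [
    ("MastersThesis", "Masters", 0.70),
    ("MastersThesis", "Thesis", 0.63),
    ("PhDThesis", "PhD", 0.50),
    ("PhDThesis", "Thesis", 0.80),
    ("Person", "Person", 1.00),
    ("Article", "Paper", 0.40),
]
for match in grey_zone(scored, lower=0.3, upper=0.9):
    print(match)
```
]]></preformat>
      <p>In this sketch, a non-conflicting match that scores above the upper threshold stays out of the grey zone, while every match that shares a concept with another match is included regardless of its score.</p>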
      <p>The third and final phase is a post-processing phase that searches the matches
for possible composite matches, again using contextual criteria as
evidence, before outputting the final set of matches to the user.</p>
      <p>Example 1. Consider the case shown in Figure 1. The first phase finds some
similarity between the concept MastersThesis in ontology O and the concepts Masters
and Thesis in ontology O’, as well as between PhDThesis in O and PhD and Thesis in O’.
Phase 2 compares the parents of these pairs but does not find sufficient contextual
similarity to increase the score of any of them. In Phase 3, the algorithm
finds that the conflicting matches identified earlier form composite concepts:
the concepts MastersThesis and PhDThesis from O form a composite concept that
matches the composite concept formed by Thesis, Masters, and PhD in O’.</p>
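      <p>One way to picture how conflicting matches combine into an n-ary match is to treat the pairs as edges of a bipartite graph and collect its connected components. The sketch below does only that grouping. It omits the contextual criteria used in the third phase, which the text does not detail, and the function name composite_matches is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def composite_matches(pairs):
    """Group (source, target) pairs that share concepts into n-ary matches."""
    # Build an undirected bipartite graph. Nodes are tagged with their
    # ontology so equal names in O and O' remain distinct.
    neighbours = defaultdict(set)
    for src, tgt in pairs:
        neighbours[("O", src)].add(("O'", tgt))
        neighbours[("O'", tgt)].add(("O", src))

    seen = set()
    composites = []
    for start in list(neighbours):
        if start in seen:
            continue
        # A depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component

        sources = sorted(name for side, name in component if side == "O")
        targets = sorted(name for side, name in component if side == "O'")
        # A component is composite only if one side has several concepts.
        if len(sources) > 1 or len(targets) > 1:
            composites.append((sources, targets))
    return composites


# The conflicting pairs from Example 1, plus one ordinary one-to-one match.
example = [
    ("MastersThesis", "Masters"),
    ("MastersThesis", "Thesis"),
    ("PhDThesis", "PhD"),
    ("PhDThesis", "Thesis"),
    ("Person", "Person"),
]
print(composite_matches(example))
```
]]></preformat>
      <p>For the pairs of Example 1, this grouping yields a single composite match between MastersThesis and PhDThesis on one side and Masters, PhD, and Thesis on the other. The one-to-one pair is left out because neither side of its component holds more than one concept.</p>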
    </sec>
    <sec id="sec-2">
      <title>Preliminary Results</title>
      <p>We evaluated the performance of CompositeMatch on two tests, the first being
the benchmark test from the Ontology Alignment Evaluation Initiative (OAEI)
2008. Because the OAEI benchmark does not account for composite matches, we
created a second test: a set of six ontologies, each a modified version of the OAEI
benchmark base case, into which composite matches were injected.</p>
      <p>We compared the results of CompositeMatch on the OAEI 2008 benchmark to
those of a high-performing OAEI 2008 entrant, RiMOM. Averaged over the tests,
CompositeMatch achieves a precision of 0.926 and a recall of 0.557, whereas RiMOM
achieves an overall precision of 0.939 and a recall of 0.802. We also considered the subset of
tests that excludes random and foreign-language labels, as we believe this subset
is a better indicator of how CompositeMatch performs on its intended data sets.
On this subset, CompositeMatch achieves a markedly higher precision of 0.996 and recall
of 0.896, while RiMOM improves slightly to 0.965 and 0.967, respectively.</p>
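      <p>For reference, precision and recall for an alignment can be computed as in the sketch below, which uses the standard definitions over sets of matches. The function name precision_recall and the sample alignments are hypothetical and do not reproduce any figure reported here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision_recall(found, reference):
    """Compute precision and recall of found matches against a reference."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    # Precision: fraction of the found matches that are correct.
    precision = correct / len(found) if found else 0.0
    # Recall: fraction of the reference matches that were found.
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Made-up alignments for illustration only.
found = {("Book", "Book"), ("Author", "Writer"), ("Page", "Chapter")}
reference = {
    ("Book", "Book"),
    ("Author", "Writer"),
    ("Title", "Name"),
    ("Year", "Date"),
}
precision, recall = precision_recall(found, reference)
print(f"precision={precision:.3f} recall={recall:.3f}")
```
]]></preformat>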
      <p>While further evaluation is needed, our preliminary results on the second test indicate that
CompositeMatch correctly identifies each injected composite match, achieving both a precision
and a recall of 1.0, while RiMOM identifies none.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>