<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Partitioning-based Ontology Matching Approaches: A Comparative Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alsayed Algergawy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Friederike Klan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Birgitta König-Ries</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Engineering, Tanta University</institution>
          ,
          <country country="EG">Egypt</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Computer Science, Friedrich Schiller University of Jena</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Generic Framework. Ontology matching is the process of taking two or more ontologies and identifying semantically corresponding entities across them. As both the number of developed ontologies and the number of entities in each ontology increase, traditional approaches to ontology matching fail or do not scale. Therefore, there is a growing need for new matching algorithms. A common approach to the large-scale matching problem is the partitioning-based technique [5]. To make these techniques comparable, we propose a generic framework containing the following phases (shown in Fig. 1):</p>
        <p>- Prematch. This phase prepares the input ontologies for matching. It starts by parsing the input ontologies and representing them as graphs, called ontology graphs. Each input ontology graph is then partitioned into a set of sub-ontologies such that entities belonging to one partition are similar (have some common features) while entities from different partitions are dissimilar. The partitioning process may range from simple ad hoc rules [2] to clustering algorithms [1,4]. The task is then to determine which partitions of the two sets are sufficiently similar and thus worth matching in more detail. The goal is to reduce the matching overhead by not searching for correspondences between unrelated partitions.</p>
        <p>- Match. Once similar partitions (clusters) of the two ontologies have been identified, the next step is to fully match them to obtain the correspondences between their elements. Each pair of similar partitions represents an individual match task that is solved independently.</p>
        <p>- Postmatch. Local match results are merged (combined) to generate the final match result. The Postmatch phase is also concerned with matching cardinality and mapping representation.</p>
        <p>Fig. 1: Partitioning-based matching steps.</p>
        <p>Matching Systems: A Comparison.
We present partitioning-based approaches in terms of the algorithmic steps identified above, indicating which part of the solution is covered by which prototype, thereby supporting a comparison of these approaches. We notice that all these approaches use the graph</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        data structure as their internal data representation. However, they use
different algorithms to partition the ontology graph. Falcon-AO [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the
extension of COMA++ [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] employ an agglomerative clustering algorithm that partitions
each input ontology independently. To partition the ontologies dependently on each other,
TaxoMap [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] uses a co-clustering technique. Some matching approaches, such as
COMA++ [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ] and Falcon-AO [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], first partition the ontology graphs and then determine similar
partitions, while others, such as TaxoMap [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], determine similar partitions during the partitioning process.
We also observe that the matching approaches use different methods to determine
similar partitions, ranging from exploiting only the partitions’ roots, e.g.
COMA++ [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], to exploiting the whole partition information, e.g. Falcon-AO [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Other approaches strike a compromise between the two extremes, e.g. the extension
of COMA++ [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] exploits entity names to find similar partitions.
      </p>
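      <p>As an illustration of the name-based middle ground described above, the following is a minimal sketch that selects similar partition pairs from the overlap of entity-name tokens. It is not the procedure of any of the cited systems: the representation of a partition as a set of entity names, the Jaccard measure, the function names and the threshold of 0.3 are all assumptions made for this example.</p>
      <preformat preformat-type="code">
```python
import re
from itertools import product


def name_tokens(partition):
    """Collect the lowercase word tokens of all entity names in a partition."""
    tokens = set()
    for name in partition:
        # Insert a space at camelCase boundaries, then split on non-alphanumerics.
        spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
        tokens.update(t.lower() for t in re.findall(r"[A-Za-z0-9]+", spaced))
    return tokens


def partition_similarity(p, q):
    """Jaccard overlap of the name tokens of two partitions (0.0 to 1.0)."""
    a, b = name_tokens(p), name_tokens(q)
    union = a | b
    return len(a.intersection(b)) / len(union) if union else 0.0


def similar_partition_pairs(source_parts, target_parts, threshold=0.3):
    """Return index pairs (i, j) of partitions considered worth matching.

    The threshold is a hypothetical tuning parameter, not a published value.
    """
    pairs = []
    for (i, p), (j, q) in product(enumerate(source_parts), enumerate(target_parts)):
        if partition_similarity(p, q) >= threshold:
            pairs.append((i, j))
    return pairs
```
      </preformat>
      <p>Only the partition pairs returned by such a selection step would be passed on as match tasks; all other pairs are skipped, which is where the reduction in matching overhead comes from.</p>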
      <p>With respect to the matching phase, each matching system uses its own
matching strategy, which exploits linguistic and structural features of
ontologies. Some of these systems reuse existing matching strategies, such as
TaxoMap (using the Falcon-AO match strategy) and the Unbalanced OM
approach (using the similarity flooding algorithm). In other words, these
matching systems do not implement matching strategies specific to
partitioning-based matching; instead, they rely on off-the-shelf matching strategies.</p>
      <p>It is also worth noting that some matching approaches interleave the
last two phases, i.e. they do not compute local match results for each
match task but construct the final match result directly. Other matching
approaches, like COMA++, first treat each match task as completely
independent, producing its own local results, and then merge
these local results to obtain the final match result.</p>
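      <p>The second style, in which independent local results are merged afterwards, can be sketched as follows. This is an illustrative assumption rather than the merge procedure of COMA++ or any other cited system: correspondences are modelled as (source entity, target entity, score) tuples, duplicates keep their best score, and a greedy pass optionally enforces a 1:1 matching cardinality.</p>
      <preformat preformat-type="code">
```python
def merge_local_results(local_results, one_to_one=True):
    """Combine per-task correspondences into one final match result.

    local_results: list of lists of (source_entity, target_entity, score).
    """
    # Union of all local results, keeping the best score per entity pair.
    best = {}
    for task_result in local_results:
        for src, tgt, score in task_result:
            key = (src, tgt)
            if score > best.get(key, float("-inf")):
                best[key] = score

    # Highest score first; ties are broken by entity names for determinism.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    if not one_to_one:
        return [(s, t, sc) for (s, t), sc in ranked]

    # Greedy 1:1 selection: each entity may take part in one correspondence.
    used_src, used_tgt, final = set(), set(), []
    for (s, t), sc in ranked:
        if s not in used_src and t not in used_tgt:
            final.append((s, t, sc))
            used_src.add(s)
            used_tgt.add(t)
    return final
```
      </preformat>
      <p>A greedy selection is only one way to enforce a cardinality constraint; it is used here because it is short, not because it yields an optimal assignment.</p>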
      <p>Future Directions. In this paper we introduced a first conceptual comparison
of partitioning-based matching approaches. This will be followed up by an
experimental evaluation to determine which combination of approaches works best
in which circumstances and to identify necessary areas of improvement.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Algergawy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Massmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>A clustering-based approach for large-scale ontology matching</article-title>
          .
          In
          <source>ADBIS</source>
          , pages
          <fpage>415</fpage>
          -
          <lpage>428</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Do</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Matching large schemas: Approaches and evaluation</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>32</volume>
          (
          <issue>6</issue>
          ):
          <fpage>857</fpage>
          -
          <lpage>885</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>F.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Safar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Reynaud</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Zargayouna</surname>
          </string-name>
          .
          <article-title>Alignment-based partitioning of large-scale ontologies</article-title>
          .
          In
          <source>SCI</source>
          , volume
          <volume>292</volume>
          , pages
          <fpage>251</fpage>
          -
          <lpage>269</lpage>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Cheng</surname>
          </string-name>
          .
          <article-title>Matching large ontologies: A divide-and-conquer approach</article-title>
          .
          <source>DKE</source>
          ,
          <volume>67</volume>
          :
          <fpage>140</fpage>
          -
          <lpage>160</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Schema Matching and Mapping, chapter Towards Large-scale Schema and Ontology Matching</article-title>
          , pages
          <fpage>3</fpage>
          -
          <lpage>27</lpage>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>