<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Partitioning for Ontology Alignment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sunny Pereira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valerie Cross</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ernesto Jiménez-Ruiz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Miami University</institution>
          ,
          <addr-line>Oxford, OH</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oslo</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Aligning two very large ontologies is time-consuming and memory-intensive. A general approach to address these challenges is to partition each ontology into cohesive blocks (i.e., partitions). Ontology partitioning brings new challenges: how best to partition each ontology into blocks, and whether the two ontologies should be partitioned independently of each other. In this paper, we present preliminary work to determine the suitability of partitioning strategies for improving the performance of ontology alignment (OA) systems, especially those unable to cope with the largest datasets. The PBM (Partition Block Matching) [2,3], PAP (partition, anchor, partition) and APP (anchor, partition, partition) [1] partitioning methods have been implemented independently of the alignment system. In the preliminary experiments included in this paper we report results for the systems LogMap [4] and FCA-Map [7]. In [1], [2], and [3] a path-based semantic similarity measure [6] is used to determine link strength between concepts within an ontology when creating blocks. In these experiments, the path-based Wu-Palmer [6] and the information-content-based Lin [5] semantic similarity measures are considered. The ontology structure is used to determine the information content (IC) of a concept. Link strengths are calculated only between concepts whose depths within the ontology differ by one. The authors of the PBM method use ISUB to find the anchors between concepts. In our experiments, anchors are found using an exact label match between two concepts in the two different ontologies. Each identified block pair represents a matching (sub)task; however, since blocks are characterized only by a set of concepts, they are first converted to (logical) ontology modules and then given to the ontology alignment system as input.
The initial experiments were performed on task 1 of the OAEI largebio track (http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/), involving small fragments of FMA and NCI, using all three methods. The results using Wu-Palmer are shown in Table 1 and those for Lin in Table 2. The parameter values used are 0.05 for PBM and 0.75 for APP. A maximum block size of 500 and a depth difference of one for the semantic similarity calculation are used for all three methods. Blocks with only one concept are considered isolated blocks. Coverage is the fraction of the entities occurring in the OAEI reference alignments that are present in the identified block pairs. Precision and recall are calculated over the combined alignment results for all the matching tasks (i.e., the pairs of modules extracted from the block pairs). FMA blocks (resp. NCI blocks) is the total number of blocks produced after partitioning the FMA ontology (resp. NCI ontology). The results from task 1 suggest that the PBM method provides much higher recall values than the other two methods. The Wu-Palmer measure performed slightly better than Lin. The next experiments examined how PBM with Wu-Palmer performed on the OAEI largebio tasks that use the whole ontologies, that is, tasks 2, 4 and 6. For these tasks the maximum block size is 3000. Table 3 presents these results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Methods</title>
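      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Matching Tasks</th>
              <th>Coverage</th>
              <th>Precision</th>
              <th>Recall</th>
              <th>Partitioning Time (s)</th>
              <th>Matching Time (s)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>83</td>
              <td>0.801</td>
              <td>0.833</td>
              <td>0.728</td>
              <td>52.454</td>
              <td>81.689</td>
            </tr>
            <tr>
              <td>37</td>
              <td>0.348</td>
              <td>0.861</td>
              <td>0.321</td>
              <td>56.508</td>
              <td>39.423</td>
            </tr>
            <tr>
              <td>46</td>
              <td>0.483</td>
              <td>0.862</td>
              <td>0.439</td>
              <td>56.704</td>
              <td>49.938</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>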
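      <p>The two similarity measures considered for link strength, Wu-Palmer [6] and Lin [5], can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy taxonomy, the function names, the convention that the root has depth 1, and the descendant-count formula used for structure-based information content. The paper does not state which IC formula was used, so this should not be read as the implementation behind the reported results.</p>
      <code language="python"><![CDATA[
```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from FMA or NCI.
PARENT = {"root": None, "organ": "root", "tissue": "root",
          "heart": "organ", "lung": "organ"}

def ancestors(c):
    """Return the path from concept c up to the root, c included."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

def depth(c):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(ancestors(c))

def lcs(a, b):
    """Lowest common subsumer: first ancestor of a that also subsumes b."""
    anc_b = set(ancestors(b))
    return next(x for x in ancestors(a) if x in anc_b)

def wu_palmer(a, b):
    """Wu-Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

def intrinsic_ic(c):
    """Assumed structure-based IC: fewer descendants, higher IC."""
    n_desc = sum(1 for x in PARENT if x != c and c in ancestors(x))
    return 1 - math.log(n_desc + 1) / math.log(len(PARENT))

def lin(a, b):
    """Lin: 2 * IC(lcs) / (IC(a) + IC(b)); 0.0 if both ICs are zero."""
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 2 * intrinsic_ic(lcs(a, b)) / denom if denom else 0.0

# Link strengths only for concept pairs whose depths differ by one;
# in this toy tree those are exactly the child-parent edges.
link_strengths = {(c, p): wu_palmer(c, p)
                  for c, p in PARENT.items() if p is not None}
```
]]></code>
      <p>In this sketch, link_strengths[("heart", "organ")] is 2 * 2 / (3 + 2) = 0.8, while the edge from "organ" to the root gets 2 * 1 / (2 + 1), so deeper edges are treated as stronger links.</p>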
    </sec>
    <sec id="sec-2">
      <title>Discussion and future work</title>
      <p>In this paper we have presented a preliminary evaluation of state-of-the-art partitioning
algorithms for ontology alignment. The results obtained are not as good as expected because,
after partitioning and identifying the (sub)matching tasks, the coverage of the
entities in the reference alignments is rather low. For example, in the FMA-SNOMED
case only 59% of the entities appearing in the reference alignment are covered by the
modules in the identified matching tasks. The remaining 41% of the entities were lost in
either isolated blocks or blocks for which a suitable pair could not be found.</p>
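      <p>The coverage figure discussed above, together with precision and recall over the combined alignments of all matching tasks, can be computed along the following lines. This is a sketch under stated assumptions: the function names, the representation of block pairs as pairs of entity sets, and the choice to count source and target entities separately are all illustrative and are not taken from the evaluation code used for the experiments.</p>
      <code language="python"><![CDATA[
```python
def coverage(block_pairs, reference):
    """Fraction of reference-alignment entities found in some block pair.

    block_pairs: list of (source_entities, target_entities) set pairs.
    reference:   set of (source_entity, target_entity) mappings.
    """
    src_in_blocks = set().union(*(s for s, _ in block_pairs))
    tgt_in_blocks = set().union(*(t for _, t in block_pairs))
    ref_src = {s for s, _ in reference}
    ref_tgt = {t for _, t in reference}
    covered = len(ref_src & src_in_blocks) + len(ref_tgt & tgt_in_blocks)
    total = len(ref_src) + len(ref_tgt)
    return covered / total if total else 0.0

def precision_recall(task_alignments, reference):
    """Precision and recall over the union of all per-task alignments."""
    combined = set().union(*task_alignments)
    correct = combined & reference
    precision = len(correct) / len(combined) if combined else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical data: three reference mappings, two matching tasks.
reference = {("f1", "n1"), ("f2", "n2"), ("f3", "n3")}
block_pairs = [({"f1", "f2"}, {"n1"}), ({"f9"}, {"n3"})]
task_alignments = [{("f1", "n1")}, {("f9", "n3")}]

cov = coverage(block_pairs, reference)
prec, rec = precision_recall(task_alignments, reference)
```
]]></code>
      <p>With this toy input, four of the six reference entities fall inside a block pair, and the mapping ("f2", "n2") can never be recovered because "n2" is in no block. This is the sense in which low coverage bounds the recall that any alignment system can reach after partitioning.</p>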
      <p>As expected, given the coverage of entities in the reference alignment, the results
obtained by LogMap are much lower than those reported for LogMap in the last
OAEI campaign. In addition, the partitioning step represents a considerable overhead
with respect to LogMap’s computation times. Nevertheless, FCA-Map was successfully
run in task 2 of the largebio track using partitioning,<sup>2</sup> while the system could not cope
with the task when given the whole FMA and NCI ontologies.</p>
      <p>In the near future we aim to investigate new algorithms that provide a suitable
partitioning for ontology alignment, one that minimizes the loss of coverage in the identified (sub)matching
tasks in terms of entities of the reference alignments. We also intend to
perform an extensive evaluation of the novel partitioning algorithms with all OAEI
participating systems, especially those failing to cope with the largest tasks.</p>
      <p><sup>2</sup> Not tested in tasks 4 and 6 due to limited experimental time.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hamdi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.:
          <article-title>Alignment-based partitioning of large-scale ontologies</article-title>
          .
          <source>SCI</source>
          , vol.
          <volume>292</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Block matching for ontologies</article-title>
          .
          <source>In: Int'l Sem. Web Conf</source>
          . (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , et al.:
          <article-title>Matching large ontologies: A divide-and-conquer approach</article-title>
          .
          <source>DKE</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca-Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>LogMap: Logic-based and scalable ontology matching</article-title>
          .
          <source>In: ISWC</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>An information-theoretic definition of similarity</article-title>
          .
          <source>In: ICML</source>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Verbs semantics and lexical selection</article-title>
          .
          <source>In: ACL</source>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>FCA-Map results for OAEI 2016</article-title>
          .
          <source>In: Ontology Matching</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>