<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Partitioning and Matching Tuning of Large Biomedical Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amir Laadhar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faiza Ghozzi</string-name>
          <email>faiza.ghozzi@isims.usf.tn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryutaro Ichise</string-name>
          <email>ichise@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Imen Megdiche</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franck Ravat</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Teste</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Toulouse University, IRIT (CNRS/UMR 5505)</institution>
          ,
          <addr-line>Toulouse</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Sfax, MIRACL</institution>
          ,
          <addr-line>Sfax</addr-line>
          ,
          <country country="TN">Tunisia</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2.2 Ontologies Partitioning</title>
      <p>
We employ hierarchical agglomerative clustering to divide each input ontology
into a set of partitions. The clustering relies on Equation 1, which computes
the structural similarity between the entities of an input ontology. This
equation is inspired by the Wu and Palmer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] similarity measure. Partitioning an ontology yields a dendrogram, which we cut
automatically to obtain a set of partitions: we examine the output of every possible
cut until we find the first one that does not produce any isolated partition, that is,
a partition containing only one entity. We then identify the similar
partition-pairs through the set of exact matchings between the input ontologies.
      </p>
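      <p>As an illustration of the partitioning procedure described above, the following Python sketch clusters a small, hypothetical taxonomy and selects a cut. It relies on several assumptions that the text does not state: the classic Wu and Palmer measure stands in for the structural similarity, distances are counted in edges, clusters are merged with average linkage, and cuts are examined from the finest level upward. All names (PARENT, ENTITIES, strc_sim, average_link, agglomerate, first_cut_without_isolated) are illustrative only.</p>
      <preformat>
```python
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent); the root maps to None.
PARENT = {
    "root": None,
    "organ": "root", "heart": "organ", "lung": "organ",
    "tissue": "root", "muscle": "tissue", "bone": "tissue",
}
ENTITIES = [e for e in PARENT if PARENT[e] is not None]


def path_to_root(entity):
    """Return [entity, parent, ..., root]."""
    path = [entity]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def strc_sim(a, b):
    """Classic Wu and Palmer similarity, with distances counted in edges."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    # First node on a's path that also lies on b's path: the lowest common ancestor.
    lca = next(node for node in path_a if node in path_b)
    dist_a, dist_b = path_a.index(lca), path_b.index(lca)
    depth_lca = len(path_to_root(lca)) - 1  # edges between the root and the lca
    denom = dist_a + dist_b + 2 * depth_lca
    return 2 * depth_lca / denom if denom else 1.0


def average_link(a, b, sim):
    """Mean pairwise similarity between two clusters (average linkage)."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))


def agglomerate(entities, sim):
    """Merge the two most similar clusters until a single one remains.

    Returns the partitioning at every level of the dendrogram, finest first.
    """
    clusters = [frozenset([e]) for e in entities]
    levels = [list(clusters)]
    while len(clusters) > 1:
        # Ties are broken by enumeration order (an arbitrary choice).
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_link(pair[0], pair[1], sim))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels.append(list(clusters))
    return levels


def first_cut_without_isolated(levels):
    """Return the first (finest) cut in which no partition is a singleton."""
    for partitions in levels:
        if all(len(p) > 1 for p in partitions):
            return partitions
    return levels[-1]


levels = agglomerate(ENTITIES, strc_sim)
cut = first_cut_without_isolated(levels)
print([sorted(p) for p in cut])
```
      </preformat>
      <p>On this toy input, the selected cut separates the organ subtree from the tissue subtree. The quadratic pair search is kept for readability; an implementation for large ontologies would need a more efficient clustering routine.</p>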
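      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>StrcSim(e_{i,m}, e_{i,n}) = \frac{2 \cdot Dist(r_i, lca)}{Dist(e_{i,m}, lca) + Dist(e_{i,n}, lca) + Dist(r_i, lca)}</tex-math>
        </disp-formula>
      </p>
      <p>[Equation 2: definition of the threshold Th from the similarity scores simScore_j]</p>
    </sec>
    <sec id="sec-2">
      <title>3 Experiments</title>
      <p>In Table 1, we compare our proposed partitioning approach to the currently
available partitioning strategies using two OAEI 2017 biomedical data sets: the
Anatomy task and the LargeBio small segments tasks.</p>
      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Anatomy track partitioning results</p>
        </caption>
        <table>
          <thead>
            <tr><th>Approach</th><th>Precision</th><th>F-Measure</th><th>Recall</th><th>Number of partitions</th></tr>
          </thead>
          <tbody>
            <tr><td>Proposed approach</td><td>0.945</td><td>0.883</td><td>0.829</td><td>57/57</td></tr>
            <tr><td>SeeCOnt [<xref ref-type="bibr" rid="ref3">3</xref>]</td><td>0.951</td><td>0.863</td><td>0.789</td><td>ND</td></tr>
            <tr><td>Falcon [<xref ref-type="bibr" rid="ref2">2</xref>]</td><td>0.964</td><td>0.730</td><td>0.591</td><td>139/119</td></tr>
            <tr><td>Alsayed et al. [<xref ref-type="bibr" rid="ref1">1</xref>]</td><td>0.975</td><td>0.753</td><td>0.613</td><td>84/80</td></tr>
          </tbody>
        </table>
      </table-wrap>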
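      <p>The relation between the columns of Table 1 can be checked with a short Python sketch. It assumes that the F-Measure column is the usual balanced F-measure (the harmonic mean of precision and recall), which the text does not state explicitly; the function name f_measure is illustrative.</p>
      <preformat>
```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the proposed approach on the Anatomy track (Table 1).
print(round(f_measure(0.945, 0.829), 3))  # 0.883
```
      </preformat>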
      <p>We employ UBERON as an external biomedical knowledge source to derive
synthetic reference alignments. We use the ISUB similarity measure to compute the
similarity scores of the derived mappings. Table 2 reports the
accuracy of the partitioning approach with the derived thresholds.</p>
    </sec>
    <sec id="sec-3">
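      <table-wrap>
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Task</th><th>Precision</th><th>F-Measure</th><th>Recall</th><th>Derived Threshold</th></tr>
          </thead>
          <tbody>
            <tr><td>Anatomy</td><td>0.945</td><td>0.883</td><td>0.829</td><td>0.91</td></tr>
            <tr><td>FMA-NCI</td><td>0.957</td><td>0.870</td><td>0.789</td><td>0.69</td></tr>
            <tr><td>FMA-SNOMED</td><td>0.860</td><td>0.674</td><td>0.554</td><td>0.75</td></tr>
            <tr><td>SNOMED-NCI</td><td>0.911</td><td>0.697</td><td>0.564</td><td>0.85</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusion and Future Work</title>
      <p>As future work, we intend to automate the whole matching tuning process while
focusing on the different types of heterogeneity applied over the partition-pairs.</p>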
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Algergawy</surname>
            ,
            <given-names>Alsayed</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Sabine</given-names>
            <surname>Massmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Erhard</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>"A clustering-based approach for large-scale ontology matching."</article-title>
          <source>In East European Conference on Advances in Databases and Information Systems</source>
          , Springer, Berlin, Heidelberg (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Wei</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Yuzhong</given-names>
            <surname>Qu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gong</given-names>
            <surname>Cheng</surname>
          </string-name>
          .
          <article-title>"Matching large ontologies: A divide-and-conquer approach."</article-title>
          <source>Data &amp; Knowledge Engineering</source>
          <volume>67</volume>
          .1 (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Algergawy</surname>
            ,
            <given-names>Alsayed</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Samira</given-names>
            <surname>Babalou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mohammad J.</given-names>
            <surname>Kargar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. Hashem</given-names>
            <surname>Davarpanah</surname>
          </string-name>
          .
          <article-title>"SeeCOnt: A new seeding-based clustering approach for ontology matching."</article-title>
          <source>In East European Conference on Advances in Databases and Information Systems</source>
          , Springer (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Zhibiao</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Martha</given-names>
            <surname>Palmer</surname>
          </string-name>
          .
          <article-title>"Verb semantics and lexical selection."</article-title>
          <source>In Proceedings of the 32nd annual meeting on Association for Computational Linguistics</source>
          (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>