
Partitioning and Matching Tuning of Large Biomedical Ontologies

Amir Laadhar

Faiza Ghozzi

faiza.ghozzi@isims.usf.tn 2

Ryutaro Ichise

ichise@nii.ac.jp 0

Imen Megdiche

Franck Ravat

Olivier Teste

0 National Institute of Informatics, Tokyo, Japan
1 Toulouse University, IRIT (CNRS/UMR 5505), Toulouse, France
2 University of Sfax, MIRACL, Sfax, Tunisia

2.2 Ontologies Partitioning

We employ hierarchical agglomerative clustering to divide each ontology into a set of partitions. The clustering relies on Equation 1 to compute the structural similarity between the entities of each input ontology; this equation is inspired by the Wu and Palmer similarity measure [4]. Partitioning each ontology yields a dendrogram, which we cut automatically to obtain a set of partitions. We examine the possible cuts in turn and keep the first one that does not produce any isolated partition, i.e., a partition containing only one entity. We then identify the similar partition-pairs through the set of exact matchings between the input ontologies.
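The partitioning steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the toy hierarchy, the average-linkage criterion, the fine-to-coarse order in which cuts are examined, and all function names are assumptions made for the example. The similarity function uses the standard Wu and Palmer normalization, which the structural similarity is said to be inspired by.

```python
from itertools import combinations

# Toy is-a hierarchy (child -> parent), invented for illustration only.
PARENT = {
    "root": None,
    "organ": "root", "heart": "organ", "lung": "organ",
    "tissue": "root", "muscle": "tissue", "bone": "tissue",
}

def path_to_root(entity):
    """Return the entities from `entity` up to the root, inclusive."""
    path = []
    while entity is not None:
        path.append(entity)
        entity = PARENT[entity]
    return path

def struct_sim(a, b):
    """Wu-Palmer-style score based on the lowest common ancestor (lca)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    lca = next(e for e in path_a if e in path_b)
    root_to_lca = len(path_to_root(lca)) - 1  # edges from the root to the lca
    total = path_a.index(lca) + path_b.index(lca) + 2 * root_to_lca
    return 2 * root_to_lca / total if total else 1.0

def average_link(c1, c2):
    """Mean pairwise similarity between two clusters (assumed linkage)."""
    return sum(struct_sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def dendrogram_levels(entities):
    """Return the clustering after every merge, from finest to coarsest."""
    clusters = [frozenset([e]) for e in entities]
    levels = [clusters]
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: average_link(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        levels.append(clusters)
    return levels

def first_cut_without_isolated(levels):
    """Return the first examined cut in which no partition is a singleton."""
    for clusters in levels:
        if all(len(c) > 1 for c in clusters):
            return clusters
    return levels[-1]

def similar_partition_pairs(parts1, parts2, exact_matches):
    """Index pairs of partitions linked by at least one exact matching."""
    where1 = {e: i for i, p in enumerate(parts1) for e in p}
    where2 = {e: j for j, p in enumerate(parts2) for e in p}
    return sorted({(where1[a], where2[b]) for a, b in exact_matches
                   if a in where1 and b in where2})

partitions = first_cut_without_isolated(
    dendrogram_levels(["organ", "heart", "lung", "tissue", "muscle", "bone"]))
print(sorted(sorted(p) for p in partitions))

# Hypothetical partitions of a second ontology and exact label matchings.
other_partitions = [frozenset({"Heart", "Aorta"}), frozenset({"Femur", "Bone"})]
exact = [("heart", "Heart"), ("bone", "Bone")]
print(similar_partition_pairs(partitions, other_partitions, exact))
```

On this toy input the cut groups each branch of the hierarchy into one partition, and each exact matching links one partition of the first ontology to one partition of the second.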

$$\mathrm{StrcSim}(e_{i,m}, e_{i,n}) = \frac{2 \times \mathrm{Dist}(r_i, lca)}{\mathrm{Dist}(e_{i,m}, lca) + \mathrm{Dist}(e_{i,n}, lca) + 2 \times \mathrm{Dist}(r_i, lca)} \qquad (1)$$

$$Th = \dots \qquad (2)$$

3 Experiments

In Table 1, we compare our proposed partitioning approach to the currently available partitioning strategies using two OAEI 2017 biomedical data sets: the Anatomy task and the LargeBio small segments tasks.

Table 1. Anatomy track partitioning results

                      Proposed approach  SeeCOnt [3]  Falcon [2]  Alsayed et al. [1]
Precision             0.945              0.951        0.964       0.975
F-Measure             0.883              0.863        0.730       0.753
Recall                0.829              0.789        0.591       0.613
Number of partitions  57/57              ND           139/119     84/80
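As a quick consistency check on the Table 1 figures, the sketch below recomputes the F-Measure from the reported precision and recall. It assumes that F-Measure denotes the usual harmonic mean of precision and recall, which the text does not state explicitly; the function and variable names are chosen for this example.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F-Measure) as listed in Table 1.
table1 = {
    "Proposed approach": (0.945, 0.829, 0.883),
    "SeeCOnt": (0.951, 0.789, 0.863),
    "Falcon": (0.964, 0.591, 0.730),
    "Alsayed et al.": (0.975, 0.613, 0.753),
}

for name, (p, r, reported) in table1.items():
    print(f"{name}: computed {f_measure(p, r):.3f}, reported {reported:.3f}")
```

Under that assumption the recomputed values agree with the reported ones to within a few thousandths, which is compatible with rounding of the published precision and recall.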

We employ UBERON as an external biomedical knowledge source to derive synthetic reference alignments. We use the ISUB similarity measure to compute the similarity scores of the derived mappings. In Table 2, we report the accuracy of the partitioning approach with the derived thresholds.

                   Anatomy  FMA-NCI  FMA-SNOMED  SNOMED-NCI
Precision          0.945    0.957    0.860       0.911
F-Measure          0.883    0.870    0.674       0.697
Recall             0.829    0.789    0.554       0.564
Derived Threshold  0.91     0.69     0.75        0.85

4 Conclusion and Future Work

As future work, we intend to automate the entire matching tuning process while focusing on the different types of heterogeneity found in the partition-pairs.

References

1. Algergawy, Alsayed, Sabine Massmann, and Erhard Rahm. "A clustering-based approach for large-scale ontology matching." East European Conference on Advances in Databases and Information Systems. Springer, Berlin, Heidelberg (2011).

2. Hu, Wei, Yuzhong Qu, and Gong Cheng. "Matching large ontologies: A divide-and-conquer approach." Data & Knowledge Engineering 67.1 (2008).

3. Algergawy, Alsayed, Samira Babalou, Mohammad J. Kargar, and S. Hashem Davarpanah. "SeeCOnt: A new seeding-based clustering approach for ontology matching." In East European Conference on Advances in Databases and Information Systems. Springer (2015).

4. Wu, Zhibiao, and Martha Palmer. "Verbs semantics and lexical selection." In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (1994).