FCAMapX results for OAEI 2018

Guowei Chen¹,² and Songmao Zhang²

¹ University of Chinese Academy of Sciences
² Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, P.R. China
gwch@amss.ac.cn smzhang@math.ac.cn

Abstract. FCAMapX is an automated ontology matching system based on Formal Concept Analysis, a mathematical model for analyzing individuals and structuring concepts. FCAMapX participated in three tracks of OAEI 2018: Conference, Anatomy, and Large Biomedical Ontologies. Building on FCA-Map, our OAEI 2016 submission, which failed to complete some large tasks within the designated time, FCAMapX pursues improvements in efficiency and precision. Concretely, we optimize the data structures to save memory and implement a more efficient algorithm for computing formal concept lattices. To favor precision, we tighten the condition for identifying lexical mappings and strengthen the structural validation so that negative evidence is retrieved for matches identified both lexically and structurally. As a result, the running time for every task is now less than an hour in our experimental setting, and in a majority of cases both precision and F-measure improve while recall is lowered. Additionally, in comparison with other OAEI participants, FCAMapX achieves the best or the second best F-measure and recall in most large biomedical ontology matching tasks.

1 Presentation of the system

This edition, called FCAMapX, builds on our 2016 OAEI participant system FCA-Map [1,3] and aims to improve its efficiency and precision.

1.1 State, purpose, general statement

In OAEI 2016, we submitted FCA-Map, a novel system based on Formal Concept Analysis that identifies and validates mappings across ontologies, including one-to-one mappings and complex mappings.
FCA-Map incrementally generates three types of formal contexts and extracts mappings from the lattices derived from them. First, the token-based formal context describes how class names, labels, and synonyms share lexical tokens, leading to lexical mappings (anchors) across ontologies. Second, the relation-based formal context describes how classes are in taxonomic, partonomic and disjoint relationships with the anchors, leading to positive and negative structural evidence for validating the lexical matching. Third, the positive relation-based context can be used to discover structural mappings.

The 2016 OAEI evaluation in the Anatomy, the Large Biomedical Ontologies, and the Disease and Phenotype tracks demonstrates the effectiveness of FCA-Map and its competitiveness with the top-ranked systems. For SNOMED-NCI (whole), the largest ontology matching task in OAEI, FCA-Map ranks first for recall and second for F-measure. It also ranks second for F-measure on both FMA-NCI and FMA-SNOMED, and obtains the best F-measures for most Disease and Phenotype tasks [3].

On the other hand, FCA-Map suffers from long running times due to the high complexity of deriving the formal concept lattice in the Formal Concept Analysis formalism, whose size can grow exponentially with the size of the formal context. Moreover, FCA-Map performs relatively worse on precision than on recall and F-measure. We intend to address these two issues in the 2018 edition, FCAMapX.

1.2 Specific techniques used

To improve efficiency, we optimize the data structures to save memory and implement a more efficient algorithm, Hermes [4], for computing the formal concept lattice and the Galois sub-hierarchy. The Hermes algorithm has a running time of O(min{nm, n^α}), where n is the number of objects or attributes, m the size of the formal context, and n^α the time required to perform matrix multiplication (currently α = 2.376).
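The token-based formal context and the Galois sub-hierarchy mentioned above can be illustrated with a small sketch. This is not Hermes and not the FCAMapX implementation: it uses the naive derivation operators of Formal Concept Analysis on a hypothetical four-class context, and the class names, the `O1:`/`O2:` ontology prefixes, and the anchor criterion (identical token sets across ontologies) are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)

# Hypothetical toy context: objects are ontology classes (with a made-up
# ontology prefix), attributes are the lexical tokens of their labels.
CONTEXT: Dict[str, Set[str]] = {
    "O1:HeartValve": {"heart", "valve"},
    "O2:ValveOfHeart": {"heart", "valve"},
    "O1:Heart": {"heart"},
    "O2:AorticValve": {"aortic", "valve"},
}


def intent(objects: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Tokens shared by every class in `objects` (all tokens if it is empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def extent(tokens: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Classes whose token set contains every token in `tokens`."""
    wanted = set(tokens)
    return frozenset(obj for obj, toks in context.items() if wanted <= toks)


def galois_sub_hierarchy(context: Dict[str, Set[str]]) -> Set[Concept]:
    """Object concepts and attribute concepts, computed naively."""
    concepts: Set[Concept] = set()
    for obj in context:  # concept introduced by each class
        obj_intent = intent([obj], context)
        concepts.add((extent(obj_intent, context), obj_intent))
    for tok in set().union(*context.values()):  # concept introduced by each token
        tok_extent = extent([tok], context)
        concepts.add((tok_extent, intent(tok_extent, context)))
    return concepts


def lexical_anchors(context: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Assumed criterion: classes from different ontologies with identical tokens."""
    anchors: Set[Tuple[str, str]] = set()
    for ext, shared_tokens in galois_sub_hierarchy(context):
        exact = sorted(obj for obj in ext if context[obj] == shared_tokens)
        for left, right in combinations(exact, 2):
            if left.split(":")[0] != right.split(":")[0]:
                anchors.add((left, right))
    return anchors


if __name__ == "__main__":
    for ext, shared in sorted(galois_sub_hierarchy(CONTEXT), key=lambda c: sorted(c[1])):
        print(sorted(ext), sorted(shared))
    print(lexical_anchors(CONTEXT))
```

On this toy context the sub-hierarchy keeps only the concepts introduced by some class or token, so the top and bottom concepts of the full lattice are left out. Restricting attention to such concepts is one reason a sub-hierarchy can be much smaller than the lattice itself.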
To improve precision, we tighten the condition for identifying lexical mappings from the token-based lattice computed in the first step. Moreover, the second step (structural validation) and the third step (structural mapping) are swapped, so that positive and negative evidence can be retrieved for all mappings identified, whether lexically or structurally. This favors precision, because mappings with negative evidence are discarded.

1.3 Adaptations made for the evaluation

As in our previous edition, our SEALS submission includes precomputed word variants originating from UMLS [5] for mapping biomedical ontologies. Moreover, to improve the performance of FCAMapX on general-purpose domains such as those of the Conference track, we use the synsets of WordNet [6] in the first step to identify synonymous terms. Property names in the Conference ontologies are also taken into account when constructing the token-based formal context for lexical mapping.

1.4 Link to the system and parameters file

The SEALS-wrapped version of FCAMapX for OAEI 2018 is available at https://drive.google.com/open?id=1-0upxrcPbu5OVJAJn-DtTOUMOh3QDriM.

1.5 Link to the set of provided alignments

The results obtained by FCAMapX for OAEI 2018 are available at https://drive.google.com/open?id=1DzRD_90O3YwoGpW5FJL9vSy_f1Ia0YZo

2 Results

In this section, we present the evaluation results obtained by running FCAMapX on the Anatomy, Conference, and Large Biomedical Ontologies tracks. Tests were performed on a desktop computer with 16 GB of RAM and an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz.

2.1 The OAEI 2018 Anatomy Track

The Anatomy track consists of the Adult Mouse Anatomy (2744 classes) and a fragment of the NCI Thesaurus (3304 classes) describing the human anatomy.
Compared with our 2016 version, FCAMapX improves the precision from 0.932 to 0.941, whereas the recall decreases from 0.837 to 0.791, leading to a drop of the F-measure from 0.882 to 0.860, as shown in Table 1.

Table 1. Results for Anatomy track

Task      Precision   Recall   F-Measure   Runtime (s)
MA-NCI    0.941       0.791    0.860       11.811

2.2 The OAEI 2018 Conference Track

The Conference 2018 track contains 16 ontologies describing the domain of conference organization. These ontologies are small, with limited classes and semantic relations, and our approach can be ineffective on them, as analyzed in [3]. In this edition, we add WordNet as an external knowledge source; the results are listed in Table 2. Taking advantage of the additional synonyms defined in WordNet for general-purpose domains, FCAMapX increases the average recall from 0.52 to 0.582 and the average F-measure from 0.61 to 0.62, while the precision drops from 0.75 to 0.698.

Table 2. Results for Conference track

Task                 Precision   Recall   F-Measure   Runtime (s)
cmt-conference       0.563       0.600    0.581       1.194
cmt-confOf           0.667       0.375    0.480       0.291
cmt-edas             0.615       0.615    0.615       0.391
cmt-ekaw             0.556       0.455    0.500       0.254
cmt-iasted           0.500       1.000    0.667       0.546
cmt-sigkdd           0.750       0.750    0.750       0.222
conference-confOf    0.818       0.600    0.692       0.243
conference-edas      0.600       0.529    0.562       0.355
conference-ekaw      0.619       0.520    0.565       0.273
conference-iasted    0.364       0.286    0.320       0.466
conference-sigkdd    0.750       0.600    0.667       0.223
confOf-edas          0.846       0.579    0.687       0.304
confOf-ekaw          0.857       0.600    0.706       0.24
confOf-iasted        0.857       0.667    0.750       0.403
confOf-sigkdd        1.000       0.571    0.727       0.193
edas-ekaw            0.647       0.478    0.550       0.343
edas-iasted          0.727       0.421    0.533       0.48
edas-sigkdd          0.875       0.467    0.609       0.293
ekaw-iasted          0.462       0.600    0.522       0.467
ekaw-sigkdd          0.778       0.636    0.700       0.235
iasted-sigkdd        0.813       0.867    0.839       0.534

2.3 The OAEI 2018 Large Biomedical Ontologies Track

This track consists of finding alignments between the Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI). These ontologies are both large in scale and semantically rich. The results obtained by FCAMapX are shown in Table 3. In all tasks except FMA-NCI (small), FCAMapX increases both the precision and the F-measure, while the recall values are lowered. For FMA-SNOMED (whole), for example, the precision is 1.8 times that of the 2016 version and the F-measure 1.4 times. More importantly, in our own experimental setting, FCAMapX finished all tasks in the Large Biomedical track within the 2 hours required by OAEI 2016, whereas our 2016 system failed the three whole-ontology tasks. For the largest task, SNOMED-NCI (whole), our previous version ran for about 13 hours as reported in [3]; with FCAMapX, the time is reduced to 0.95 hours.

Table 3. Results for Large Biomedical track

Task                  Precision   Recall   F-Measure   Runtime (s)
FMA-NCI (small)       0.948       0.911    0.929       73.692
FMA-NCI (whole)       0.665       0.841    0.743       1171.62
FMA-SNOMED (small)    0.955       0.815    0.879       125.791
FMA-SNOMED (whole)    0.819       0.762    0.789       2179.924
SNOMED-NCI (small)    0.878       0.703    0.781       1039.138
SNOMED-NCI (whole)    0.796       0.680    0.733       3418.672

As reported by OAEI³, out of the six tasks in the track, FCAMapX ranks first for three tasks and second for two in terms of recall; for F-measure, FCAMapX ranks first for two tasks and second for three.
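The relation between the three quality columns of Table 3 can be checked with a short sketch. It assumes that the reported F-measure is the usual balanced harmonic mean of precision and recall, F = 2PR / (P + R), which is consistent with the tabulated values; the function names and the tolerance (chosen to absorb the rounding of the inputs to three decimals) are illustrative choices, not part of the OAEI tooling.

```python
from typing import Dict, List, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure), copied from Table 3.
TABLE_3: Dict[str, Tuple[float, float, float]] = {
    "FMA-NCI (small)": (0.948, 0.911, 0.929),
    "FMA-NCI (whole)": (0.665, 0.841, 0.743),
    "FMA-SNOMED (small)": (0.955, 0.815, 0.879),
    "FMA-SNOMED (whole)": (0.819, 0.762, 0.789),
    "SNOMED-NCI (small)": (0.878, 0.703, 0.781),
    "SNOMED-NCI (whole)": (0.796, 0.680, 0.733),
}


def mismatches(rows: Dict[str, Tuple[float, float, float]],
               tolerance: float = 0.001) -> List[str]:
    """Tasks whose recomputed F-measure differs from the reported one."""
    return [
        task
        for task, (precision, recall, reported) in rows.items()
        if abs(f_measure(precision, recall) - reported) > tolerance
    ]


if __name__ == "__main__":
    for task, (precision, recall, reported) in TABLE_3.items():
        print(f"{task}: recomputed {f_measure(precision, recall):.3f}, reported {reported:.3f}")
    print("mismatches:", mismatches(TABLE_3))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a gain in precision raises the F-measure only as long as the accompanying loss in recall stays moderate, which is the trade-off visible across the tasks above.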
3 General comments

This is the second time that we participate in the OAEI campaign with our Formal Concept Analysis based systems. The main goal is to improve efficiency with respect to our 2016 edition, which failed to finish within the designated time for three tasks in the Large Biomedical Ontologies track. FCAMapX has accomplished this. At the same time, strengthening the structural validation of mappings has yielded higher precision, which can lead to better F-measure values.

3.1 Comments on the results

FCAMapX participated in three tracks this year: Conference, Anatomy, and Large Biomedical Ontologies. The running time for every task is now less than an hour in our experimental setting. In a majority of cases, precision and F-measure both improve while recall is lowered. The unsatisfactory performance of FCAMapX on FMA-NCI (small) in comparison with our 2016 system deserves further explanation.

3.2 Discussions on the way to improve the proposed system

We intended to run FCAMapX on the Disease and Phenotype track, where our previous 2016 system performs competitively [1,3]. The results in our own setting against the consensus alignments with vote 3 are listed in Table 4, where the matching tasks involve the Human Phenotype (HP) Ontology, the Mammalian Phenotype (MP) Ontology, the Human Disease Ontology (DOID), and the Orphanet and Rare Diseases Ontology (ORDO). Note that these results cannot be compared with our 2016 system, as the version and source of the four ontologies differ from the ones used in 2016⁴. Unfortunately, FCAMapX failed this track with errors, as reported by the OAEI evaluation. This indicates that the quality of the system needs to be improved.

Table 4. Results for Disease and Phenotype track

Task         Precision   Recall   F-Measure   Runtime (s)
HP-MP        0.848       0.760    0.802       2368.376
DOID-ORDO    0.869       0.729    0.793       450.134

³ http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2018/results/
⁴ http://oaei.ontologymatching.org/2018/phenotype/

3.3 Comments on the OAEI procedure

From our experience of participating this year, we find that OAEI is well and efficiently organized and that the organizers are helpful. The tracks have different levels of difficulty, which is challenging and appealing, and the SEALS platform is very convenient to use.

4 Conclusions

In this paper, we present FCAMapX as an improved version of our 2016 OAEI system FCA-Map. The improvement mainly lies in efficiency, as illustrated by the dramatic drop in running times, for instance from about 13 hours to under 1 hour for the largest OAEI task. The second improvement is in mapping precision, which normally causes the F-measure to rise. Compared with other OAEI participants, FCAMapX achieves the best or the second best F-measure and recall in five out of the six large biomedical ontology matching tasks. Despite this progress, our system still has a long way to go in terms of covering all OAEI tracks, especially the instance matching tasks, for which the Formal Concept Analysis formalism has the potential to prevail with its capability of clustering commonalities among individuals.

Acknowledgements

This work has been supported by the National Key Research and Development Program of China under grant 2016YFB1000902, and the Natural Science Foundation of China grant 61621003.

References

1. Zhao, M., & Zhang, S. (2016, December). FCA-Map results for OAEI 2016. In OM@ISWC (pp. 172-177).
2. Zhao, M., & Zhang, S. (2016, December). Identifying and validating ontology mappings by formal concept analysis. In OM@ISWC (pp. 61-72).
3. Zhao, M., Zhang, S., Li, W., & Chen, G. (2018). Matching biomedical ontologies based on formal concept analysis. Journal of Biomedical Semantics, 9(1), 11.
4. Berry, A., Huchard, M., Napoli, A., & Sigayret, A. (2012, October). Hermes: an efficient algorithm for building Galois sub-hierarchies. In CLA: Concept Lattices and their Applications (pp. 21-32). Universidad de Malaga.
5. Lindberg DA, Humphreys BL, McCray AT, et al. The unified medical language system. IMIA Yearbook. 1993;32(4):281-91.
6. G. A. Miller. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39-41, 1995.
7. de Souza, K.X.S., Davis, J.: Aligning ontologies and evaluating concept similarities. In: OTM Confederated International Conferences On the Move to Meaningful Internet Systems, Springer (2004) 1012-1029
8. Guan-yu, L., Shu-peng, L., et al.: Formal concept analysis based ontology merging method. In: Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on. Volume 8., IEEE (2010) 279-282
9. Obitko, M., Snášel, V., Smid, J.: Ontology design with formal concept analysis. CLA 128(3) (2004) 1377-1390
10. Stumme, G., Maedche, A.: FCA-Merge: Bottom-up merging of ontologies. In: IJCAI. Volume 1. (2001) 225-230
11. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Ordered sets. Springer (1982) 445-470
12. Xu, X., Wu, Y., Chen, J.: Fuzzy FCA based ontology mapping. In: 2010 First International Conference on Networking and Distributed Computing, IEEE (2010) 181-185
13. Zhang, S., Bodenreider, O.: Experience in aligning anatomical ontologies. International journal on Semantic Web and information systems 3(2) (2007) 1