=Paper=
{{Paper
|id=Vol-3324/oaei22_paper11
|storemode=property
|title=Matcha and Matcha-DL results for OAEI 2022
|pdfUrl=https://ceur-ws.org/Vol-3324/oaei22_paper11.pdf
|volume=Vol-3324
|authors=Daniel Faria,Marta Contreiras Silva,Pedro Cotovio,Patrícia Eugénio,Catia Pesquita
|dblpUrl=https://dblp.org/rec/conf/semweb/FariaSCEP22
}}
==Matcha and Matcha-DL results for OAEI 2022==
Daniel Faria<sup>1,2</sup>, Marta Contreiras Silva<sup>1</sup>, Pedro Cotovio<sup>1</sup>, Patrícia Eugénio<sup>1</sup> and Catia Pesquita<sup>1</sup>

<sup>1</sup> LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal

<sup>2</sup> INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===
Matcha is a novel ontology matching system under development that aims to tackle holistic ontology matching, complex ontology matching and machine-learning-based matching. It builds upon the success of AgreementMakerLight (AML), but is based on an entirely novel architecture to support more complex algorithms. Matcha-DL is an expansion of Matcha for supervised learning settings. Matcha achieved state-of-the-art performance in several OAEI tasks, whereas Matcha-DL ranked first in F-measure in the majority of the Bio-ML tasks.

===1. Presentation of the System===
====1.1. State, Purpose, General Statement====
Matcha is a novel ontology matching system under development that aims to tackle some of the biggest challenges in the field: holistic ontology matching [1], complex ontology matching [2] and machine-learning-based matching. It builds upon the success of AgreementMakerLight (AML), and indeed incorporates all its key algorithms, but under a new core architecture that enables holistic matching of multiple ontologies and provides a better framework for complex ontology matching. Among its novel functionalities are encoding ontology alignments in OWL (including complex alignments) and merging ontologies, which are integral to the holistic ontology matching algorithm we have proposed [3]. Moreover, Matcha’s architecture is more encompassing than AML’s, reconciling features that in the past required separate extensions (such as AMLC).

Matcha-DL extends the Matcha framework to supervised learning by using the scores produced by Matcha’s matching algorithms to learn a decision boundary for each alignment.

====1.2. Specific Techniques Used====
With respect to algorithms, Matcha includes all of AML’s lexical and structural matching algorithms as well as its background knowledge strategy [4, 5], and also includes a revised implementation of the pattern-based complex matching algorithms that were developed under AMLC. Thus, Matcha currently contains no major algorithmic innovations, but because it processes ontology relations differently from AML, the results of some matching algorithms are expected to differ as well. Moreover, as Matcha is still under development, some functionalities of AML have not yet been implemented, including translation and alignment repair.

Matcha-DL implements a dense neural network that receives as input the scores generated by running all the matching algorithms of the Matcha framework. Afterwards, Matcha-DL can also exploit Matcha’s filters for holistic matching to enforce alignment consistency. Currently, Matcha-DL only supports equivalence matching, but an adaptation for subsumption matching is under development. We are also expanding this framework with more complex deep-learning-based solutions.

====1.3. Adaptations Made for the Evaluation====
No specific adaptations were made to Matcha for the evaluation other than implementing the MELT [6] web-based package required to participate in the OAEI. In our implementation, Matcha relies on the parameters passed by MELT to configure itself for the matching tasks. Unfortunately, these parameters were not correctly configured for all OAEI tracks, which led to unexpectedly poor results in the affected tracks.

Matcha-DL supports both global matching and local ranking-based evaluation. However, it is at a slight disadvantage in the latter, since the sparsity of the core Matcha algorithms’ scores leads to a near-binary probability distribution. As such, Matcha-DL is sub-optimal for ranking metrics such as MRR and Hits@k, as proposed in [7], but achieves state-of-the-art performance on global matching metrics.

====1.4. Link to the System and Parameters File====
Matcha is still under development and therefore not yet publicly available. A public release will be made once the core development is completed.

===2. Results===
Matcha’s OAEI 2022 results are summarized in Table 1, whereas Matcha’s and Matcha-DL’s results in the Bio-ML track are reported in Table 2.

===3. Conclusions===
The Matcha system is still in its early stages of development, and participating in OAEI 2022 was fundamental to highlight challenges and opportunities going forward. We applaud the novel tracks at OAEI 2022, which have shown both the maturity of equivalence-oriented ontology alignment approaches and the opportunities afforded by supervised learning, but also that ensuring generalizability and tackling tasks such as subsumption and complex matching remain very much open challenges.
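To make the distinction drawn in Section 1.3 concrete, the following minimal Python sketch contrasts a global matching metric (F-measure) with the local ranking metrics MRR and Hits@k. It is only an illustration, not Matcha-DL code and not the Bio-ML evaluation protocol: the function names, the toy candidate scores and the pessimistic handling of ties (a candidate tied with the correct one is counted as ranked above it) are all assumptions. The first check uses the precision and recall reported for the Animalia task in Table 1.

```python
from typing import Dict, List, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Global metric: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def rank_of_correct(scores: Dict[str, float], correct: str) -> int:
    """1-based rank of the correct candidate.

    Assumption: ties are resolved pessimistically, so every other candidate
    with a score greater than or equal to the correct one is ranked above it.
    """
    target = scores[correct]
    better_or_tied = sum(
        1 for cand, score in scores.items() if cand != correct and score >= target
    )
    return better_or_tied + 1


def mrr_and_hits(ranks: List[int], k: int) -> Tuple[float, float]:
    """Local ranking metrics: mean reciprocal rank and Hits@k."""
    mrr = sum(1.0 / rank for rank in ranks) / len(ranks)
    hits = sum(1 for rank in ranks if rank <= k) / len(ranks)
    return mrr, hits


# Hypothetical candidate scores for two source classes; "t1" and "u1" are correct.
graded = [
    ({"t1": 0.92, "t2": 0.55, "t3": 0.10}, "t1"),
    ({"u1": 0.81, "u2": 0.60, "u3": 0.05}, "u1"),
]
# The same candidates with near-binary scores: "t2" now ties with "t1".
near_binary = [
    ({"t1": 1.0, "t2": 1.0, "t3": 0.0}, "t1"),
    ({"u1": 1.0, "u2": 0.0, "u3": 0.0}, "u1"),
]

graded_ranks = [rank_of_correct(s, c) for s, c in graded]
binary_ranks = [rank_of_correct(s, c) for s, c in near_binary]

print(round(f_measure(0.577, 1.0), 3))   # 0.732, as in the Animalia row of Table 1
print(mrr_and_hits(graded_ranks, k=1))   # (1.0, 1.0)
print(mrr_and_hits(binary_ranks, k=1))   # (0.75, 0.5)
```

Under these assumptions, the near-binary scores leave one correct target tied with a wrong candidate, so Hits@1 drops to 0.5 and MRR to 0.75 even though every correct target still receives the maximum score. A decision threshold applied to the same scores would still select both correct targets, which is consistent with ranking metrics being more sensitive to score sparsity than global matching metrics.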
Table 1: Summary of Matcha’s OAEI 2022 results across 7 tracks.

{| class="wikitable"
! Task !! Precision !! Recall/Coverage !! F-measure !! Run time (s) !! Rank
|-
! colspan="6" | Anatomy
|-
| Mouse-Human || 0.951 || 0.93 || 0.941 || 37 || 1
|-
! colspan="6" | Biodiversity & Ecology
|-
| NCBITAXON-TAXREFLD Animalia || 0.577 || 1.0 || 0.732 || 5 || 2
|-
| NCBITAXON-TAXREFLD Bacteria || 0.623 || 0.984 || 0.763 || 15 || 3
|-
| NCBITAXON-TAXREFLD Chromista || 0.785 || 0.999 || 0.879 || 51 || 1
|-
| NCBITAXON-TAXREFLD Fungi || 0.741 || 0.992 || 0.848 || 75 || 2
|-
| NCBITAXON-TAXREFLD Plantae || 0.722 || 1.0 || 0.839 || 11 || 2
|-
! colspan="6" | Common Knowledge Graphs
|-
| Nell-DBpedia || 1.0 || 0.81 || 0.90 || 60 || 2
|-
| Yago-Wikidata || 1.0 || 0.80 || 0.89 || 198 || 2
|-
! colspan="6" | Complex
|-
| Conference || 0.31 || 0.37 || 0.34 || - || 1
|-
| Hydrography || 0.49 || 0.04 || 0.08 || - || 3
|-
| Geolink || 0.49 || 0.22 || 0.30 || - || 2
|-
| Populated Geolink || 0.49 || 0.22 || 0.30 || - || 2
|-
| Populated Enslaved || 0.46 || 0.12 || 0.18 || - || 3
|-
! colspan="6" | Conference
|-
| OntoFarm (rar2-M3) || 0.37 || 0.07 || 0.12 || - || 14
|-
! colspan="6" | Knowledge Graph
|-
| Aggregate (overall) || 0.53 || 0.70 || 0.60 || 9621 || 7
|-
! colspan="6" | Material Sciences and Engineering
|-
| First Test Case || 0.0 || 0.0 || 0.0 || 22 || 4
|-
| Second Test Case || 0.0 || 0.0 || 0.0 || 15 || 4
|-
| Third Test Case || 0.500 || 0.032 || 0.060 || 21 || 3
|}

Table 2: Summary of Matcha’s and Matcha-DL’s Bio-ML OAEI 2022 results.
{| class="wikitable"
! Task !! Precision !! Recall/Coverage !! F-measure !! Run time (s) !! Rank
|-
! colspan="6" | Semi-Supervised: Matcha
|-
| OMIM-ORDO || 0.694 || 0.511 || 0.589 || - || 5
|-
| NCIT-DOID || 0.883 || 0.754 || 0.813 || - || 4
|-
| SNOMED-FMA || 0.845 || 0.592 || 0.696 || - || 6
|-
| SNOMED-NCIT (Pharm) || 0.924 || 0.607 || 0.733 || - || 6
|-
| SNOMED-NCIT (Neoplas) || 0.704 || 0.564 || 0.626 || - || 7
|-
! colspan="6" | Semi-Supervised: Matcha-DL
|-
| OMIM-ORDO || 0.887 || 0.578 || 0.700 || - || 1
|-
| NCIT-DOID || 0.955 || 0.801 || 0.871 || - || 1
|-
| SNOMED-FMA || 0.998 || 0.756 || 0.856 || - || 1
|-
| SNOMED-NCIT (Pharm) || 0.999 || 0.593 || 0.744 || - || 3
|-
| SNOMED-NCIT (Neoplas) || 0.956 || 0.615 || 0.748 || - || 1
|-
! colspan="6" | Unsupervised: Matcha
|-
| OMIM-ORDO || 0.743 || 0.508 || 0.604 || - || 7
|-
| NCIT-DOID || 0.906 || 0.756 || 0.825 || - || 4
|-
| SNOMED-FMA || 0.875 || 0.594 || 0.707 || - || 5
|-
| SNOMED-NCIT (Pharm) || 0.941 || 0.613 || 0.742 || - || 5
|-
| SNOMED-NCIT (Neoplas) || 0.754 || 0.564 || 0.645 || - || 7
|-
! colspan="6" | Unsupervised: Matcha-DL
|-
| OMIM-ORDO || 0.943 || 0.514 || 0.666 || - || 1
|-
| NCIT-DOID || 0.978 || 0.771 || 0.862 || - || 2
|-
| SNOMED-FMA || 0.990 || 0.746 || 0.851 || - || 1
|-
| SNOMED-NCIT (Pharm) || 0.996 || 0.612 || 0.758 || - || 2
|-
| SNOMED-NCIT (Neoplas) || 0.973 || 0.589 || 0.734 || - || 2
|}

===Acknowledgements===
This work was supported by FCT through the LASIGE Research Unit (UIDB/00408/2020 and UIDP/00408/2020). It was also partially supported by the KATY project, which has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 101017453. Marta Silva was partially funded by FCT through the fellowship 2022.11895.BD.

===References===
[1] I. Megdiche, O. Teste, C. Trojahn, An extensible linear approach for holistic ontology matching, in: International Semantic Web Conference, Springer, 2016, pp. 393–410.

[2] É. Thiéblin, O. Haemmerlé, N. Hernandez, C. Trojahn, Survey on complex ontology matching, Semantic Web 11 (2020) 689–727. URL: https://doi.org/10.3233/SW-190366. doi:10.3233/SW-190366.

[3] M. C. Silva, D. Faria, C. Pesquita, Matching multiple ontologies to build a knowledge graph for personalized medicine, in: European Semantic Web Conference, Springer, 2022, pp. 461–477.

[4] D. Faria, C. Pesquita, E. Santos, M. Palmonari, I. F. Cruz, F. M. Couto, The AgreementMakerLight Ontology Matching System, in: OTM Conferences - ODBASE, 2013, pp. 527–541.

[5] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, F. M. Couto, Automatic Background Knowledge Selection for Matching Biomedical Ontologies, PLoS One 9 (2014) e111226.

[6] S. Hertling, J. Portisch, H. Paulheim, MELT - matching evaluation toolkit, in: Semantic Systems. The Power of AI and Knowledge Graphs - 15th International Conference, SEMANTiCS 2019, Karlsruhe, Germany, September 9-12, 2019, Proceedings, 2019, pp. 231–245. URL: https://doi.org/10.1007/978-3-030-33220-4_17. doi:10.1007/978-3-030-33220-4_17.

[7] Y. He, J. Chen, H. Dong, E. Jiménez-Ruiz, A. Hadian, I. Horrocks, Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching, arXiv preprint arXiv:2205.03447 (2022).