=Paper= {{Paper |id=Vol-3324/oaei22_paper11 |storemode=property |title=Matcha and Matcha-DL results for OAEI 2022 |pdfUrl=https://ceur-ws.org/Vol-3324/oaei22_paper11.pdf |volume=Vol-3324 |authors=Daniel Faria,Marta Contreiras Silva,Pedro Cotovio,Patrícia Eugénio,Catia Pesquita |dblpUrl=https://dblp.org/rec/conf/semweb/FariaSCEP22 }} ==Matcha and Matcha-DL results for OAEI 2022== https://ceur-ws.org/Vol-3324/oaei22_paper11.pdf
Matcha and Matcha-DL results for OAEI 2022
Daniel Faria (1,2), Marta Contreiras Silva (1), Pedro Cotovio (1), Patrícia Eugénio (1) and
Catia Pesquita (1)
(1) LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
(2) INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal


Abstract
Matcha is a novel ontology matching system under development that aims to tackle holistic
ontology matching, complex ontology matching and machine-learning based matching. It
builds upon the success of AgreementMakerLight (AML), but is based on an entirely novel
architecture to support more complex algorithms. Matcha-DL is an extension of Matcha for
supervised learning settings. Matcha achieved state-of-the-art performance in several OAEI
tasks, whereas Matcha-DL ranked first in F-measure in the majority of the Bio-ML tasks.




1. Presentation of the System
1.1. State, Purpose, General Statement
Matcha is a novel ontology matching system under development that aims to tackle
some of the biggest challenges in the field: holistic ontology matching [1], complex
ontology matching [2] and machine-learning based matching. It builds upon the success
of AgreementMakerLight (AML), and indeed incorporates all its key algorithms, but
under a new core architecture that enables holistic matching of multiple ontologies
and provides a better framework for complex ontology matching. Among its novel
functionalities are encoding ontology alignments in OWL (including complex alignments)
and merging ontologies, which are integral to the holistic ontology matching algorithm we
have proposed [3]. Moreover, Matcha’s architecture is more encompassing than AML’s,
reconciling features that in the past required separate extensions (such as AMLC).
Matcha-DL extends the Matcha framework to supervised learning settings by using the
scores of Matcha’s matching algorithms to learn a decision boundary for each alignment.
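
To make the idea of encoding alignments in OWL concrete, the following is a minimal sketch that serializes mappings as axioms in OWL 2 functional-style syntax. It is not Matcha's serialization, which is not described here: the `Mapping` class, the relation symbols `=` and `<`, the helper names and the `example.org` IRIs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping record (not Matcha's data model)."""
    source: str    # IRI of a class, or a class expression in functional syntax
    target: str    # IRI of a class, or a class expression in functional syntax
    relation: str  # "=" for equivalence, "<" for subsumption


def as_term(value: str) -> str:
    """Wrap a plain IRI in angle brackets; pass class expressions through.

    Simplifying assumption: anything containing "(" is a class expression.
    """
    return value if "(" in value else f"<{value}>"


def to_owl_axiom(mapping: Mapping) -> str:
    """Render a mapping as an OWL 2 functional-style syntax axiom."""
    source, target = as_term(mapping.source), as_term(mapping.target)
    if mapping.relation == "=":
        return f"EquivalentClasses({source} {target})"
    if mapping.relation == "<":
        return f"SubClassOf({source} {target})"
    raise ValueError(f"unsupported relation: {mapping.relation!r}")


# A simple equivalence between two named classes.
simple = Mapping("http://example.org/a#Heart", "http://example.org/b#Heart", "=")

# A complex mapping: a named class is equivalent to a class expression.
complex_mapping = Mapping(
    "http://example.org/a#AcceptedPaper",
    "ObjectIntersectionOf(<http://example.org/b#Paper> "
    "ObjectSomeValuesFrom(<http://example.org/b#hasDecision> "
    "<http://example.org/b#Acceptance>))",
    "=",
)

print(to_owl_axiom(simple))
print(to_owl_axiom(complex_mapping))
```

Expressing mappings as ordinary axioms means an alignment can be loaded alongside the ontologies it connects, which is presumably what makes merging ontologies a natural companion feature.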

1.2. Specific Techniques Used
With respect to algorithms, Matcha includes all of AML’s lexical and structural matching
algorithms as well as its background knowledge strategy [4, 5], and also includes a revised
implementation of the pattern-based complex matching algorithms that were developed
under AMLC. Thus, at present, no major algorithmic innovations are present in Matcha,
but because its processing of ontology relations is different from AML’s, the results of
some matching algorithms are expected to be different as well. Moreover, as Matcha is
still under development, some functionalities of AML have not yet been implemented,
including translation and alignment repair.

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   Matcha-DL implements a dense neural network that receives as input the scores
generated by running all the matching algorithms of the Matcha framework. Afterwards,
Matcha-DL can also exploit Matcha’s filters for holistic matching to enforce alignment
consistency. Currently, Matcha-DL only supports equivalence matching, but an adaptation
for subsumption matching is under development. We are also expanding this framework
with more complex deep learning based solutions.
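
As an illustration of the general idea, the sketch below trains a very small dense network on per-pair matcher scores using only the Python standard library. It is not the Matcha-DL implementation: the architecture (one tanh hidden layer of four units and a sigmoid output), the three hypothetical matchers, the toy training data and the hyperparameters are all assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyDenseNet:
    """One tanh hidden layer followed by a single sigmoid output unit."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        prob = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, prob

    def predict(self, x):
        """Estimated probability that the candidate pair is a true mapping."""
        return self._forward(x)[1]

    def train_step(self, x, y, lr):
        """One stochastic gradient step on the binary cross-entropy loss."""
        hidden, prob = self._forward(x)
        d_logit = prob - y  # gradient of the loss w.r.t. the output logit
        for j, h in enumerate(hidden):
            # Backpropagate through tanh using the pre-update output weight.
            d_pre = d_logit * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * d_logit * h
            self.b1[j] -= lr * d_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_pre * xi
        self.b2 -= lr * d_logit


# Hypothetical training pairs: scores from three assumed matchers
# (lexical, structural, background knowledge) and a 0/1 reference label.
TRAIN = [
    ([0.95, 0.80, 0.90], 1), ([0.90, 0.00, 0.85], 1),
    ([0.80, 0.70, 0.00], 1), ([1.00, 0.90, 1.00], 1),
    ([0.10, 0.00, 0.00], 0), ([0.00, 0.20, 0.00], 0),
    ([0.30, 0.00, 0.10], 0), ([0.00, 0.00, 0.00], 0),
]

net = TinyDenseNet(n_inputs=3, n_hidden=4)
for _ in range(500):
    for scores, label in TRAIN:
        net.train_step(scores, label, lr=0.5)

strong = net.predict([0.95, 0.90, 0.90])  # high scores from all matchers
weak = net.predict([0.05, 0.00, 0.00])    # almost no matcher support
print(f"strong={strong:.3f} weak={weak:.3f}")
```

Candidate pairs whose predicted probability exceeds a chosen threshold (0.5 would be a natural default) would then be kept as mappings before any consistency filtering is applied.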

1.3. Adaptations Made for the Evaluation
No specific adaptations were made to Matcha for the evaluation other than implementing
the MELT [6] web-based package required to participate in the OAEI. In our implementation,
Matcha relies on the parameters passed by MELT to configure itself for the
matching tasks. Unfortunately, these parameters were not correctly configured for all OAEI
tracks, which led to unexpectedly poor results in the affected tracks.
  Matcha-DL supports both global matching and local ranking-based metrics. However, it
is at a slight disadvantage on the latter, since the sparsity of the core Matcha algorithms’
scores leads to a near-binary probability distribution. As such, Matcha-DL is not well suited
to ranking metrics such as MRR and Hits@k, proposed by [7], but achieves state-of-the-art
performance on global matching metrics.
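
One plausible reading of this limitation is that near-binary scores produce many ties among candidates, which ranking metrics penalize. The sketch below computes MRR and Hits@k for graded and near-binary score lists. The pessimistic tie-breaking rule (the correct candidate is ranked last among equal scores) and the example scores are assumptions made for illustration, not the evaluation protocol of [7].

```python
def pessimistic_rank(scores, true_index):
    """Rank of the correct candidate, placed last among tied scores."""
    true_score = scores[true_index]
    return sum(1 for s in scores if s >= true_score)


def mrr(queries):
    """Mean reciprocal rank over (candidate_scores, true_index) pairs."""
    return sum(1.0 / pessimistic_rank(s, t) for s, t in queries) / len(queries)


def hits_at_k(queries, k):
    """Fraction of queries whose correct candidate is ranked within the top k."""
    return sum(1 for s, t in queries if pessimistic_rank(s, t) <= k) / len(queries)


# Graded scores separate the correct candidate from the rest.
graded = [([0.92, 0.61, 0.35, 0.10], 0), ([0.40, 0.88, 0.15, 0.05], 1)]

# Near-binary scores tie the correct candidate with several wrong ones.
near_binary = [([1.0, 1.0, 1.0, 0.0], 0), ([1.0, 1.0, 0.0, 0.0], 1)]

for name, queries in (("graded", graded), ("near-binary", near_binary)):
    print(f"{name}: MRR={mrr(queries):.3f} "
          f"Hits@1={hits_at_k(queries, 1):.2f} Hits@3={hits_at_k(queries, 3):.2f}")
```

In both cases the correct candidate receives a top score, so a global threshold would accept it either way; only the ranking metrics drop when it cannot be distinguished from tied candidates.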

1.4. Link to the System and Parameters File
Matcha is still under development and therefore not yet publicly available. A public
release will be made once the core development is completed.


2. Results
Matcha’s OAEI 2022 results are summarized in Table 1, while Matcha’s and Matcha-DL’s
results in the Bio-ML track are reported in Table 2.
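
For readers who want to relate the three quality columns of the tables, the sketch below assumes that F-measure is the usual harmonic mean of precision and recall. The function name is ours, and the sample values are taken from rows of Table 1 and Table 2.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (task, precision, recall) taken from the result tables.
rows = [
    ("NCBITAXON-TAXREFLD Animalia", 0.577, 1.0),
    ("NCBITAXON-TAXREFLD Chromista", 0.785, 0.999),
    ("NCIT-DOID (Matcha-DL, semi-supervised)", 0.955, 0.801),
    ("First Test Case", 0.0, 0.0),
]

for task, precision, recall in rows:
    print(f"{task}: F-measure = {f_measure(precision, recall):.3f}")
```

Values recomputed this way can differ from the reported ones in the last digit (for example, Mouse-Human gives 0.940 rather than 0.941), presumably because the reported F-measures were computed from unrounded precision and recall.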


3. Conclusions
The Matcha system is still in its early stages of development and participating in
OAEI 2022 was fundamental to highlight challenges and opportunities going forward.
We applaud the novel tracks at OAEI 2022, which have shown both the maturity of
equivalence-oriented ontology alignment approaches and the opportunities afforded by
supervised learning, but also that ensuring generalizability and tackling tasks such
as subsumption and complex matching are still very much open challenges.

Table 1
Summary of Matcha’s OAEI 2022 results across 7 tracks

Task                            Precision   Recall/Coverage   F-measure   Run time (s)   Rank
-- Anatomy --
Mouse-Human                     0.951       0.93              0.941       37             1
-- Biodiversity & Ecology --
NCBITAXON-TAXREFLD Animalia     0.577       1.0               0.732       5              2
NCBITAXON-TAXREFLD Bacteria     0.623       0.984             0.763       15             3
NCBITAXON-TAXREFLD Chromista    0.785       0.999             0.879       51             1
NCBITAXON-TAXREFLD Fungi        0.741       0.992             0.848       75             2
NCBITAXON-TAXREFLD Plantae      0.722       1.0               0.839       11             2
-- Common Knowledge Graphs --
Nell-DBpedia                    1.0         0.81              0.90        60             2
Yago-Wikidata                   1.0         0.80              0.89        198            2
-- Complex --
Conference                      0.31        0.37              0.34        -              1
Hydrography                     0.49        0.04              0.08        -              3
Geolink                         0.49        0.22              0.30        -              2
Populated Geolink               0.49        0.22              0.30        -              2
Populated Enslaved              0.46        0.12              0.18        -              3
-- Conference --
OntoFarm (rar2-M3)              0.37        0.07              0.12        -              14
-- Knowledge Graph --
Aggregate (overall)             0.53        0.70              0.60        9621           7
-- Material Sciences and Engineering --
First Test Case                 0.0         0.0               0.0         22             4
Second Test Case                0.0         0.0               0.0         15             4
Third Test Case                 0.500       0.032             0.060       21             3

Table 2
Summary of Matcha’s and Matcha-DL’s Bio-ML OAEI 2022 results.

Task                            Precision   Recall/Coverage   F-measure   Run time (s)   Rank
-- Semi-Supervised: Matcha --
OMIM-ORDO                       0.694       0.511             0.589       -              5
NCIT-DOID                       0.883       0.754             0.813       -              4
SNOMED-FMA                      0.845       0.592             0.696       -              6
SNOMED-NCIT (Pharm)             0.924       0.607             0.733       -              6
SNOMED-NCIT (Neoplas)           0.704       0.564             0.626       -              7
-- Semi-Supervised: Matcha-DL --
OMIM-ORDO                       0.887       0.578             0.700       -              1
NCIT-DOID                       0.955       0.801             0.871       -              1
SNOMED-FMA                      0.998       0.756             0.856       -              1
SNOMED-NCIT (Pharm)             0.999       0.593             0.744       -              3
SNOMED-NCIT (Neoplas)           0.956       0.615             0.748       -              1
-- Unsupervised: Matcha --
OMIM-ORDO                       0.743       0.508             0.604       -              7
NCIT-DOID                       0.906       0.756             0.825       -              4
SNOMED-FMA                      0.875       0.594             0.707       -              5
SNOMED-NCIT (Pharm)             0.941       0.613             0.742       -              5
SNOMED-NCIT (Neoplas)           0.754       0.564             0.645       -              7
-- Unsupervised: Matcha-DL --
OMIM-ORDO                       0.943       0.514             0.666       -              1
NCIT-DOID                       0.978       0.771             0.862       -              2
SNOMED-FMA                      0.990       0.746             0.851       -              1
SNOMED-NCIT (Pharm)             0.996       0.612             0.758       -              2
SNOMED-NCIT (Neoplas)           0.973       0.589             0.734       -              2

Acknowledgements
This work was supported by FCT through the LASIGE Research Unit (UIDB/00408/2020
and UIDP/00408/2020). It was also partially supported by the KATY project which
has received funding from the European Union’s Horizon 2020 research and innovation
program under grant agreement No 101017453. Marta Silva was partially funded by FCT
through the fellowship 2022.11895.BD.


References
[1] I. Megdiche, O. Teste, C. Trojahn, An extensible linear approach for holistic ontology
    matching, in: International Semantic Web Conference, Springer, 2016, pp. 393–410.
[2] É. Thiéblin, O. Haemmerlé, N. Hernandez, C. Trojahn, Survey on complex ontology
    matching, Semantic Web 11 (2020) 689–727. URL: https://doi.org/10.3233/SW-190366.
    doi:10.3233/SW-190366.
[3] M. C. Silva, D. Faria, C. Pesquita, Matching multiple ontologies to build a knowledge
    graph for personalized medicine, in: European Semantic Web Conference, Springer,
    2022, pp. 461–477.
[4] D. Faria, C. Pesquita, E. Santos, M. Palmonari, I. F. Cruz, F. M. Couto, The
    AgreementMakerLight Ontology Matching System, in: OTM Conferences - ODBASE,
    2013, pp. 527–541.
[5] D. Faria, C. Pesquita, E. Santos, I. F. Cruz, F. M. Couto, Automatic Background
    Knowledge Selection for Matching Biomedical Ontologies, PLoS One 9 (2014) e111226.
[6] S. Hertling, J. Portisch, H. Paulheim, MELT - matching evaluation toolkit, in:
    Semantic Systems. The Power of AI and Knowledge Graphs - 15th International
    Conference, SEMANTiCS 2019, Karlsruhe, Germany, September 9-12, 2019, Proceedings,
    2019, pp. 231–245. URL: https://doi.org/10.1007/978-3-030-33220-4_17.
    doi:10.1007/978-3-030-33220-4_17.
[7] Y. He, J. Chen, H. Dong, E. Jiménez-Ruiz, A. Hadian, I. Horrocks, Machine
    learning-friendly biomedical datasets for equivalence and subsumption ontology
    matching, arXiv preprint arXiv:2205.03447 (2022).