<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Results for Matcha and Matcha-DL in OAEI 2023</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Faria</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marta Silva</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pedro Cotovio</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucas Ferraz</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Balbi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Catia Pesquita</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>INESC-ID, Instituto Superior Técnico, Universidade de Lisboa</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LASIGE, Faculdade de Ciências, Universidade de Lisboa</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Matcha is an ontology matching system under development, designed to tackle long-standing challenges such as complex and holistic ontology matching. It incorporates all of the key algorithms from AgreementMakerLight (AML) over a novel, broader core architecture, and includes several new algorithms. Matcha-DL extends Matcha to semi-supervised tasks: it uses a trainable model to select and rank candidates proposed by Matcha. Matcha performed well overall, achieving the highest F-measure in 6 of the 18 distinct OAEI tasks and ranking in the top three in 9 others. Matcha-DL achieved the highest F-measure in 4 of the 5 semi-supervised BioML tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1.2. Specific Techniques Used</title>
      <p>
        A matching algorithm based on Large Language Models (LLMs) has been added to Matcha.
The strategy converts the entities’ labels and synonyms into embeddings and then
computes the cosine similarity between those embeddings. The embeddings can be
obtained from any LLM, although for this OAEI edition we used the sentence-BERT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
all-MiniLM-L6-v2 model<sup>1</sup> (the pretrained model with no fine-tuning).
      </p>
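      <p>The embedding-based strategy can be illustrated with a minimal sketch. It assumes that each entity has already been encoded into one vector per label or synonym; the toy three-dimensional vectors, the entity identifiers, the function names, and the 0.8 threshold are all hypothetical and are not taken from Matcha.</p>
      <preformat><![CDATA[
```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_by_embeddings(source, target, threshold=0.8):
    """Return (source_id, target_id, score) triples at or above the threshold.

    source and target map an entity id to a list of embedding vectors,
    one per label or synonym. The threshold is an assumed value.
    """
    matches = []
    for s_id, s_vecs in source.items():
        for t_id, t_vecs in target.items():
            # Keep the best score over all label/synonym pairs.
            best = max(cosine_similarity(u, v) for u in s_vecs for v in t_vecs)
            if best >= threshold:
                matches.append((s_id, t_id, best))
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Toy vectors standing in for real sentence embeddings (hypothetical data).
source = {
    "src:Heart": [[0.9, 0.1, 0.0]],
    "src:Lung": [[0.0, 1.0, 0.2]],
}
target = {
    "tgt:Cardiac_organ": [[0.8, 0.2, 0.1], [1.0, 0.0, 0.0]],
    "tgt:Kidney": [[0.1, 0.0, 1.0]],
}
matches = match_by_embeddings(source, target)
```
]]></preformat>
      <p>In this toy input only the pair src:Heart and tgt:Cardiac_organ exceeds the threshold. With real data, the vectors would come from the chosen sentence embedding model rather than being written by hand.</p>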
      <p>Matcha’s novel translation module is a neural many-to-many multilingual translation
method based on an encoder-decoder Long Short-Term Memory (LSTM) architecture, in which two
recurrent neural networks act as an encoder and decoder pair. It treats translation as a
sequence-to-sequence prediction problem between the languages of the source and target
ontologies.</p>
      <p>The encoder maps a label in the source language to a vector representation that serves as input
to the decoder, which then maps the vector to a translation of the label in the target language.
The label translations are then added to the lexicons of the original ontologies.</p>
      <p>For Matcha-DL, a specific pipeline was developed that runs nearly all of Matcha’s
matching algorithms and uses their scores as input to a dense neural network. Matcha-DL augments
Matcha by learning to rank the candidates produced by Matcha based on these input scores.</p>
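      <p>As a rough illustration of ranking candidates from matcher scores, the sketch below scores each candidate with a small dense network and sorts the candidates by that score. The network size, the hand-set parameters, the three input features, and all names are hypothetical; the actual Matcha-DL architecture and training procedure are not described here, and training is omitted entirely.</p>
      <preformat><![CDATA[
```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def score_candidate(features, params):
    """Map a vector of matcher scores to a single score in (0, 1)."""
    hidden = dense_layer(features, params["w1"], params["b1"], relu)
    return dense_layer(hidden, params["w2"], params["b2"], sigmoid)[0]


def rank_candidates(candidates, params):
    """Sort (target_id, features) pairs by descending network score."""
    scored = [(target, score_candidate(features, params))
              for target, features in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical hand-set parameters; a real model would learn them
# from the training portion of the reference alignment.
params = {
    "w1": [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]],
    "b1": [0.0, 0.0],
    "w2": [[2.0, -1.0]],
    "b2": [-3.0],
}

# Each candidate carries the scores of three matchers (hypothetical values).
candidates = [
    ("tgt:B", [0.4, 0.3, 0.2]),
    ("tgt:A", [0.9, 0.8, 0.7]),
    ("tgt:C", [0.0, 0.0, 0.0]),
]
ranking = rank_candidates(candidates, params)
```
]]></preformat>
      <p>With these toy parameters the candidate with the highest matcher scores, tgt:A, is ranked first. For local matching, the top-ranked candidates for a given source entity would be the ones returned.</p>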
      <p>Matcha’s matching algorithms are described in Table 1.</p>
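      <p>One of the algorithms in Table 1 measures word similarity with a weighted Jaccard index. A minimal sketch of that measure follows, assuming that every word has a non-negative weight; the weights, the default weight of 1.0, and the function name are hypothetical, since the weighting scheme used by Matcha is not specified here.</p>
      <preformat><![CDATA[
```python
def weighted_jaccard(words_a, words_b, weights, default_weight=1.0):
    """Weighted Jaccard index between two bags of words.

    The score is the total weight of the shared words divided by the
    total weight of all distinct words in either input.
    """
    a, b = set(words_a), set(words_b)
    union = a | b
    if not union:
        return 0.0
    shared = sum(weights.get(w, default_weight) for w in a & b)
    total = sum(weights.get(w, default_weight) for w in union)
    return shared / total if total > 0.0 else 0.0


# Hypothetical weights: frequent, uninformative words get lower weights.
weights = {"heart": 1.0, "valve": 0.8, "disease": 0.2}

similarity = weighted_jaccard(
    ["heart", "valve", "disease"],
    ["valve", "disease"],
    weights,
)
```
]]></preformat>
      <p>Here the shared words carry half of the total weight, so the similarity is about 0.5, whereas an unweighted Jaccard index would give 2/3. Down-weighting common words keeps them from dominating the score.</p>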
    </sec>
    <sec id="sec-2">
      <title>1.3. Adaptations Made for the Evaluation</title>
      <p>
        The MELT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] web-based package was implemented in Matcha, as required for the OAEI
evaluation.
      </p>
      <p>Matcha-DL was mainly designed as a ranking model for local matching. It was, however,
adapted for global matching tasks by considering the candidates produced by the Matcha core
algorithm.</p>
    </sec>
    <sec id="sec-3">
      <title>1.4. Link to the System and Parameters File</title>
      <p>As Matcha is still under development, it is not publicly available. A public release is planned
once the core development is completed, which is expected to be soon.</p>
      <sec id="sec-3-1">
        <title>2. Results</title>
        <p>Matcha’s results for OAEI 2023 are summarized in Table 2, with the exception of the results for
the BioML track, which are presented in Table 3, for both Matcha and Matcha-DL.</p>
        <p>Matcha had good general performance, achieving the highest F-measure out of all systems in
6 of the 18 distinct OAEI tasks, while ranking in the top three in 9 others.</p>
        <p>The participation in the complex track was hindered by the change in the definition of
instances. In this year’s datasets, the entities shared a local name with different prefixes, but
Matcha’s algorithms rely on the entities being semantically equivalent, either by having the
same URI or by being declared as owl:sameAs.</p>
        <p><sup>1</sup> https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2</p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <title>Matcha’s matching algorithms</title>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Description</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Matches classes based on overlapping individuals that instantiate them, computed through conservative instance matching algorithms</td>
              </tr>
              <tr>
                <td>Matches ontologies by finding literal full name matches between their lexicons. Weighs matches according to the provenance of the names</td>
              </tr>
              <tr>
                <td>Matches ontologies by computing the cosine similarity between the embeddings of their lexicons</td>
              </tr>
              <tr>
                <td>Matches ontologies by using cross-references and/or exact lexical matches between them and a third mediating ontology</td>
              </tr>
              <tr>
                <td>Matches ontologies by measuring the maximum string similarity, using one of the four available string similarity measures</td>
              </tr>
              <tr>
                <td>Matches ontologies by measuring the word similarity, using a weighted Jaccard index</td>
              </tr>
              <tr>
                <th>Instance Matching</th>
              </tr>
              <tr>
                <td>Matches individuals by finding literal matches between the values of their annotation and data properties</td>
              </tr>
              <tr>
                <td>Maps individuals by comparing their values through the ISub string similarity metric</td>
              </tr>
              <tr>
                <td>Maps individuals by comparing the lexicon entries of one with the values of the other using a combination of string and word matching algorithms</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Regarding the Multifarm track, we had a character encoding issue that we were unable to
resolve in time. In the Knowledge Graph track, Matcha only found matches for instances, which
was an unexpected result that requires further investigation.</p>
        <p>In the Material Sciences and Engineering track, some of the matches were between classes and
object properties, which was puzzling considering that Matcha separates entities by type for
the matching tasks, meaning there should be no mappings between different entity types. We
suspect this may be an issue with the ontologies’ encoding.</p>
        <p>Matcha-DL, which augments Matcha with a relatively simple training procedure, obtained
surprisingly strong results on the semi-supervised Bio-ML tasks, achieving the highest
F-measure in four out of five tasks. Comparison with Matcha alone shows the advantage of
incorporating training in ontology alignment.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3. Conclusions</title>
        <p>There is a general improvement in Matcha’s results when compared to the earlier version
presented in OAEI 2022, although some further refinements are still required. Matcha achieved
the highest F-measure in 6 of the 18 distinct OAEI tasks and ranked in the top three in 9 others.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Acknowledgements</title>
        <p>This work was supported by FCT through the LASIGE Research Unit (UIDB/00408/2020
and UIDP/00408/2020). It was also partially supported by the KATY project, which has
received funding from the European Union’s Horizon 2020 research and innovation program
under grant agreement No 101017453. This work was supported in part by project 41, HfPT:
Health from Portugal, funded by the Portuguese Plano de Recuperação e Resiliência. Marta
Silva was partially funded by FCT through the fellowship 2022.11895.BD.</p>
        <p>Tasks listed in the results tables: NCBITAXON-TAXREFLD (Animalia, Bacteria, Chromista, Fungi, Plantae, Protozoa), MACROALGAE-MACROZOOBENTHOS, FISH-ZOOPLANKTON, OMIM-ORDO, NCIT-DOID, SNOMED-FMA, SNOMED-NCIT (Pharm), and SNOMED-NCIT (Neoplas).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I.</given-names>
            <surname>Megdiche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Teste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Trojahn</surname>
          </string-name>
          ,
          <article-title>An extensible linear approach for holistic ontology matching</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2016</year>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>410</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>É.</given-names>
            <surname>Thiéblin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Haemmerlé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Trojahn</surname>
          </string-name>
          , Survey on complex ontology matching,
          <source>Semantic Web</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>689</fpage>
          -
          <lpage>727</lpage>
          . URL: https://doi.org/10.3233/SW-190366. doi: 10.3233/SW-190366.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>The AgreementMakerLight Ontology Matching System</article-title>
          , in:
          <source>OTM Conferences - ODBASE</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>527</fpage>
          -
          <lpage>541</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>Automatic Background Knowledge Selection for Matching Biomedical Ontologies</article-title>
          ,
          <source>PLoS One</source>
          <volume>9</volume>
          (
          <year>2014</year>
          )
          <elocation-id>e111226</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>d. Lima</surname>
          </string-name>
          ,
          <article-title>Breaking rules: taking Complex Ontology Alignment beyond rule-based approaches</article-title>
          ,
          <source>Ph.D. thesis</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          , arXiv preprint arXiv:1908.10084 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Portisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>MELT - matching evaluation toolkit</article-title>
          , in:
          <source>Semantic Systems. The Power of AI and Knowledge Graphs - 15th International Conference, SEMANTiCS 2019, Karlsruhe, Germany, September 9-12, 2019, Proceedings</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>245</lpage>
          . URL: https://doi.org/10.1007/978-3-030-33220-4_17. doi: 10.1007/978-3-030-33220-4_17.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>