<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samira Oulefki</string-name>
          <email>soulefki@usthb.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lamia Berkani</string-name>
          <email>lberkani@usthb.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ladjel Bellatreche</string-name>
          <email>bellatreche@ensma.fr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nassim Boudjenah</string-name>
          <email>boudjenah36@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aicha Mokhtari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff1">
          <institution>LIAS/ISAE-ENSMA</institution>
          ,
          <addr-line>Poitiers</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Dep. of Artificial Intelligence and Data Sciences, Faculty of Informatics, USTHB</institution>
          ,
          <addr-line>Bab Ezzouar 16111, Algiers</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>BioGITOM is an advanced ontology matching (OM) system developed for the biomedical domain, designed to meet the increasing need for precise ontology alignment and data interoperability. By integrating Graph Isomorphism Networks and Graph Transformers, BioGITOM produces enriched concept embeddings that combine both structural and semantic information. This hybrid model enables the system to accurately identify correspondences between concepts across various ontologies, effectively addressing the challenges presented by the complexity and diversity of biomedical data. BioGITOM demonstrated outstanding performance in the Bio-ML benchmark tasks, ranking as the top system in all three tasks and outperforming eight competing methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology matching</kwd>
        <kwd>deep learning</kwd>
        <kwd>GNN</kwd>
        <kwd>graph transformer</kwd>
        <kwd>graph isomorphism transformer</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Recent advances in learning methods, particularly those utilizing Deep Learning (DL) and
Graph Neural Networks (GNNs), offer a more powerful means of extracting meaningful entity
representations for OM [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
      </p>
      <p>
        In this paper, we present BioGITOM, a novel OM system specifically designed for the
biomedical domain. BioGITOM enhances concept matching by integrating both semantic and
structural information. It leverages BioBERT to extract semantic features and employs a Graph
Isomorphism Transformer (GIT) model, which combines Graph Isomorphism Networks (GINs)
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and Graph Transformers (GTs) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], to capture structural relationships within ontologies.
This hybrid approach allows BioGITOM to deliver highly accurate correspondences between
complex biomedical concepts, meeting the growing demand for more precise and scalable OM
in biomedical research and applications.
      </p>
    </sec>
    <sec id="sec-2">
      <title>1.1. State, Purpose, General Statement</title>
      <p>BioGITOM is a specialized OM system developed to address the increasing complexity and
heterogeneity of biomedical ontologies. Its core purpose is to integrate and align disparate
ontologies, which is essential for data interoperability in biomedical research. It targets a
challenge characteristic of the biomedical field, where ontologies often differ in both structure
and semantics. By combining graph-based structural encoding with language-model-based semantic
encoding, BioGITOM produces more accurate mappings between concepts from different ontologies,
thereby supporting data sharing and collaboration across systems.</p>
    </sec>
    <sec id="sec-3">
      <title>1.2. Specific Techniques Used</title>
      <p>
        BioGITOM integrates the structural and semantic features of biomedical concepts to produce
high-precision mappings. The system employs a hybrid Graph Neural Network (GNN) model that
combines Graph Isomorphism Networks (GINs) and Graph Transformers (GTs). Its architecture
comprises five modules:
      </p>
      <list list-type="order">
        <list-item>
          <p>Preprocessing: This module prepares the raw ontology data. It reads input files in OWL
(Web Ontology Language) format, builds RDF (Resource Description Framework) graphs, and extracts
concept labels and synonyms, yielding the terms and relationships used by the later modules.</p>
        </list-item>
        <list-item>
          <p>
            Concept Name Encoder: BioGITOM uses BioBERT [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], a pre-trained language model
specialized for biomedical text, to encode the names and synonyms of ontology concepts. BioBERT
captures the semantic nuances of biomedical terms and provides a semantic embedding for each
concept.
          </p>
        </list-item>
        <list-item>
          <p>Graph Isomorphism Transformer (GIT): The core of BioGITOM is the GIT model. This hybrid
approach combines the structural expressiveness of GINs with the ability of GTs to capture
long-range dependencies in graphs. GINs capture the local graph structure of the ontology, while
GTs identify more global, nonlocal relationships between concepts. The combination yields a
structural embedding for each concept that reflects both fine-grained and broad context
information about how concepts relate to each other in the ontology graph.</p>
        </list-item>
        <list-item>
          <p>
            Gating Aggregator: This module merges the semantic and structural embeddings generated by
the Concept Name Encoder and GIT, respectively. It uses a gated mechanism [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] that dynamically balances the importance
of semantic and structural information for each concept. The gating function, controlled by a
learnable weight matrix and bias, determines how much semantic versus structural information is
reflected in the final embedding, so that the embeddings used for matching blend both types of
information according to the characteristics of the ontologies being compared.
          </p>
        </list-item>
        <list-item>
          <p>Mappings Selector: The final module compares the merged embeddings to identify
correspondences between concepts from different ontologies. A similarity measure, such as cosine
similarity, determines how closely the embeddings match. The output is a set of mappings between
concepts, along with confidence scores indicating the strength of each match.</p>
        </list-item>
      </list>
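      <p>The preprocessing step can be illustrated with a minimal sketch. It is not the BioGITOM implementation: it assumes an ontology serialized as RDF/XML, uses only the Python standard library instead of a dedicated RDF toolkit, and assumes that synonyms are annotated with the oboInOwl:hasExactSynonym property. The function name extract_concepts and the example.org IRIs are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Standard RDF/RDFS/OWL namespaces. The synonym property (oboInOwl) is an
# assumption about how the input ontology annotates synonyms.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def extract_concepts(owl_xml):
    """Return (names, edges) for an RDF/XML ontology string.

    names maps each class IRI to its labels followed by its synonyms;
    edges lists (child IRI, parent IRI) pairs taken from rdfs:subClassOf.
    """
    root = ET.fromstring(owl_xml)
    names, edges = {}, []
    for cls in root.findall("owl:Class", NS):
        iri = cls.get(RDF_ABOUT)
        if iri is None:  # skip anonymous classes
            continue
        terms = [e.text for e in cls.findall("rdfs:label", NS) if e.text]
        terms += [e.text for e in cls.findall("oboInOwl:hasExactSynonym", NS) if e.text]
        names[iri] = terms
        for sup in cls.findall("rdfs:subClassOf", NS):
            parent = sup.get(RDF_RESOURCE)
            if parent is not None:  # ignore restrictions without a named parent
                edges.append((iri, parent))
    return names, edges


# Tiny hypothetical ontology with two classes and one subclass edge.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://example.org/Disease">
    <rdfs:label>disease</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/Asthma">
    <rdfs:label>asthma</rdfs:label>
    <oboInOwl:hasExactSynonym>bronchial asthma</oboInOwl:hasExactSynonym>
    <rdfs:subClassOf rdf:resource="http://example.org/Disease"/>
  </owl:Class>
</rdf:RDF>"""

names, edges = extract_concepts(SAMPLE)
```
]]></preformat>
      <p>In this sketch, names supplies the strings passed to the name encoder, and edges supplies the graph over which a structural model would operate.</p>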
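      <p>
        For reference, the two building blocks that GIT combines have standard formulations in the cited literature [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ,
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. They are reproduced here as background only; how BioGITOM composes them is not specified in this paper, and the symbols below are notation assumed for illustration. A GIN layer updates the representation of a concept v from its neighbours N(v) in the ontology graph:
      </p>
      <disp-formula><tex-math><![CDATA[
h_v^{(k)} = \mathrm{MLP}^{(k)}\Big( \big(1 + \epsilon^{(k)}\big)\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \Big)
]]></tex-math></disp-formula>
      <p>A Transformer layer instead lets every concept attend to every other concept through scaled dot-product attention, where H stacks the concept representations, the W matrices are learned projections, and d_k is the key dimension:</p>
      <disp-formula><tex-math><![CDATA[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V,
\qquad Q = H W_Q,\; K = H W_K,\; V = H W_V
]]></tex-math></disp-formula>
      <p>The sum over N(v) is what restricts a GIN layer to local structure, whereas the attention weights span all pairs of concepts, which is the source of the nonlocal context described above.</p>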
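      <p>The last two modules can be sketched as follows. This is an illustrative sketch under stated assumptions, not the BioGITOM code: the gate is assumed to be an element-wise sigmoid over the concatenated embeddings, the merged embedding is assumed to be a convex combination of the two inputs, and the selector is assumed to keep the best-scoring target per source concept above a hypothetical threshold of 0.9. All function names, identifiers, and vectors are made up for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_merge(sem, struct, W, b):
    """Blend a semantic and a structural embedding of equal length d.

    W is a d x 2d weight matrix and b a length-d bias. Both would be learned
    in practice; here they are fixed by hand. The assumed gate is
    g = sigmoid(W [sem; struct] + b) and the output is g*sem + (1-g)*struct.
    """
    concat = sem + struct  # list concatenation, i.e. [sem; struct]
    merged = []
    for i in range(len(sem)):
        z = sum(w * x for w, x in zip(W[i], concat)) + b[i]
        g = sigmoid(z)
        merged.append(g * sem[i] + (1.0 - g) * struct[i])
    return merged


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_mappings(source, target, threshold=0.9):
    """Keep, for each source concept, its most similar target concept
    when the cosine similarity reaches the threshold."""
    mappings = []
    for s_id, s_vec in source.items():
        best_id, best_sim = None, -1.0
        for t_id, t_vec in target.items():
            sim = cosine(s_vec, t_vec)
            if sim > best_sim:
                best_id, best_sim = t_id, sim
        if best_id is not None and best_sim >= threshold:
            mappings.append((s_id, best_id, best_sim))
    return mappings


# With zero weights and bias the gate is 0.5, so the result is the plain average.
merged = gated_merge([1.0, 0.0], [0.0, 1.0], W=[[0.0] * 4, [0.0] * 4], b=[0.0, 0.0])

# Hypothetical merged embeddings for one source and two target concepts.
source = {"src:asthma": [1.0, 0.0]}
target = {"tgt:asthma": [0.9, 0.1], "tgt:fracture": [0.0, 1.0]}
mappings = select_mappings(source, target)
```
]]></preformat>
      <p>Each returned triple holds a source concept, a target concept, and the similarity used as the confidence score. Keeping only the single best target is a design choice of this sketch; a selector could equally return every pair above the threshold.</p>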
    </sec>
    <sec id="sec-4">
      <title>1.3. Adaptations Made for the Evaluation</title>
      <p>For this evaluation, BioGITOM was applied in its standard configuration without any
task-specific modifications. This approach demonstrates the system's inherent versatility and
robustness, as it achieved high performance without additional customization.</p>
      <p>The results underscore BioGITOM's effectiveness and generalizability across different OM
tasks within the biomedical domain, highlighting its potential as a reliable tool for diverse
applications.</p>
    </sec>
    <sec id="sec-5">
      <title>1.4. Link to the System and Parameters File</title>
      <p>BioGITOM is currently in the development phase and has not yet been released to the public.
A public release is planned once the core development is finalized, ensuring that the system is
fully functional and ready for broader use in OM tasks.</p>
      <sec id="sec-5-1">
        <title>2. Results</title>
        <p>BioGITOM’s results for OAEI 2024 are summarized in the following sub-sections:</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>2.1. Performance evaluation of BioGITOM using OMIM-ORDO dataset</title>
    </sec>
    <sec id="sec-7">
      <title>2.2. Performance evaluation of BioGITOM using DOID-NCIT dataset</title>
      <p>Table 2 shows that BioGITOM performs well on the DOID-NCIT dataset,
achieving a precision of 0.944 and an F1 score of 0.913. Although its recall
(0.884) is lower than the highest recall of 0.959, achieved by LogMapBio, its
overall performance remains highly competitive, demonstrating a strong balance between
precision and recall.</p>
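      <p>The F1 score is the harmonic mean of precision and recall, so the reported values can be checked directly from the other two figures. The short sketch below does this for the precision and recall reported in the text for DOID-NCIT and SNOMED-NCIT (Pharm); the function and variable names are ours.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the text.
reported = {
    "DOID-NCIT": (0.944, 0.884),
    "SNOMED-NCIT (Pharm)": (0.983, 0.713),
}

f1 = {name: round(f1_score(p, r), 3) for name, (p, r) in reported.items()}
```
]]></preformat>
      <p>The computed values, 0.913 and 0.827, agree with the reported F1 scores. The second case shows how a low recall pulls the F1 score down even when precision is very high.</p>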
    </sec>
    <sec id="sec-8">
      <title>2.3. Performance evaluation of BioGITOM using SNOMED-FMA (Body) dataset</title>
      <p>As shown in Table 3, BioGITOM delivers outstanding performance on the SNOMED-FMA
(Body) dataset, achieving the highest precision (0.962), recall (0.886), and F1 score (0.923) among
all competing methods.</p>
    </sec>
    <sec id="sec-9">
      <title>2.4. Performance evaluation of BioGITOM using SNOMED-NCIT (Pharm) dataset</title>
      <p>As shown in Table 4, BioGITOM achieves the highest precision (0.983) on the SNOMED-NCIT
(Pharm) dataset. Its recall is lower, at 0.713, which yields an F1 score of 0.827. BioGITOM
nevertheless ranks second overall on this dataset: the mappings it produces are highly precise,
but there is room for improvement in capturing a greater number of relevant matches.</p>
      <sec id="sec-9-1">
        <title>3. General Comments</title>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>3.1. Comments on the Results (Strengths and Weaknesses)</title>
      <p>The experimental results highlight the significant advantages of BioGITOM compared to other
highly ranked systems. A key strength of our approach lies in the Graph Isomorphism
Transformer (GIT) model, which effectively generates contextually relevant representations,
enabling the system to handle the complexities of biomedical ontologies. This capability is
especially valuable when working with intricate and heterogeneous ontological structures.</p>
      <p>However, a limitation of our current approach is that it generates only equivalence
mappings. It does not address other types of semantic relationships, such
as subsumption, which may be crucial in certain OM tasks.</p>
    </sec>
    <sec id="sec-11">
      <title>3.2. Discussion on Improvements for the Proposed System</title>
      <p>To enhance the system’s versatility and performance, we are actively investigating ways to
expand the range of matching relationships that BioGITOM can handle, moving beyond simple
equivalences to include subsumption and other relevant relationships.</p>
      <p>
        Additionally, we are exploring the transfer of concept representations into a hyperbolic
embedding space. This shift is motivated by the limitations of Euclidean space for hierarchical
ontologies, where distortions can occur. Hyperbolic space is better suited to preserving
hierarchical structures, and we expect this transformation to improve
BioGITOM’s accuracy and its representation of complex ontological relationships [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
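      <p>To make the motivation concrete, the sketch below computes distances in the Poincaré ball, one common model of hyperbolic space. The choice of this model is an assumption made for illustration; the text does not state which hyperbolic model is being considered, and the function name is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    def sq_norm(x):
        return sum(a * a for a in x)

    if sq_norm(u) >= 1.0 or sq_norm(v) >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


# A point near the centre versus two points near the boundary.
d_centre = poincare_distance([0.0, 0.0], [0.5, 0.0])
d_boundary = poincare_distance([0.9, 0.0], [-0.9, 0.0])
```
]]></preformat>
      <p>Distances grow rapidly towards the boundary of the ball: the two boundary points are only 1.8 apart in Euclidean terms, yet their hyperbolic distance is several times larger than that of the pair near the centre. This growth is what leaves room for the many leaves of a deep hierarchy.</p>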
      <sec id="sec-11-1">
        <title>4. Conclusion</title>
        <p>BioGITOM is a novel approach for biomedical OM that leverages a hybrid Graph Neural
Network model, GIT, integrating the strengths of Graph Transformers (GTs) and Graph
Isomorphism Networks (GINs).</p>
        <p>Experimental results show that BioGITOM outperforms competing methods
on most of the evaluated datasets, underscoring its ability to produce highly accurate
mappings. However, the system currently generates only equivalence
mappings. To address this limitation, we are actively working on extending BioGITOM to
handle a broader range of matching relationships, such as subsumption, which will enhance the
system’s versatility and applicability in more complex ontology matching scenarios.</p>
      </sec>
      <sec id="sec-11-2">
        <title>5. Acknowledgement</title>
        <p>We would like to extend our heartfelt thanks to Jérôme Euzenat for his efforts and support in
enabling the submission of our system to the OAEI. His contributions to the initiative are
greatly appreciated, and we are grateful for the opportunity to participate.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          .
          <year>2013</year>
          . Ontology Matching. Springer-Verlag, Heidelberg (DE).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.M.</given-names>
            <surname>Couto</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>The AgreementMakerLight ontology matching system</article-title>
          .
          <source>In: On the Move to Meaningful Internet Systems: OTM 2013 Conferences</source>
          , Springer Berlin Heidelberg, pp.
          <fpage>527</fpage>
          -
          <lpage>541</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>LogMap: Logic-based and scalable ontology matching</article-title>
          .
          <source>In: Proceedings of the International Semantic Web Conference (ISWC 2011)</source>
          , Springer Berlin Heidelberg, pp.
          <fpage>273</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sui</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>ERSOM: A structural ontology matching approach using automatically learned entity representation</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          , pp.
          <fpage>2419</fpage>
          -
          <lpage>2429</lpage>
          , http://dx.doi.org/10.18653/v1/d15-1289
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kolyvakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalousis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiritsis</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>DeepAlignment: Unsupervised ontology matching with refined word vectors</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Vol.
          <volume>1</volume>
          , Association for Computational Linguistics, pp.
          <fpage>787</fpage>
          -
          <lpage>798</lpage>
          , http://dx.doi.org/10.18653/v1/n18-1072
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>DAEOM: A Deep Attentional Embedding Approach for Biomedical Ontology Matching</article-title>
          .
          <source>Applied Sciences</source>
          ,
          <volume>10</volume>
          (
          <issue>21</issue>
          ),
          <elocation-id>7909</elocation-id>
          . https://doi.org/10.3390/app10217909.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jegelka</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>How powerful are graph neural networks?</article-title>
          <source>In: Proceedings of the 7th International Conference on Learning Representations (ICLR)</source>
          , https://doi.org/10.48550/arXiv.1810.00826
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ananiadou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rong</surname>
          </string-name>
          .
          <year>2022</year>
          .
          <article-title>Transformer for graphs: An overview from architecture perspective</article-title>
          .
          <source>CoRR abs/2202.08455</source>
          , arXiv preprint arXiv:2202.08455
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          .
          <source>Bioinformatics</source>
          ,
          <volume>36</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>1234</fpage>
          -
          <lpage>1240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          .
          <year>2022</year>
          .
          <article-title>Delving deep into regularity: A simple but effective method for Chinese named entity recognition</article-title>
          .
          <source>In: Findings of the Association for Computational Linguistics: NAACL 2022</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , pp.
          <fpage>1863</fpage>
          -
          <lpage>1873</lpage>
          , http://dx.doi.org/10.18653/v1/2022.findings-naacl.143
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Quamar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Özcan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>MEDTO: Medical Data to Ontology Matching Using Hybrid Graph Neural Networks</article-title>
          .
          <source>In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21)</source>
          , Virtual Event, Singapore,
          <publisher-name>ACM</publisher-name>
          , New York, NY, USA, pp.
          <fpage>2946</fpage>
          -
          <lpage>2954</lpage>
          , https://doi.org/10.1145/3447548.3467138.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          .
          <year>2022</year>
          .
          <article-title>Matching Biomedical Ontologies via a Hybrid Graph Attention Network</article-title>
          .
          <source>Front Genet</source>
          .
          <volume>13</volume>
          :893409, https://doi.org/10.3389/fgene.2022.893409
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>