<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can Language Models Align Biomedical Ontologies? Evaluating Retrieval-Augmented Prompt Strategies in Bio-ML</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lucas Ferraz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pedro Giesteira Cotovio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Catia Pesquita</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LASIGE, Faculdade de Ciências, Universidade de Lisboa</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>36</volume>
      <fpage>0009</fpage>
      <lpage>0009</lpage>
      <abstract>
        <p>Aligning biomedical ontologies presents a significant challenge due to their complexity and the highly domain-specific nature of their vocabulary. Recent advancements in Language Models (LMs) have led to their increasing application in ontology alignment tasks, offering promising results. However, a systematic evaluation of semantics-based prompting strategies for leveraging LMs in this context remains unexplored. This study investigates the effectiveness of different prompting techniques to enhance biomedical ontology alignment performance. We have developed a framework to support the design of LM-based queries to assess the semantic similarity between ontology classes. The framework interrogates the ontologies to align, extracting relevant contextual information to inject into the LM prompts, thereby enabling Retrieval-Augmented Generation (RAG). We conduct preliminary experiments on selected hard cases from the biomedical ontologies that compose the Ontology Alignment Evaluation Initiative Bio-ML track and provide insights into the effectiveness, reliability, and limitations of prompt-based approaches in ontology matching.</p>
      </abstract>
      <kwd-group>
        <kwd>Language Models</kwd>
        <kwd>Ontology Alignment</kwd>
        <kwd>Knowledge Representation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Ontologies have become increasingly popular in various fields due to their ability to provide structured,
formal representations of knowledge. These knowledge structures are particularly valuable in areas
such as Artificial Intelligence (AI), Natural Language Processing (NLP), and Semantic Web technologies.
An ontology represents a set of concepts within a domain and the relationships between them, allowing
for more effective data sharing, discovery, and reasoning across different systems and applications.</p>
      <p>As individual ontologies grow and evolve independently from each other, any given concept will
inevitably display conceptual, linguistic, and structural differences when modelled in different ontologies,
in different contexts and by different creators. These differences often arise from varying domain
perspectives, terminologies and modelling choices across the maintainers and communities that develop
and use the ontologies.</p>
      <p>
        Ontology alignment addresses this issue through the generation of a set of mappings
(correspondences) between entities in different ontologies to establish semantic interoperability [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However,
automatically identifying these correspondences is a highly complex task. Ontologies are
typically designed within a specific context, relying on implicit background knowledge that is not
explicitly captured in their schema definitions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Most ontology alignment techniques perform entity mapping by leveraging lexical, structural,
semantic, and external information about the entities being matched [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. Lexical information has
proved to be the most successful source for biomedical ontology alignment [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ], with algorithms
based on exploring the lexical component of ontologies outperforming other approaches by a good
margin [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Lexical information can be extracted from entity labels, facilitating the exploration of
word sense disambiguation and the inference of lexical relationships between entities. However, the
lexical component of biomedical ontologies is typically restricted to the labels of concepts, which results
in limitations in capturing mappings that require more contextual information beyond the simple
similarity of labels.
      </p>
      <p>
        The dawn of Large Language Models (LLMs) marked a turning point in our ability to capture and
understand deep semantic relationships between terms. Traditional NLP techniques were insufficient
for extracting contextual meaning, relying on simpler models that could not fully grasp language
nuances. However, with the advent of LLMs such as GPT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], BERT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and other transformer-based
architectures [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], we currently have the ability to process and model complex relationships between
words, phrases, and even complete documents.
      </p>
      <p>
        LLMs are trained on massive datasets containing billions of tokens and are capable of understanding
and representing not just the meaning of individual terms but how they interact in context. This allows us
to capture subtle semantic relationships, such as synonyms, antonyms, hyponyms, and hypernyms,
which are extremely useful for tasks such as translation, summarization, and question-answering. These
capabilities are likely to translate to the ontology alignment scenario provided a suitable formulation
of the problem is achieved. The success of prompt-based strategies in ontology alignment [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has
motivated us to explore whether LLMs are able to tackle the mapping of classes from biomedical
ontologies and how well they are able to handle the more difficult cases. However, recent works
have highlighted the difficulties in applying prompt-based strategies to real-world ontologies in other
domains [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        In this paper, we present a preliminary study that focuses on investigating the impact of including
hierarchical relations in the prompt, exploring different design patterns for their verbalisation. We
performed an evaluation of prompt-design strategies using a carefully selected set of challenging
mappings extracted from Bio-ML [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], highlighting the pitfalls and strengths of each strategy.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Ontology alignment is the process of identifying correspondences between entities in two distinct
ontologies, typically referred to as the source ontology (O1) and the target ontology (O2). The goal of
ontology alignment is to establish meaningful mappings between entities, ensuring interoperability
between heterogeneous data sources. The resulting alignment consists of a set of mappings, often
represented as tuples &lt;e1, e2, r, c&gt;, where e1 and e2 are entities from O1 and O2 respectively, r denotes a
semantic relation (e.g., equivalence, subsumption), and c represents the confidence score of the mapping.
These mappings are crucial for tasks such as data integration, Knowledge Graph fusion, and semantic
interoperability in domains like healthcare, biology, and the Semantic Web.</p>
      <p>Ontology matching systems are predominantly unsupervised, relying on heuristics and rules instead
of deriving a mapping function through learning. These systems typically include three stages:
preprocessing (the identification and retrieval of entities from the ontologies based on specific criteria);
candidate generation (the use of diverse matching techniques to generate possible correspondences
based on ontology features); and filtering (the refinement of initial matches through discarding of
unlikely mappings).</p>
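      <p>As an illustrative sketch of this three-stage pipeline (the toy lexical similarity heuristic, the thresholds, and the data layout are our own assumptions, not taken from any specific matching system):</p>

```python
from difflib import SequenceMatcher

def preprocess(ontology):
    # Stage 1: identify and retrieve candidate entities (here, all labelled classes).
    return [e for e in ontology if e.get("label")]

def generate_candidates(sources, targets, threshold=0.6):
    # Stage 2: propose correspondences via a simple lexical similarity heuristic.
    candidates = []
    for s in sources:
        for t in targets:
            sim = SequenceMatcher(None, s["label"].lower(), t["label"].lower()).ratio()
            if sim >= threshold:
                candidates.append((s["label"], t["label"], "=", sim))
    return candidates

def filter_mappings(candidates, cutoff=0.8):
    # Stage 3: refine by discarding unlikely mappings below a confidence cutoff.
    return [m for m in candidates if m[3] >= cutoff]

source = [{"label": "Bone Necrosis"}, {"label": "Heart"}]
target = [{"label": "bone necrosis"}, {"label": "ischemic bone disease"}]
mappings = filter_mappings(generate_candidates(preprocess(source), preprocess(target)))
```

      <p>Note how a purely lexical candidate generator keeps "Bone Necrosis - bone necrosis" but never reaches pairs like "Bone Necrosis - ischemic bone disease", which is exactly the limitation the prompt-based approaches below target.</p>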
      <p>
        In recent years, with the advent of language models, more attention has been devoted to Machine
Learning-based ontology alignment, with several systems incorporating it [13, 14, 15, 16] and the
creation of the Bio-ML track at the Ontology Alignment Evaluation Initiative. ML introduces a
data-focused approach to ontology alignment, shifting away from heuristic and rule-based approaches.
Unlike traditional OM systems, ML-based methods aim to learn a mapping function using labelled
reference alignments, enabling more adaptive and scalable matching solutions. In principle, this allows
for improved candidate generation, better matcher combination strategies, and more effective filtering
techniques. Most approaches employ BERT-like methods [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and typically sacrifice recall in favour of
precision [17].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Overview</title>
        <p>To investigate the impact of semantic context (in the form of verbalized hierarchical relations) on
prompt-based ontology matching tasks, we developed a simple framework to design prompts based
on combinations of relevant elements into meaningful patterns. Prompts built using this framework
were evaluated using different language models of different sizes. Our approach takes as input
two candidate entities (typically classes), one from each ontology to align, designs a prompt to evaluate the
validity of a mapping between them based on different parameters, interrogates the language model
using the prompt, and evaluates its output. In our study, we assume candidates are already selected,
and focus only on prompt design and evaluation.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Prompt design for matching</title>
        <p>We present a two-stage framework for generating context-aware prompts designed for tasks such as
ontology alignment. This framework decomposes prompt generation into a static stage — where
invariant templates (static skeletons) are constructed from a base template using task-specific configuration
parameters — and a dynamic stage — where these skeletons are enriched with instance-specific data.
This modular design allows a small number of configurable templates to be efficiently adapted to large
datasets.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Static Stage: Template Construction</title>
          <p>The static stage begins with a base template, denoted by B, which contains symbolic placeholders
indicating the different elements that compose a prompt, where specific types of information will be
inserted. The elements are listed in Table 1.</p>
          <p>Table 1. Prompt elements.
$TC (Task Context): A description of the task the model should perform.
$I (Instruction): A description of the nature of the question that will be asked and the expected answer format.
$CONF (Confidence): The type of confidence that the model should output (if any).
$S (Source): The main label(s) of the source ontology entity.
$CTX_S (Source Context): Labels of potentially meaningful entities related to the source entity.
$T (Target): The main label(s) of the target ontology entity.
$CTX_T (Target Context): Labels of potentially meaningful entities related to the target entity.
$TYPE (Equivalence Type): The type of equivalence to be assessed.</p>
          <p>A brief description of each of these elements is presented in Table 1 and their possible values are
presented in Table 2.</p>
          <p>Table 2. Possible values of the prompt elements.
Task Context: "You are doing an ontology alignment task."
Instruction: "I am going to ask you a question and you should answer ’yes’ or ’no’."
Confidence (float): "Followed by confidence as a score from 0 to 1 (e.g., ’yes:0.8’)."
Confidence (categorical): "Followed by confidence as ’Not Confident’, ’Confident’, or ’Very Confident’ (e.g., ’yes:Confident’)."
Context (subclass_of): "a subclass of $SC", with ’$SC’ being a superclass of either $S or $T.
Context (kind_of): "a kind of $SC", with ’$SC’ being a superclass of either $S or $T.
Equivalence Type (Equivalent): "equivalent".</p>
          <p>Figure 1. A prompt without hierarchical context: You are doing an ontology alignment task. I am going to ask you a question and you should answer ’yes’ or ’no’,
followed by your confidence in your answer as a score from 0 to 1, like this: ’yes:0.8’.
Question: Are ’Neuraminidase Deficiency’ and ’glycoproteinosis’ equivalent?</p>
          <p>A set of configuration parameters indexed by k ∈ {1, . . . , K} governs the substitution of these placeholders
with specific verbalizations. We define four vectors:
t = (t_1, t_2, . . . , t_K), g = (g_1, g_2, . . . , g_K),
s = (s_1, s_2, . . . , s_K), l = (l_1, l_2, . . . , l_K),
where:
• t_k ∈ {True, False} indicates whether to include a task context.
• g_k ∈ Γ specifies the comparison prompt.
• s_k ∈ Σ represents the semantic context prompt.
• l_k ∈ ℒ denotes the confidence type (e.g., float or categorical).</p>
          <p>Then, for each configuration k ∈ {1, . . . , K}, the static base template B corresponds to a unique
pattern combining the four elements &lt;t_k, g_k, s_k, l_k&gt; by replacing each placeholder in B with
the appropriate string. The output of this stage is the set {P_k}, k = 1, . . . , K, of static templates that capture the
invariant, configuration-specific aspects of the prompt.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Dynamic Stage: Instance-Specific Enrichment</title>
          <p>Let {(s_i, t_i)}, i = 1, . . . , N, be the set of source–target entity pairs in the dataset. In the dynamic stage, each static
template P_k is enriched with instance-specific information to produce a dynamically-built prompt.</p>
          <p>For each entity pair (s_i, t_i) and each static template P_k, a dynamic prompt d_ik is generated according
to
d_ik = dynamic(P_k; s_i, t_i, L, C),
where L and C denote the label-formatting and context-formatting functions described below:
1. Label Formatting: The entities s_i and t_i provide label sets, which are formatted (e.g., truncated
to a specified cardinality and concatenated with a given delimiter) to yield L(s_i) and L(t_i). These
formatted labels replace the placeholders $S and $T, respectively.
2. Contextual Enrichment: Additional contextual information is extracted from the ontology and
formatted as C(s_i) and C(t_i), replacing the placeholders $CTX_S and $CTX_T. In cases of absent
context, extraneous semantic tokens may be removed. In this work, we focused on subsumption
relations to include hierarchical context.</p>
          <p>Thus, the dynamic prompt for each static skeleton and entity pair is obtained via the function dynamic,
and for each k, the complete set of dynamic prompts is given by
D_k = {d_ik}, i = 1, . . . , N.</p>
          <p>These dynamically enriched prompts form the final output features for the dataset.</p>
          <p>In summary, the static stage produces a family of invariant templates and the dynamic stage adapts
these templates to each instance ( ,  ). Figures 1 and 2 illustrate two prompt examples with and
without hierarchical context.</p>
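          <p>The dynamic stage can be sketched as below. The skeleton literal, helper names, and label-formatting defaults are illustrative assumptions of ours; superclass slots are resolved before entity labels because the token ’$SC_T’ contains the substring ’$S’ and would otherwise be corrupted by a naive substitution:</p>

```python
# A static skeleton as the static stage might produce it (illustrative wording).
SKELETON = ("You are doing an ontology alignment task. I am going to ask you a "
            "question and you should answer 'yes' or 'no', followed by your "
            "confidence in your answer as a score from 0 to 1, like this: 'yes:0.8'. "
            "Question: Are '$S' (a kind of $SC_S) and '$T' (a kind of $SC_T) equivalent?")

def format_labels(labels, max_labels=1, delimiter="; "):
    # Label formatting L(.): truncate the label set and join with a delimiter.
    return delimiter.join(labels[:max_labels])

def dynamic(skeleton, source, target):
    """Enrich one static skeleton with instance-specific data for a pair (s_i, t_i)."""
    prompt = skeleton
    # Contextual enrichment C(.): resolve superclass slots first ('$SC_T'
    # contains the substring '$S', so these must go before the entity labels).
    for ph, entity in (("S", source), ("T", target)):
        sc = entity.get("superclass")
        if sc:
            prompt = prompt.replace("$SC_" + ph, sc)
        else:
            # Absent context: strip the extraneous semantic tokens.
            for verb in ("a kind of", "a subclass of"):
                prompt = prompt.replace("(%s $SC_%s)" % (verb, ph), "")
    # Label formatting: fill $S and $T.
    for ph, entity in (("S", source), ("T", target)):
        prompt = prompt.replace("$" + ph, format_labels(entity["labels"]))
    return " ".join(prompt.split())

source = {"labels": ["Neuraminidase Deficiency"], "superclass": "Mucolipidosis"}
target = {"labels": ["glycoproteinosis"], "superclass": "lysosomal storage disease"}
prompt = dynamic(SKELETON, source, target)
```

          <p>Running this on the pair above reproduces the hierarchical-context prompt shown in Figure 2; omitting the superclass keys yields the context-free variant of Figure 1.</p>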
          <p>Figure 2. A prompt with hierarchical context: You are doing an ontology alignment task. I am going to ask you a question and you should answer ’yes’
or ’no’, followed by your confidence in your answer as a score from 0 to 1, like this: ’yes:0.8’. Question: Are
’Neuraminidase Deficiency’ (a kind of Mucolipidosis) and ’glycoproteinosis’ (a kind of lysosomal storage disease)
equivalent?</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Models</title>
        <p>Our experiments evaluated the prompts in five different language models with varying numbers of
parameters and reasoning capabilities. The Flan-T5-Base model [18] (with 220 million parameters)
is a lightweight transformer model developed by Google, tailored for instruction-based tasks and
without any reasoning capabilities. The Claude 3.7 Sonnet model [19] was developed by Anthropic and
is a significantly larger model than lightweight models such as Flan-T5-Base, possessing 137 billion
parameters but also lacking reasoning capabilities. Our experiments also incorporate GPT4 [20], a
large-scale model comprising 1.76 trillion parameters, which represents a significant milestone in enhancing
linguistic fluency and contextual comprehension within generative language models. Additionally, we
also analyse the performance of two state-of-the-art reasoning models: GPT4o [21], a multimodal model
with 200 billion parameters, and OpenAIo1 [22], another multimodal model with 175 billion parameters.</p>
        <p>Table 3. Models evaluated.
Model | Source | Number of Parameters | Reasoning
Flan-T5-base | Google | 220 million | No
Claude 3.7 Sonnet | Anthropic | 137 billion | No
OpenAIo1 | OpenAI | 175 billion | Yes
GPT4o | OpenAI | 200 billion | Yes
GPT4 | OpenAI | 1.76 trillion | No</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Parsing Model Responses</title>
        <p>Let f denote a parsing function that maps a textual response r, generated by a predictive model, into a
numerical confidence score c ∈ [−1, 1]. This confidence score quantifies the certainty associated with a
binary classification decision, indicating either a positive or negative outcome.</p>
        <p>Parsing Process: The function f is defined through the sequential application of the following
procedures:
1. Text Normalization:
• Transform the textual response r into lowercase form: r ← lowercase(r).
• Remove leading and trailing white space: r ← trim(r).
2. Numeric Confidence Extraction:
• Use regular expressions to search for numeric confidence values within r.
• If a numeric value is found, convert it to a float and clip its value to the range [0, 1]. For
instance, a response including "0.85" would yield c = 0.85.
3. Default Uncertainty Handling:
• In the absence of numeric values, assign a default confidence c = 1.0. This default ensures
reliance exclusively on the binary signal derived from keyword polarity.
4. Solution Polarity Determination:
• Adjust the polarity of the mapping based on explicit binary indicators:</p>
        <p>p = 0, if "no" (negative) is detected;
p = 1, if "yes" (positive) is detected;
c = 0.0, if neither or both indicators ("yes" and "no") are detected.</p>
        <p>This parsing approach enables consistent extraction of numerical confidence scores from multiple
textual responses generated for each query.</p>
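        <p>A Python sketch of this parsing procedure follows. Mapping a "no" answer to the negative half of [−1, 1] (i.e., returning −c for negative polarity) is our reading of how the polarity combines with the extracted confidence; the regular expressions are illustrative:</p>

```python
import re

def parse_response(response: str) -> float:
    """Map a model's textual answer to a signed confidence score in [-1, 1]."""
    # 1. Text normalization: lowercase and trim surrounding whitespace.
    r = response.lower().strip()
    # 2. Numeric confidence extraction: first number in the text, clipped to [0, 1].
    m = re.search(r"\d+(?:\.\d+)?", r)
    # 3. Default uncertainty handling: fall back to 1.0 (pure keyword polarity).
    conf = min(max(float(m.group()), 0.0), 1.0) if m else 1.0
    # 4. Solution polarity determination from explicit binary indicators.
    has_yes = bool(re.search(r"\byes\b", r))
    has_no = bool(re.search(r"\bno\b", r))
    if has_yes == has_no:  # neither or both indicators detected
        return 0.0
    return conf if has_yes else -conf

print(parse_response("Yes:0.8"))   # 0.8
print(parse_response("no"))        # -1.0 (default confidence, negative polarity)
print(parse_response("maybe"))     # 0.0
```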
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Evaluation</title>
        <p>
          Our preliminary experiments focused on a subset of the mappings for the NCIT-DOID task of
Bio-ML [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. This track includes a special dataset, Bio-ML LLM, which contains 50 randomly selected
matched class pairs from ground-truth mappings, excluding pairs that can be aligned with direct string
matching (i.e., having at least one shared label). This restricts the efficacy of conventional lexical
matching. Of these 50 pairs, we selected the six which were considered particularly hard to detect.
For each source class in these "very hard" mappings, we created an additional "hard" negative (i.e., a
target class with some lexical similarity to the source). The mappings are listed in Table 4.
        </p>
        <p>Table 4. Selected "very hard" mappings and their "hard" negatives (Status 1 = true mapping, 0 = negative).
Source | Target | Status
Esophageal Verrucous Carcinoma | esophagus verrucous carcinoma | 1
Esophageal Verrucous Carcinoma | esophageal varix | 0
Diabetic Vascular Disorder | diabetic angiopathy | 1
Diabetic Vascular Disorder | diabetic encephalopathy | 0
Malignant Hypopharyngeal Neoplasm | hypopharynx cancer | 1
Malignant Hypopharyngeal Neoplasm | malignant granular cell skin tumor | 0
Neuraminidase Deficiency | glycoproteinosis | 1
Neuraminidase Deficiency | biotinidase | 0
Bone Necrosis | ischemic bone disease | 1
Bone Necrosis | dysbaric osteonecrosis | 0
Microcystic Adnexal Carcinoma | malignant syringoma | 1
Microcystic Adnexal Carcinoma | nasopharynx carcinoma | 0</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Table 5 presents the confusion matrices for the preliminary experiments. When the prompt does not
include hierarchical contextual information, the best-performing models are OpenAIo1 and GPT4o,
which, despite being smaller than GPT4, have improved reasoning capabilities. These reasoning
capabilities may help the models perform better when there is less information available. In fact, GPT4 ranks
fourth despite being the largest model.</p>
      <p>When semantic contextual information is given to the models, we observed very different behaviours
between the "kind_of" prompt and the "subclass_of" prompt. While the "kind_of" prompt improved
results for the non-reasoning models, for the reasoning models it had either no impact or a small
negative impact. The "subclass_of" prompt, however, did not perform as well, having a negative impact
on most models. These results demonstrate that hierarchical contextual information should be considered
when designing prompts for biomedical ontology alignment. It is worth noting that the second
best-performing approach was the pairing of Claude 3.7 Sonnet with the "kind_of" prompt, which
achieved nearly identical results to GPT4 while being 10% of its size.</p>
      <p>Table 5. Per-model confusion matrices (Pred: 1/0 vs. Actual: 1/0), without hierarchical context (w/o HC) and with hierarchical context (w/ HC, "kind of"), for flan-t5-base, Claude 3.7 Sonnet, OpenAIo1, GPT4o, and GPT4. [Cell values not recoverable from the source.]</p>
      <p>We also investigated in more depth some false negative cases, depicted in Table 6. Some mappings,
such as "Neuraminidase Deficiency - glycoproteinosis", are missed by all models, regardless of the
context that is imparted in the prompt. Curiously, some sources indicate that this may not actually be
an equivalence but rather a subsumption, with the corresponding diseases being modelled as such in
ICD-10 (categories E77 and E77.1). However, including the hierarchical context in the form of "kind_of"
prompts mitigates these issues, with most models, especially the mid- to large-sized ones, improving their
recall of hard-to-find positive mappings.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This study explored the effectiveness of semantic prompting strategies, particularly the use of
hierarchical contextual information, in enhancing biomedical ontology alignment with language models.
Our experiments revealed that the impact of the inclusion of hierarchical context depended on the
prompt wording. While the "kind_of" prompt — which more closely aligns with everyday language
— improved the performance for non-reasoning models, the "subclass_of" prompt generally led to
decreased performance. These findings highlight that the value of adding semantic context is heavily
influenced by the verbalisation used when designing prompts.</p>
      <p>We also found that smaller models like OpenAIo1 and GPT4o outperformed larger models like GPT4
when no hierarchical context was included in the prompt. This suggests that smaller models with better
reasoning capabilities may perform more effectively when limited information is provided. Interestingly,
the pairing of Claude 3.7 Sonnet with the "kind_of" prompt delivered nearly identical results to GPT4,
despite being only 10% of its size, showing that less resource-intensive models can still achieve strong
performance when combined with the right prompting strategies.</p>
      <p>Additionally, the inclusion of hierarchical context through the "kind_of" prompt improved the recall
of hard-to-find mappings, especially for mid- to large-sized models. However, some mappings remained
challenging for all models, indicating that certain biomedical ontology mappings require more advanced
approaches.</p>
      <p>Future work will focus on extending the prompt design framework to include in-context learning
based on positive and negative examples and developing additional strategies to extract semantic context
by exploring common biomedical ontology features such as partonomy, rich synonyms and logical
definitions.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by FCT through the fellowships 2022.10557.BD (Pedro Cotovio), and the
LASIGE Research Unit, ref. UID/00408/2025. It was also partially supported by the KATY project which
has received funding from the European Union’s Horizon 2020 research and innovation program under
grant agreement No 101017453. This work was also supported partially by project 41, HfPT: Health
from Portugal, funded by the Portuguese Plano de Recuperação e Resiliência.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-4o for grammar and spelling
checking. After using this tool, the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          , Ontology matching, 2nd ed., Springer-Verlag, Heidelberg (DE),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Portisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hladik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Background knowledge in ontology matching: A survey</article-title>
          ,
          <source>Semantic Web</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>2639</fpage>
          -
          <lpage>2693</lpage>
          . doi:10.3233/SW-223085.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <article-title>Ontology matching: State of the art, future challenges, and thinking based on utilized information</article-title>
          ,
          <source>in: Proceedings of the 19th International Workshop on Ontology Matching.</source>
          , volume
          <volume>9</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>91235</fpage>
          -
          <lpage>91243</lpage>
          . doi:10.1109/ACCESS.2021.3057081.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Balasubramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          , AgreementMakerLight,
          <source>Semantic Web</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <article-title>Logmap: Logic-based and scalable ontology matching</article-title>
          ,
          <source>in: The Semantic Web-ISWC</source>
          <year>2011</year>
          : 10th International Semantic Web Conference, Bonn, Germany,
          <source>October 23-27</source>
          ,
          <year>2011</year>
          , Proceedings,
          <source>Part I 10</source>
          , Springer,
          <year>2011</year>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <article-title>Tackling the challenges of matching biomedical ontologies</article-title>
          ,
          <source>Journal of Biomedical Semantics</source>
          <volume>9</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Improving language understanding by generative pre-training</article-title>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , volume
          <volume>1</volume>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>Exploring large language models for ontology alignment</article-title>
          ,
          <source>arXiv preprint arXiv:2309.07172</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Macilenti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fiorelli</surname>
          </string-name>
          ,
          <article-title>Prompting is not all you need: Evaluating GPT-4 performance on a real-world ontology alignment use case</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>246</volume>
          (
          <year>2024</year>
          )
          <fpage>1289</fpage>
          -
          <lpage>1298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2022</year>
          , pp.
          <fpage>575</fpage>
          -
          <lpage>591</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>