<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhangcheng Qiang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weiqing Wang</string-name>
          <email>teresa.wang@monash.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kerry Taylor</string-name>
          <email>kerry.taylor@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Australian National University, School of Computing</institution>
          ,
          <addr-line>108 North Road, Acton, ACT 2601, Canberra</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Monash University, Faculty of Information Technology</institution>
          ,
          <addr-line>25 Exhibition Walk, Clayton, VIC 3800, Melbourne</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>We present the results obtained in the Ontology Alignment Evaluation Initiative (OAEI) 2025 campaign using our ontology matching (OM) system Agent-OM. This is our first participation in the OAEI campaign, featuring two variants with different large language models (LLMs): the production version uses commercial LLMs for optimal performance, while the lite version uses open-source LLMs for cost-effectiveness. Experimental results on eight OAEI tracks demonstrate the capability of Agent-OM in handling OM tasks from diverse domains, languages, and vocabularies. We also outline future directions to improve our system.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology matching</kwd>
        <kwd>OAEI campaign</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>We use top_k = 1 and top_p = 1.0 to ensure
that LLMs always choose the top-one token and do not filter the output. Note that these parameters
may not be modifiable in commercial LLMs. For example, the top_k value is currently not available
in OpenAI models. We use repeat_penalty = 1.0 (or the similar parameters frequency_penalty = 0.0
and presence_penalty = 0.0) to assign no penalty for repeated output from LLMs; in other words, to
encourage LLMs to produce repetitive and stable outputs.</p>
    </sec>
    <sec id="sec-2">
      <title>1.2. Agent-OM in the OAEI 2025 campaign</title>
      <p>There are two Agent-OM variants that participate in the OAEI 2025 campaign.
• Agent-OM is the production version of Agent-OM. The backend uses commercial LLMs and the
corresponding embedding models. The production version achieves optimal performance, but requires
extensive access to commercial APIs. The results show slight differences across different runs due to
limited support for reproducibly fixing the model’s hyperparameters.
• Agent-OM-Lite is the lite version of Agent-OM. The backend uses open-source LLMs for both
language processing and text embedding. Although the performance of the lightweight version is
usually poorer than that of the production version, it offers an alternative solution for cost-constrained
or security-constrained scenarios. The results are more stable across different runs.</p>
      <sec id="sec-2-1">
        <title>1.2.1. System settings</title>
        <p>
          Figure 1 shows the LLM variants available for OAEI 2025. For commercial API-accessed LLMs used
in Agent-OM, gpt-4o [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] with the timestamp tag 2024-05-13 has the optimal performance and stability.
However, its API cost can be expensive for large-scale OM tasks. Alternatives include the late-breaking
version without the timestamp tag and the mini version gpt-4o-mini [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Note that the late-breaking
version may produce less stable results, while the mini version can lower the matching performance. For
open-source llama-3 [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] models used in Agent-OM-Lite, the large-size model llama-3-70b can perform
better than the small-size model llama-3-8b, but the execution time may be longer.
        </p>
        <p>
          Table 1 shows the hyperparameter settings for OAEI 2025. We use gpt-4o(-mini) with the text
embedding model ada-002 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for the production version Agent-OM and llama-3 for Agent-OM-Lite. The
global settings of similarity_threshold = 0.90 and top@k = 3 may not be optimal for each track. We
recommend trying different settings to find a customised setting for each task. The new LLM
hyperparameters may cause additional execution time for LLMs. If the task does not have a reproducibility
requirement, we suggest setting the temperature to 0.0 and ignoring the other hyperparameters. There is
only a slight difference between multiple runs with this setting. The system hyperparameter top@k is
used to restrict the top k matching candidates chosen by Agent-OM and its lite version, while the LLM
hyperparameter top_k is used to restrict the top k tokens selected by the LLM used in Agent-OM and its
lite version. The top_p value is not functional when top_k = 1.
        </p>
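        <p>As a concrete illustration of the reproducibility-oriented settings above (a sketch only, not the exact Agent-OM configuration; the model name is an example, and parameter availability differs by provider, e.g. top_k is not exposed by the OpenAI API):</p>

```python
# Illustrative request payload pinning the LLM sampling hyperparameters
# discussed above: temperature = 0.0 favours stable outputs, top_p = 1.0
# applies no nucleus filtering, and the two penalty parameters leave
# repeated output unpenalised.
request = {
    "model": "gpt-4o-mini",    # example model name
    "temperature": 0.0,        # reproducibility-oriented decoding
    "top_p": 1.0,              # do not filter the token distribution
    "frequency_penalty": 0.0,  # no penalty for repeated tokens
    "presence_penalty": 0.0,   # no penalty for reusing earlier content
}
```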
      </sec>
      <sec id="sec-2-2">
        <title>1.2.2. Performance reporting</title>
        <p>
          For the confidence of each mapping, Agent-OM provides an approximate range (e.g. confidence ≥ 0.90),
but not the exact value (e.g. confidence = 0.97). This is because Agent-OM applies reciprocal rank
fusion (RRF) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] on top of the matching results, and the ranking results do not have a statistical link to
confidence. We use RRF to overcome two limitations of the traditional approach by computing a mean
of syntactic, lexical, and semantic matching results (as illustrated in Figure 2):
(1) The traditional approach cannot determine the best match between very similar entities. For
example, entities may have the same mean value despite having different results in syntactic, lexical,
and semantic matching (coloured blue in the figure). By computing and accumulating their rankings,
the RRF approach is able to distinguish the best match from other close matches.
(2) The traditional approach is very sensitive to insufficient input data causing semantic matching to
fail. For example, an entity with missing results in semantic matching will obtain a very low mean
value (coloured red in the figure). In such cases, the RRF approach is able to minimise the impact of
missing values so that the entity with missing values becomes comparable with other entities.
        </p>
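        <p>A minimal sketch of the RRF combination described above (illustrative only, not the exact Agent-OM implementation; the constant k = 60 follows the common choice in [11], and the entity names are hypothetical):</p>

```python
# Reciprocal rank fusion: accumulate 1 / (k + rank) over the syntactic,
# lexical, and semantic rankings. A candidate missing from one ranking
# (e.g. failed semantic matching) simply contributes nothing for that
# ranker, instead of dragging down a mean of raw scores.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:  # each ranking lists candidates, best first
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

syntactic = ["B", "A", "C"]
lexical = ["A", "B", "C"]
semantic = ["A", "C"]  # "B" is missing: semantic matching failed for it
fused = rrf([syntactic, lexical, semantic])
```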
        <p>
          Agent-OM is expected to have a longer execution time than traditional OM systems. Agent-OM is
built on LLM agents, which are inherently subject to latency. Its powerful capability in
reasoning is achieved by accumulating historical context and enabling a comprehensive tool-augmented
extension. This results in a lengthy context fed into LLMs, as well as increased resource usage in
tool calling and memory access [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Additionally, for accessing commercial API-accessed models (e.g.
gpt-4o and gpt-4o-mini), the execution time is under the control of the API provider and not of the
matcher. Guardrails are typically applied to restrict the number of requests per minute (RPM) and
tokens per minute (TPM); for example, the OpenAI rate limits are given in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. For accessing open-source
models (e.g. llama-3), the execution time depends on the settings of the local machine. On machines
equipped with graphics processing units (GPUs), the processing is significantly faster than on machines
with only central processing units (CPUs). Agent-OM has multiple CRUD (create, read, update, and
delete) functions on its database. The time used in querying and searching is driven by the choice of
database implementation.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>1.3. Link to the system and parameters file</title>
      <p>Agent-OM is open-source and released under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License. The source code, data, and/or other artifacts have been made
available at https://github.com/qzc438/ontology-llm.</p>
    </sec>
    <sec id="sec-4">
      <title>1.4. Link to the system alignments</title>
      <p>
        The system alignments are stored in the folder named OAEI_2025 at https://github.com/qzc438/
ontology-llm/tree/master/campaign/. For large datasets, the complete results are stored in the folder
named OAEI_2025 at https://github.com/qzc438/ontology-llm-large-datasets/tree/master/campaign/.
Under each track folder, predict.csv/predict.xml corresponds to our system alignment, whereas true.csv
corresponds to the reference alignment. The confidence for each mapping in predict.csv/predict.xml
is greater than or equal to 0.90, produced by the setting of similarity_threshold = 0.90. Note that we
do not include the element &lt;measure&gt; in our alignment file, while the evaluation conducted in the
Matching EvaLuation Toolkit (MELT) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] will add &lt;measure rdf:datatype="xsd:float"&gt;1.0&lt;/measure&gt;
in this case. The result.csv file reports the measures of precision, recall, and F1 score. Some rows present
intermediate partial results, but rows ending with “llm_with_agent” in the “Alignment” column present
the final matching results. The execution time is not reported due to variations in the API provider
(for commercial API-accessed LLMs) and computational power (for open-source LLMs). It follows a
linear growth with the number of entities if no additional optimisations are applied, such as those in
Section 3.2. The cost.csv file reports the API charge for API-accessed LLMs. Open-source LLMs are
used free-of-charge.
      </p>
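      <p>For reference, the relationship between predict.csv, true.csv, and the measures reported in result.csv can be sketched as follows (a simplified illustration, not the MELT implementation; mappings are modelled as plain entity pairs):</p>

```python
# Precision, recall, and F1 over one-to-one equivalence mappings, where a
# mapping is modelled as a (source_entity, target_entity) pair.
def evaluate(predicted, reference):
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)                       # correct mappings
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = evaluate(predicted=[("a", "x"), ("b", "y")],
                    reference=[("a", "x"), ("c", "z")])
```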
      <sec id="sec-4-1">
        <title>2. Results</title>
        <p>Table 3 shows Agent-OM participation in the OAEI 2025 TBox/schema matching tracks. Agent-OM
focuses on one-to-one equivalence mapping in the TBox/schema matching tasks. The current system
has limited support for instance matching and link discovery. We do not participate in the TBox/schema
matching tracks that contain interactive matching and complex matching types or relations. We
refer readers to the official website for the results of Agent-OM in the OAEI 2025 campaign:
https://oaei.ontologymatching.org/2025/results/. Note that the results are produced from a single trial by the
authors, and slight differences may occur across multiple runs due to the non-determinism of LLMs. The
system variant and chosen LLM are determined by balancing performance and cost efficiency. For small
tasks, we use our premium Agent-OM working with the premium gpt-4o-2024-05-13 model, which
gives our best results at a high cost. For medium-sized tasks, we use Agent-OM with the inexpensive
gpt-4o-mini-2024-07-18 model. For large-scale tasks, we use Agent-OM-Lite with the free-of-charge
open-source llama-3-8b model running entirely on our local machine.
For tracks labelled “-”, Agent-OM results could not be published due to platform issues or communication failures.
Given the rank of some matcher on a track, and the number of participants on that track, our
goal is to normalise all reciprocal ranks to a scale of [0, 1], with 1 corresponding to the highest rank
and 0 to the lowest. Therefore, the normalised reciprocal rank (rr*) and the overall mean reciprocal
rank (mrr*) are defined as:</p>
        <p>rr* = 1 − (rank − 1) / (number of participants − 1),   mrr* = (1/n) ∑ rr* over the n tracks.   (1)</p>
        <p>
Figure 3 shows Agent-OM’s normalised rr* per track and the overall mrr* on the tracks shown as
“complete” in Table 3. We can see no pattern in Agent-OM’s precision vs recall performance ranking
across the tracks, although this may reflect track-wise precision-recall variability in other
matchers. The conference and dh tracks have a notable gap between rankings in precision and recall,
suggesting room for improvement. Regarding mrr*, we observe that Agent-OM’s precision and F1 score
are very similar, suggesting that Agent-OM could be more competitive by prioritising improvements of
recall over precision.</p>
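        <p>Equation 1 can be computed directly; a small sketch with illustrative function names:</p>

```python
# Normalised reciprocal rank: map a matcher's rank on a track onto [0, 1],
# where 1 corresponds to the highest rank and 0 to the lowest (Equation 1).
def normalised_rank(rank, participants):
    return 1.0 - (rank - 1) / (participants - 1)

def mean_normalised_rank(track_results):
    # track_results: list of (rank, number_of_participants) per track
    values = [normalised_rank(rank, n) for rank, n in track_results]
    return sum(values) / len(values)
```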
      </sec>
      <sec id="sec-4-2">
        <title>3. General comments and conclusions</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.1. Comments on the results</title>
      <p>
        (1) We apply Agent-OM to three previously evaluated tracks (anatomy, conference, and mse) in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
and two new tracks (dh and ce). While the mse track does not run in the 2025 campaign, we
provide our results for this track on GitHub for reference (see Section 1.3). The results indicate that
Agent-OM is resilient in performing tasks from diverse domains with varying levels of complexity
and different ontology structural terms. Although the performance of traditional systems on simple tasks
remains comparable, we believe that Agent-OM is paving the way for a shift to general-purpose,
domain-independent OM systems.
(2) We apply Agent-OM to two multilingual tracks (multifarm and arch-multiling). We find that
matching ontologies expressed in the same language is more successful than matching different languages,
although having English as one of a pair of different languages is clearly advantageous. Further,
matches between ontologies using languages from the same language family (e.g. English matched
with European languages) are better than those between different language families. In some cases,
these patterns do not apply to the Chinese language. This may be fundamentally due to the high-tech
dominance of the English language, so LLMs are commonly trained in English. English has incorporated
many aspects of European languages (Germanic, French, and Latin) as well as vocabulary from other
global languages. Chinese uses a different tokenisation from English, but most LLMs are able to deal with
Chinese, perhaps due to plentiful Chinese training data.
(3) We apply Agent-OM-Lite to two biomedical tracks (bio-ml and biodiv). We find that the computation
time for these two tracks is significantly longer than for other tracks. This is because the ontologies
used in these tracks are large ontologies and Agent-OM always captures syntactic, lexical, and semantic
information for each ontology entity. In general, this is a useful practice because it addresses two common
matching scenarios: the same concept with different names and different concepts with the same name.
However, in some tasks in the biomedical domain, it is rare for different concepts to have the same name,
for example, ncit-doid in bio-ml and fish-zooplankton in biodiv. Therefore, it could be worthwhile to
initially match only by syntactic matching and to assess intermediate results. For those tasks where
performance is excellent, matching could stop there. For those tasks with poor performance, proceeding
to the much more computationally-demanding LLM-based lexical and semantic steps could be justified.
      </p>
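      <p>The staged strategy suggested in point (3) can be sketched as follows (hypothetical function and parameter names; the threshold 0.9 is an assumption for illustration):</p>

```python
# Staged matching: run cheap syntactic matching first, and fall back to the
# computationally-demanding LLM-based lexical/semantic matching only when an
# intermediate assessment of the syntactic result is poor.
def staged_match(source, target, syntactic_matcher, llm_matcher, assess, min_score=0.9):
    alignment = syntactic_matcher(source, target)
    if assess(alignment) >= min_score:
        return alignment                 # good enough: stop after syntactic matching
    return llm_matcher(source, target)   # otherwise the expensive steps are justified

# Toy usage with stand-in matchers:
good = staged_match("O1", "O2",
                    syntactic_matcher=lambda s, t: ["syntactic"],
                    llm_matcher=lambda s, t: ["llm"],
                    assess=lambda a: 0.95)
poor = staged_match("O1", "O2",
                    syntactic_matcher=lambda s, t: ["syntactic"],
                    llm_matcher=lambda s, t: ["llm"],
                    assess=lambda a: 0.40)
```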
    </sec>
    <sec id="sec-6">
      <title>3.2. Discussion on ways to improve the proposed system</title>
      <p>
        (1) Agent-OM can be used for both subsumption matching and one-to-many/many-to-many matching.
However, such matches are susceptible to the similarity threshold chosen. When similarity is very
high, we could declare three matches (i.e. equivalence, subsumption, and inverse subsumption), but we
cannot determine which to use. Although one entity can have multiple closely-matched candidates, it
is hard to determine the best similarity threshold as a cut-off point. In some cases, one could look for a
“gap” in the similarity scores to define the cut-off point, obtaining a different cut-off for each mapping.
(2) Agent-OM currently uses bidirectional validation to reduce LLM hallucinations, but it is not
efficient when the input data to the OM system is unbalanced: one of the ontologies may be much
larger than the other. In such settings, the system should select the smaller ontology as the starting point
so that validation can be applied to fewer matching candidates.
(3) After LLM validation, an extra step of human validation could be useful for precise mappings.
Although LLMs can serve as oracles acting as domain experts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], several limitations should be taken
into account. For example, the llama-3-8b model may treat “A is the subclass of B” and “B is the subclass
of A” as contradictions, even though these two statements indicate the equivalence of A and B.
      </p>
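      <p>The “gap” heuristic in point (1) can be sketched as follows (an illustration under the assumption that candidates arrive sorted by descending similarity):</p>

```python
# Cut the candidate list at the widest drop between consecutive similarity
# scores, instead of at a fixed global threshold.
def cut_at_gap(scored_candidates):
    # scored_candidates: list of (candidate, similarity), sorted descending
    if len(scored_candidates) < 2:
        return scored_candidates
    gaps = [scored_candidates[i][1] - scored_candidates[i + 1][1]
            for i in range(len(scored_candidates) - 1)]
    cut = gaps.index(max(gaps)) + 1  # keep everything above the widest gap
    return scored_candidates[:cut]

kept = cut_at_gap([("a", 0.95), ("b", 0.93), ("c", 0.60)])
```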
      <p>
        In the era of LLMs, we believe that there are two pathways to develop modern LLM-based OM systems.
One is to explore the new LLM infrastructure, and the other is LLM fine-tuning. The former often
injects external knowledge from retrieval-augmented generation (RAG) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] into the LLMs, while the
latter uses training data to update the internal knowledge of the LLMs. A communication module (e.g.
LLM agent) is the critical component of the LLM infrastructure, while high-quality data is the key
to finding the “Aha! moment” for LLM fine-tuning. Recent research focusing on LLM infrastructure
includes [
        <xref ref-type="bibr" rid="ref1 ref17 ref18 ref19 ref20 ref21">1, 17, 18, 19, 20, 21</xref>
        ], while LLM fine-tuning is addressed in [
        <xref ref-type="bibr" rid="ref22 ref3">3, 22, 23, 24</xref>
          ].
      </p>
    </sec>
    <sec id="sec-6-1">
      <title>3.3. Comments on the OAEI test cases</title>
      <p>(1) The namespaces in reference.xml are mixed with
“http://knowledgeweb.semanticweb.org/heterogeneity/alignment#” (with #) and “http://knowledgeweb.semanticweb.org/heterogeneity/alignment”
(without #). A script has been provided to normalise the inconsistent use of namespaces according to
the Alignment API format [25].
(2) The ontologies used in the OAEI campaign require a cleaning procedure. Some information irrelevant
to the OM task needs to be removed, for example, the metadata of an entity (creator and date/time),
which is not relevant to the entity’s meaning. Including this metadata can confuse the similarity
assessment of an entity pair.
(3) A complete reference is the key to ensuring a fair comparison. We identify two primary reasons
for the low performance in certain tracks. a) Some reference alignments have missing mappings. We
suggest using LLMs as a tool to validate existing correspondences or to discover missing mappings [26].
b) Some ontologies have entities with properties (e.g. skos:related) that refer to external resources
with naming conventions using codes. In this case, the name carries no natural-language meaning and
may be confusing to LLMs. We suggest removing these references to external ontologies. Alternatively,
we could extend Agent-OM to retrieve external ontologies and use them in our matching process.
(4) For machine learning and LLM fine-tuning for OM, data sampling for the training set needs to be
diverse with respect to concepts so that LLMs can learn the domain knowledge. For example, if the
alignment includes food nutrition, then the training data is expected to include food nutrition concepts.
Several examples of data sampling for OM can be found in [27].</p>
    </sec>
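    <p>The namespace inconsistency noted in Section 3.3 can be repaired with a few lines (a sketch under the assumptions that plain text substitution is sufficient for these files and that the ‘#’-suffixed form is the canonical one):</p>

```python
# Normalise mixed uses of the Alignment API namespace, with and without a
# trailing '#', onto one canonical form.
ALIGN_NS = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment"

def normalise_namespace(xml_text):
    # First collapse the '#'-suffixed form onto the bare namespace, then
    # rewrite every bare occurrence to the single canonical '#' form.
    return xml_text.replace(ALIGN_NS + "#", ALIGN_NS).replace(ALIGN_NS, ALIGN_NS + "#")
```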
    <sec id="sec-7">
      <title>3.4. Comments on the OAEI measures</title>
      <p>(1) The output formats for OM systems vary. OAEI only accepts the Alignment API format. There is a
need to develop a unified pipeline to convert different formats to the Alignment API format.
(2) LLMs are non-deterministic by nature. An update to the current platform may be required to ensure
that LLMs are employed in a uniform setting. We suggest introducing a stream “LLM Arena for OM”, in
which all systems are expected to use the same LLM and hyperparameter settings for the campaign.</p>
      <sec id="sec-7-1">
        <title>Acknowledgments</title>
        <p>The authors thank the Ontology Alignment Evaluation Initiative (OAEI) organising committee and
track organisers for their help in dataset curation and clarification. The authors thank Jing Jiang from
the Australian National University (ANU) for helpful advice on the justification of multilingual tracks.
The authors thank the Commonwealth Scientific and Industrial Research Organisation (CSIRO) for
supporting this project. Weiqing Wang is the recipient of an Australian Research Council Discovery
Early Career Researcher Award (project number DE250100032) funded by the Australian Government.</p>
        <p>This is Agent-OM’s first participation in the OAEI campaign. According to the OAEI data policy
(retrieved October 1, 2025), “OAEI results and datasets, are publicly available, but subject to a use policy
similar to the one defined by NIST for TREC. These rules apply to anyone using these data.” Please find
more details on the official website: https://oaei.ontologymatching.org/doc/oaei-deontology.2.html.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Declaration on Generative AI</title>
        <p>During the preparation of this work, the authors used Grammarly for grammar and spell checking,
and to improve the readability of the text. After using the tool, the authors reviewed and edited the content
and take full responsibility for the publication’s content.</p>
        <p>Neural Information Processing Systems, volume 37, Curran Associates, Inc., 2024, pp. 14690–14711.
doi:10.52202/079017-0469.
[23] G. Sousa, R. Lima, C. Trojahn, Complex ontology matching with large language model embeddings,
2025. URL: https://arxiv.org/abs/2502.13619. arXiv:2502.13619.
[24] H. Yang, J. Chen, Y. He, Y. Gao, I. Horrocks, Language models as ontology encoders, in: The
Semantic Web – ISWC 2025: 24th International Semantic Web Conference, Springer, Nara, Japan,
2025, pp. 443–461. doi:10.1007/978-3-032-09527-5_24.
[25] J. David, J. Euzenat, F. Scharffe, C. Trojahn dos Santos, The Alignment API 4.0, Semantic Web 2
(2011) 3–10. doi:10.3233/SW-2011-0028.
[26] Z. Qiang, K. Taylor, W. Wang, How does a text preprocessing pipeline affect ontology matching?,
2024. URL: https://arxiv.org/abs/2411.03962. arXiv:2411.03962.
[27] S. Hertling, E. Norouzi, H. Sack, OAEI machine learning dataset for online model generation, in:
The Semantic Web: ESWC 2024 Satellite Events, Springer, Hersonissos, Crete, Greece, 2024, pp.
239–243. doi:10.1007/978-3-031-78952-6_34.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , Agent-OM:
          <article-title>Leveraging LLM agents for ontology matching</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>18</volume>
          (
          <year>2024</year>
          )
          <fpage>516</fpage>
          -
          <lpage>529</lpage>
          . doi:10.14778/3712221.3712222.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , W. Wang,
          <article-title>OM4OV: Leveraging ontology matching for ontology versioning</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2409.20302. arXiv:2409.20302.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , W. Wang,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , OAEI-LLM:
          <article-title>A benchmark dataset for understanding large language model hallucinations in ontology matching</article-title>
          , in:
          <source>Proceedings of the Special Session on Harmonising Generative AI and Semantic Web Technologies co-located with the 23rd International Semantic Web Conference</source>
          , volume
          <volume>3953</volume>
          , CEUR-WS.org, Baltimore, Maryland, USA,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning</article-title>
          ,
          <source>Nature</source>
          <volume>645</volume>
          (
          <year>2025</year>
          )
          <fpage>633</fpage>
          -
          <lpage>638</lpage>
          . doi:10.1038/s41586-025-09422-z.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] OpenAI, Open models by
          <source>OpenAI</source>
          ,
          <year>2025</year>
          . URL: https://openai.com/open-models/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <source>Precision-Recall-F1 Visualisation</source>
          ,
          <year>2025</year>
          . URL: https://github.com/qzc438/p-r-f1-vis.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] OpenAI, gpt-4o,
          <year>2024</year>
          . URL: https://platform.openai.com/docs/models/gpt-4o.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8] OpenAI, gpt-4o-mini,
          <year>2024</year>
          . URL: https://platform.openai.com/docs/models/gpt-4o-mini.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] Meta,
          <source>llama-3</source>
          ,
          <year>2024</year>
          . URL: https://www.llama.com/models/llama-3/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] OpenAI, ada-002,
          <year>2022</year>
          . URL: https://platform.openai.com/docs/models/text-embedding-ada-002.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Cormack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Buettcher</surname>
          </string-name>
          ,
          <article-title>Reciprocal rank fusion outperforms condorcet and individual rank learning methods</article-title>
          ,
          in:
          <source>Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , ACM, Boston, Massachusetts, USA,
          <year>2009</year>
          , pp.
          <fpage>758</fpage>
          -
          <lpage>759</lpage>
          . doi:10.1145/1571941.1572114.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rhu</surname>
          </string-name>
          ,
          <article-title>The cost of dynamic reasoning: Demystifying AI agents and test-time scaling from an AI infrastructure perspective</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2506.04301. arXiv:2506.04301.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] OpenAI, OpenAI rate limits,
          <year>2025</year>
          . URL: https://platform.openai.com/docs/guides/rate-limits.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Portisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>MELT - matching evaluation toolkit</article-title>
          ,
          <source>in: Semantic Systems. The Power of AI and Knowledge Graphs</source>
          , volume
          <volume>11702</volume>
          , Springer, Karlsruhe, Germany,
          <year>2019</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>245</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-030-33220-4_17</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lushnei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shumskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shykula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          ,
          <article-title>Large language models as oracles for ontology alignment</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2508.08500. arXiv:2508.08500.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive NLP tasks</article-title>
          ,
          <source>in: Proceedings of the 34th Annual Conference on Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          , Curran Associates, Inc., Vancouver, British Columbia, Canada,
          <year>2020</year>
          , pp.
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>OLaLa: Ontology matching with large language models</article-title>
          ,
          <source>in: Proceedings of the 12th Knowledge Capture Conference</source>
          , ACM, Pensacola, Florida, USA,
          <year>2023</year>
          , pp.
          <fpage>131</fpage>
          -
          <lpage>139</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3587259.3627571</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Babaei Giglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>LLMs4OM: Matching ontologies with large language models</article-title>
          ,
          <source>in: The Semantic Web: ESWC 2024 Satellite Events</source>
          , Springer, Hersonissos, Crete, Greece,
          <year>2024</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>35</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-031-78952-6_3</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Payne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Large language model assisted multi-agent dialogue for ontology alignment</article-title>
          ,
          <source>in: Proceedings of the 2024 International Conference on Autonomous Agents and Multiagent Systems</source>
          , IFAAMAS, Auckland, New Zealand,
          <year>2024</year>
          , pp.
          <fpage>2594</fpage>
          -
          <lpage>2596</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Taboada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arideh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mosquera</surname>
          </string-name>
          ,
          <article-title>Ontology matching with large language models and prioritized depth-first search</article-title>
          ,
          <source>Information Fusion</source>
          <volume>123</volume>
          (
          <year>2025</year>
          )
          <elocation-id>103254</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.inffus.2025.103254</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Barcelos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>French</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>KROMA: Ontology matching with knowledge retrieval and large language models</article-title>
          ,
          <source>in: The Semantic Web - ISWC 2025</source>
          , Springer, Nara, Japan,
          <year>2025</year>
          , pp.
          <fpage>629</fpage>
          -
          <lpage>649</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-032-09527-5_34</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>Language models as hierarchy encoders</article-title>
          , in: Advances in
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>