<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title/>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Automatic Ontology Term Typing by LLMs: the Impact of Prompt and Ontology Variation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Upal Bhattacharya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maaike de Boer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergey Sosnovsky</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department Data Science</institution>
          ,
          <addr-line>TNO, Anna van Buerenplein 1, 2595 DA, Den Haag</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information and Computing Sciences</institution>
          ,
<addr-line>Utrecht University, Princetonplein 5, 3584 CC, Utrecht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
<p>Large Language Models (LLMs) have been applied to a wide variety of ontology engineering tasks. Building on initial progress, further research is needed to explore the potential effects of variation in model-specific and ontology-specific factors. We perform a preliminary study on the ability of an LLM to perform term typing using only its own knowledge through concept retrieval and analyse the effect of domain contextualisation, ontology structure and ontology popularity on performance. Our findings suggest that LLMs are reasonably adept at identifying correct individual-to-concept assertions but are less capable of inferring concept hierarchies when used in a zero-shot setting. Domain contextualisation can enhance performance for structurally complex and less popular ontologies. Our analysis further hints that ontology popularity improves concept retrievability, while complexity in terms of structural depth and dispersion makes it difficult for LLMs to identify assertions.</p>
      </abstract>
      <kwd-group>
<kwd>LLM</kwd>
        <kwd>ontology evaluation</kwd>
        <kwd>ontology learning</kwd>
        <kwd>individual assertion</kwd>
        <kwd>term typing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large language models (LLMs) have access to substantial factual knowledge [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and are widely used
for several natural language tasks including knowledge-based tasks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Their ability to "understand"
complex texts and perform reasoning tasks at scale has led to the wide experimentation with LLMs
for ontology enhancement [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">4, 5, 6, 7, 8</xref>
        ]. Interest in leveraging LLMs for various ontology learning
tasks [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref9">9, 10, 11, 12</xref>
        ] has also been growing. However, there is still a lot to learn about the nature of the
interaction between LLMs and ontologies. Whereas LLMs are trained on massive data including freely
available ontologies from the Web, recent works highlight that such presence of information does not
translate into strong performance on various ontology learning and knowledge tasks [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ].
      </p>
<p>While LLM-driven ontology enhancement has been gaining traction, more attention needs to be
paid to exploring the possible effects of variability within LLMs and ontologies on the potency of
such enhancement. The interplay of factors like prompt domain contextualisation, ontology structure
and ontology popularity can significantly affect the ability of LLMs to perform ontology-related tasks.
Understanding the effects of variation in these underlying variables and their interaction can help
identify optimal strategies for LLM-supported ontology development.</p>
      <p>This paper describes a preliminary study on the ability of an LLM to perform term typing in a
zero-shot setting and presents an analysis of the observed performance depending on a small subset of
potentially important LLM-specific and ontology-specific factors. Term typing is an ontology learning
and enrichment task of mapping new individuals to concepts within an ontology. It requires a model
to "understand" (or at least to recognize) the features of concepts within an ontology to make new
individual to concept assertions. The paper addresses the following research question: How capable
are LLMs at ontology term typing through concept retrieval? As a part of the study, we also look
into the LLMs’ ability to implicitly identify concept hierarchies. We investigate performance variation
over prompt domain contextualisation, ontology structure, and ontology popularity on an ontology
learning task. Throughout the study, we outline the importance of identifying critical factors to be able
to explain variation exhibited by LLMs over different ontology development and learning tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
<p>Research on the application of LLMs in ontology development ranges from techniques to enhance existing
ontologies, to approaches focused on engineering ontologies from scratch, to evaluation frameworks
for assessing the performance of LLMs in various ontology learning tasks.</p>
      <p>
        Several studies focused on using LLMs for automated ontology development to reduce human
intervention [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Dong et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] approached the task of concept placement with a three-step strategy
of edge search, formation and selection using various models. Their results highlight that fine-tuned
BERT-based [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] models outperform larger decoder-only models used in a zero-shot setting. Focusing
on decoder-only LLMs, Giglou et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] modeled ontology alignment as a paired-sentence classification
task by prompting LLMs to answer questions of equivalence of concepts given their immediate parents or
children. He et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] evaluated the ability of LLMs to perform concept matching as a binary classification
task to find that zero-shot decoder-only models do not perform as well as their encoder-only counterparts.
Snijder et al. [17] compared different methods for ontology alignment in the application domain of
the labour market. Chen et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] addressed the task of subsumption inference as a paired-sentence
classification task by fine-tuning a BERT model on an ontology entity-based paired-sentence dataset.
      </p>
      <p>
        The ability of LLMs to perform complex tasks and provide formatted outputs prompted their use
in the development of ontologies from the ground up using natural language guidelines or ancillary
tasks. Kommineni et al. [18] used LLMs to generate ontologies and knowledge graphs by tasking the
model to generate competency questions, an ontology and, subsequently, a knowledge graph from
the human-verified questions. Using an LLM to parse natural language sentences into OWL syntax,
Mateiu and Groza [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] were able to create a tool that is capable of generating and populating simple
ontologies from basic sentences. However, the applicability of such a tool for more complex requirements
has not been tested. Funk et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] leveraged natural language as well, adopting a repeated prompt
strategy to identify child concepts and their optimal placements to create a hierarchy. Domain ontology
development requires significant knowledge about the domain itself thereby posing a more nuanced
problem for LLMs. Doumanas et al. [19] investigated the use of LLMs to develop domain ontologies
for the Search and Rescue domain. Their findings highlight impressive capabilities of LLMs to fit new
factual information into an ontology framework and affirm LLMs as capable ontology engineers.
      </p>
      <p>
        The lack of clarity behind the mechanisms responsible for the observed performance of language
models necessitates development of evaluation frameworks for various ontology-related tasks.
Investigating LLM hallucinations over simple information about well-known ontologies, Bombieri et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
probed models for concept labels from the Gene Ontology [20] and the Uberon Ontology [21] using only
their IDs. Their findings show a low rate of hallucinations but poor performance in general, highlighting
some degree of memorisation proportional to the popularity of the ontologies on the Web. He et al.
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] provided a framework for evaluation of LLMs for subsumption inference. Modelled as a natural
language inference task, their study found decoder-only models to be quite adept at identifying simple
and complex subsumptions. Looking into term typing, taxonomy discovery and relation extraction
performance, Babaei Giglou et al. [22] evaluated models in a zero-shot setting on nine different datasets.
Mai et al. [23] gauged reasoning and learning capabilities of language models to find prediction
inconsistencies suggesting that such models tend to fall back to their pre-learnt lexical senses as opposed to
using the provided semantic meanings of concepts in ontologies.
      </p>
      <p>Despite the growing interest and the considerable progress made over the last few years in application
of LLMs for ontology development and evaluation methods for these techniques, there is one important
aspect that has not been sufficiently addressed in the literature. We need to obtain a better understanding
of performance variability arising from the underlying model-specific and ontology-specific factors.
The interplay of these factors can be quite significant as suggested by variation in reported performance
across studies.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Variation Analysis</title>
      <p>While there are many LLM-specific and ontology-specific factors that can contribute to performance
variability on an ontology learning task, we focus on a specific subset for our study on term typing. We
briefly outline some of these factors contributing to the form and analysis of our experimentation.</p>
<p>LLM-based Variability: When used without fine-tuning, LLM-based variability (apart from the
choice of different LLMs) is primarily driven by prompt variability. We consider the following
elements of a prompt as variables of interest for probing LLMs in our present study:
• Nature of the task: An ontology learning task can be structured in different ways, e.g.
classification, summarization, retrieval, etc. Prompting an LLM to perform an ontology learning task
using different output strategies allows for analysis of the suitability of a particular task formulation to each
performance objective.
• Prompting strategy: Various prompting strategies like zero-shot, few-shot and
retrieval-augmented generation allow assessment of the amount of example data required by an LLM to
perform an ontology learning task. This indicates the relevancy of the pre-learnt knowledge of an
LLM to perform the task.
• Domain contextualisation: Defining the role of an LLM as an ‘assistant’ or ‘expert’ on a
particular domain or topic, in addition to the type of task to be performed (e.g. classification,
retrieval, etc.), is a powerful tool that encourages LLMs to derive the correct context from the user
input data. In the space of an ontology learning task, this domain is itself multi-faceted and
can take any of the following forms:
– Generic: The LLM is not given any role other than that based on the task to be performed.
– General domain of ontologies: The LLM is defined as an expert in ontologies.
– Topic of an ontology: The LLM is defined as an expert in the topic of an ontology, e.g.
‘You are a wine expert’ for the Wines Ontology [24].
– Combination of a general ontology and a topic: The LLM is defined as an expert in
ontologies and an expert in the topic of a particular ontology.</p>
      <p>The degree of contextualisation provided by the domain specification in the prompt helps assess
the optimal level of specificity for an LLM to perform an ontology learning task.</p>
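<p>For illustration, the four contextualisation levels described above can be sketched as system-prompt templates. The role phrasings, the topic string and the helper name below are hypothetical, not the prompts used in the study:</p>

```python
# Hypothetical sketch of the four domain-contextualisation levels described
# above; the role wordings and the topic string are illustrative assumptions,
# not the exact prompts used in the study.

def build_system_prompt(contextualisation: str, topic: str = "wine") -> str:
    roles = {
        "generic": "",
        "ontology": "You are an expert in ontologies. ",
        "topic": f"You are a {topic} expert. ",
        "ontology_and_topic": (
            f"You are an expert in ontologies and a {topic} expert. "
        ),
    }
    # The task specification is shared by all four contextualisations.
    task = ("Given an individual and a flat list of concept labels, "
            "return a ranked list of the most relevant concepts.")
    return roles[contextualisation] + task

for level in ("generic", "ontology", "topic", "ontology_and_topic"):
    print(f"{level}: {build_system_prompt(level)}")
```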
<p>
        Data Variability: The heterogeneity of ontologies modelling different domains introduces variability
based on their structure and content. We investigate variability along the following variables:
• Ontology Structure: Ontology structure is driven by the nature of the underlying topic. Metrics
for measuring structural complexity such as depth, breadth, dispersion and tangledness [25]
can help categorise ontologies and reason over similarities and differences in the performance of
LLMs across ontologies and ontology learning tasks based on their structure.
• Popularity: LLMs are capable of memorizing their training data [26] and their performance on
ontology learning tasks is affected by the popularity of the ontology on the Web [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Analysing
the effect of the popularity of the topic of an ontology provides insight into the ability of
LLMs to leverage pre-learnt information and their capability to override it as required.
      </p>
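<p>As a toy illustration of these structural variables, depth and dispersion can be computed from a parent-to-children mapping. The small hierarchy below is invented, and the definitions are simplified versions of the cited formulations:</p>

```python
# Toy hierarchy (invented for illustration), parent -> children.
children = {
    "Thing": ["Wine", "Region"],
    "Wine": ["RedWine", "WhiteWine", "RoseWine"],
    "RedWine": ["Merlot"],
}

def depth(node: str = "Thing") -> int:
    # depth = number of concepts on the longest root-to-leaf path
    kids = children.get(node, [])
    return 1 + max((depth(k) for k in kids), default=0)

def dispersion(node: str) -> int:
    # dispersion of a concept = number of its child concepts
    return len(children.get(node, []))

print(depth())                               # → 4 (Thing > Wine > RedWine > Merlot)
print(max(dispersion(n) for n in children))  # → 3 (the "Wine" concept)
```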
    </sec>
    <sec id="sec-4">
      <title>4. Term Typing Ranked Retrieval</title>
<p>We present a preliminary study on term typing by LLMs as a ranked retrieval problem and analyse
the effect of variation in three variables on the task: domain contextualisation, ontology
structure and ontology popularity.</p>
      <p>
        Following a similar investigation of term typing in Giglou et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we model the task as a retrieval
problem and prompt OpenAI’s GPT-4o [27] model in a zero-shot setting with a modified task of
generating a ranked list of concepts of length up to the depth [28] of the ontology. The choice of
zero-shot prompting forces the model to utilise only its own knowledge of the concepts and individuals
that it is prompted with (using only their labels) in order to make the correct assertions. This choice
highlights the relevancy of an LLM’s world knowledge to a basic ontology learning task. Modelling it
as a retrieval task with a retrieval length greater than one provides insight into the LLM’s ability to
accurately identify and infer concept hierarchies on its own when given only the concept labels.
      </p>
<p>We compute the standard information retrieval (IR) metrics: R-Precision, Mean Average Precision
(mAP) and Normalized Discounted Cumulative Gain (nDCG), with mAP and nDCG computed at the depth
k of the ontology (i.e. mAP@k and nDCG@k). R-Precision and mAP provide insight into the general
ability of the model to identify relevant concept assertions and transitive ancestor relations, with the
latter laying greater emphasis on the order of retrieval based on the hierarchy. nDCG lays greater
emphasis on retrieving the relevant concepts and parents in the correct hierarchy and acts as an indicator
of the understanding LLMs have of concept hierarchies from just their labels. We define the relevance
of an ancestor concept c for an individual i according to Equation 1, where c_i is the directly asserted
concept of the individual i and dist(· , ·) is the edge distance between c and c_i. For all non-ancestor
concepts, we set the relevance to 0.</p>
      <p>Relevance(c, i) = 1 / (1 + dist(c, c_i))    (1)</p>
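<p>A minimal Python sketch of this relevance function and of the graded nDCG@k computed under it; the child-to-parent mapping and concept names are invented, and this is not the evaluation code used in the study:</p>

```python
# Sketch of Equation 1: relevance of an ancestor concept decays with its edge
# distance from the directly asserted concept; non-ancestors get relevance 0.
import math

parent = {"Merlot": "RedWine", "RedWine": "Wine", "Wine": "Thing"}  # toy data

def relevance(concept: str, asserted: str) -> float:
    # Walk up from the asserted concept; relevance = 1 / (1 + edge distance).
    node, dist = asserted, 0
    while node is not None:
        if node == concept:
            return 1.0 / (1.0 + dist)
        node, dist = parent.get(node), dist + 1
    return 0.0  # concept is not an ancestor of the asserted concept

def ndcg_at_k(ranked: list, asserted: str, k: int) -> float:
    gains = [relevance(c, asserted) for c in ranked[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    all_concepts = set(parent) | set(parent.values())
    ideal = sorted((relevance(c, asserted) for c in all_concepts),
                   reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

print(relevance("RedWine", "Merlot"))  # direct parent: 1 / (1 + 1) = 0.5
print(ndcg_at_k(["Merlot", "RedWine", "Wine"], "Merlot", 3))  # perfect order: 1.0
```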
<p>We conduct experiments over two ontologies of varying size, complexity and popularity. The Wines
Ontology [24] is a well-known ontology that is relatively small and structurally simple, and enjoys
significant popularity by itself and in terms of the domain it represents. The CASE Ontology [29] is
a larger, more complex and newer ontology focused on accurately capturing the lifecycle of digital
evidence. We inject individuals into the CASE Ontology using the Owl Trafficking example provided
on the CASE Ontology website and also include concepts from the closely related UCO Ontology
[29]. Hereafter, we refer to this composite constructed ontology as the CASE Ontology itself. Table 1
presents the relevant structural metrics of the two ontologies to highlight their differences.</p>
<p>Table 1
Structural metrics of the two ontologies: Classes (#), Individuals (#), Depth [28], Breadth [28] and maximum Dispersion [28].</p>
<p>We design prompts for both ontologies based on the four types of domain contextualisation.
Domain contextualisation is achieved by specifying the role of the LLM as an expert in a domain
following one of the four types outlined in Section 3. We also specify the task of generating a ranked
list of the most relevant concepts of length equal to the depth of each ontology. For each type of prompt,
we provide a flat list of all the concept labels from the ontology from which the LLM is to generate its
responses. The Owl Trafficking example is available at https://caseontology.org/examples/; the exact prompts used can be found at GitHub.</p>
      <p>4.1. Results and Discussion</p>
<p>Table 2
IR Metrics and Pearson correlation (r) between retrievability and dispersion [28] for the Wines Ontology and the CASE
Ontology under the four contextualisations (Generic, Ontology, Topic, Ontology and Topic). Correlation values in italics
are significant at a p-value of 0.05. Values in bold indicate the maximum values for that particular metric.</p>
<p>We observe that a general ontology contextualisation of an LLM results in better performance
than contextualising the LLM as a topic expert. Optimal prompt engineering of this contextualisation
may improve performance, but simple prompts highlight that a topic-based domain contextualisation
does not improve the results of term typing (at least for GPT-4o as the chosen LLM). The generic
prompt performs the best on the smaller and more popular Wines Ontology. The simpler structure and
popularity of the ontology, coupled with the best performance, suggest that popularity can dominate
other variables of interest, thus marginalising the need for carefully considered domain contextualisation
in prompt engineering.</p>
<p>We measure the Pearson correlation between the retrievability of a concept and its dispersion [28]
to analyse the effect of ontology structure on performance. Dispersion is a measure of ontological
structural complexity defined, for a concept, as the number of child concepts it has. We define the
retrievability of a concept as the number of times it is predicted as a relevant concept by the LLM across
all queries. Our observations suggest that domain contextualisation influences structural considerations
during retrieval. Column r in Table 2 outlines the correlation between dispersion and retrievability.
The domain contextualisations involving ontologies show moderate correlation between dispersion and
retrievability. Concepts with high dispersion have several child concepts and represent conceptually
‘broader’ formalisations. Such concepts are possibly semantically wider in their scope and thus could
be easier for LLMs to retrieve using their own knowledge when contextualised to consider hierarchies.
Similar correlations across all contextualisations for the Wines Ontology (max. dispersion: 3) indicate
that in simpler ontologies, where dispersion is not well-pronounced, performance is not affected. Future
studies with more ontologies of varying degrees of dispersion would help corroborate this. We
do not observe a statistically significant correlation between the depth of a concept and its retrievability.</p>
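<p>The correlation analysis above can be reproduced in miniature as follows; the per-concept counts below are invented for illustration, not the paper’s data:</p>

```python
# Illustrative (hypothetical) data: retrievability of a concept = number of
# times the LLM predicts it as relevant across all queries; we correlate it
# with dispersion (child count) using Pearson's r, stdlib only.
import math

dispersion_counts = [3, 2, 1, 0, 0]    # child counts per concept (toy values)
retrievability   = [9, 7, 4, 2, 1]     # retrieval counts per concept (toy values)

def pearson_r(xs: list, ys: list) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(pearson_r(dispersion_counts, retrievability), 3))
```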
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
<p>We present a preliminary study on the ability of LLMs to perform term typing in a zero-shot setting.
An LLM is tasked with retrieving a ranked list of the most relevant concepts given an individual by
leveraging its own knowledge about the entities from their labels. We summarize the findings of our
study as follows:
• LLMs have reasonable capability in performing term typing using their own world knowledge,
but this does not help them identify concept hierarchies, particularly in less popular domains.
• Popularity of domains seems to play an important role in a zero-shot setting and can override
other variables of interest.
• For less popular domains, domain contextualisation can improve performance. Considering
structural experts over topic experts may yield better performance.
• Concepts with greater dispersion may be semantically broader and can therefore be easier for an
LLM to retrieve.</p>
<p>Future works will focus on investigating the task of term typing over different prompting strategies
and conducting an exhaustive analysis of all relevant factors. Extension of the work to other ontology
tasks will lead to the creation of a comprehensive and robust analysis system that can be utilised to
ensure optimal performance for any ontology development task using LLMs.</p>
      <p>[16] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.
[17] L. L. Snijder, Q. T. S. Smit, M. H. T. de Boer, Advancing Ontology Alignment in the Labor Market: Combining Large Language Models with Domain Knowledge, Proceedings of the AAAI Symposium Series 3 (2024) 253–262. doi:10.1609/aaaiss.v3i1.31208.
[18] V. K. Kommineni, B. König-Ries, S. Samuel, From human experts to machines: An LLM supported approach to ontology and knowledge graph construction, 2024. doi:10.48550/arXiv.2403.08345. arXiv:2403.08345.
[19] D. Doumanas, A. Soularidis, K. Kotis, G. Vouros, Integrating LLMs in the Engineering of a SAR Ontology, in: I. Maglogiannis, L. Iliadis, J. Macintyre, M. Avlonitis, A. Papaleonidas (Eds.), Artificial Intelligence Applications and Innovations, Springer Nature Switzerland, Cham, 2024, pp. 360–374. doi:10.1007/978-3-031-63223-5_27.
[20] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, G. Sherlock, Gene Ontology: Tool for the unification of biology, Nature Genetics 25 (2000) 25–29. doi:10.1038/75556.
[21] C. J. Mungall, C. Torniai, G. V. Gkoutos, S. E. Lewis, M. A. Haendel, Uberon, an integrative multispecies anatomy ontology, Genome Biology 13 (2012) R5. doi:10.1186/gb-2012-13-1-r5.
[22] H. Babaei Giglou, J. D’Souza, S. Auer, LLMs4OL: Large Language Models for Ontology Learning, in: T. R. Payne, V. Presutti, G. Qi, M. Poveda-Villalón, G. Stoilos, L. Hollink, Z. Kaoudi, G. Cheng, J. Li (Eds.), The Semantic Web – ISWC 2023, Springer Nature Switzerland, Cham, 2023, pp. 408–427. doi:10.1007/978-3-031-47240-4_22.
[23] H. T. Mai, C. X. Chu, H. Paulheim, Do LLMs Really Adapt to Domains? An Ontology Learning Perspective, 2024. doi:10.48550/arXiv.2407.19998. arXiv:2407.19998.
[24] H. P. P. Filho, Ontology Development 101: A Guide to Creating Your First Ontology (????).
[25] R. Wilson, J. Goonetillake, W. Indika, A. Ginige, A conceptual model for ontology quality assessment: A systematic review, Semantic Web 14 (2023) 1051–1097. doi:10.3233/SW-233393.
[26] N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramèr, C. Zhang, Quantifying Memorization Across Neural Language Models, ArXiv (2022).
[27] Hello GPT-4o, https://openai.com/index/hello-gpt-4o/, ????
[28] A. Gangemi, C. Catenacci, M. Ciaramita, J. Lehmann, Ontology evaluation and validation: An integrated formal model for the quality diagnostic task, 2005.
[29] E. Casey, S. Barnum, R. Griffith, J. Snyder, H. Van Beek, A. Nelson, Advancing coordinated cyber-investigations and tool interoperability using a community developed specification language, Digital Investigation 22 (2017) 14–45. doi:10.1016/j.diin.2017.08.002.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>[1] F. Petroni, T. Rocktäschel, S. Riedel, P. Lewis, A. Bakhtin, Y. Wu, A. Miller, Language Models as Knowledge Bases?, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 2463–2473. doi:10.18653/v1/D19-1250.</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>[2] A. Roberts, C. Raffel, N. Shazeer, How Much Knowledge Can You Pack Into the Parameters of a Language Model?, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 5418–5426. doi:10.18653/v1/2020.emnlp-main.437.</mixed-citation>
      </ref>
      <ref id="ref3">
<mixed-citation>[3] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, Y. Du, C. Yang, Y. Chen, Z. Chen, J. Jiang, R. Ren, Y. Li, X. Tang, Z. Liu, P. Liu, J.-Y. Nie, J.-R. Wen, A Survey of Large Language Models, 2023. arXiv:2303.18223.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>H.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Perl</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Geller</surname></string-name>,
          <article-title>Concept placement using BERT trained by transforming and summarizing biomedical ontology structure</article-title>,
          <source>Journal of Biomedical Informatics</source>
          <volume>112</volume> (<year>2020</year>) <fpage>103607</fpage>. doi:10.1016/j.jbi.2020.103607.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>M.</given-names> <surname>Funk</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Hosemann</surname></string-name>,
          <string-name><given-names>J. C.</given-names> <surname>Jung</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Lutz</surname></string-name>,
          <article-title>Towards Ontology Construction with Language Models</article-title>,
          in: <source>KBC-LM @ ISWC 2023</source>, <year>2023</year>. doi:10.48550/arXiv.2309.09898.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>P.</given-names> <surname>Mateiu</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Groza</surname></string-name>,
          <article-title>Ontology engineering with Large Language Models</article-title>,
          <source>2023 25th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)</source>
          (<year>2023</year>) <fpage>226</fpage>-<lpage>229</lpage>. doi:10.1109/SYNASC61333.2023.00038.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Horrocks</surname></string-name>,
          <article-title>Exploring large language models for ontology alignment</article-title>,
          in:
          <string-name><given-names>I.</given-names> <surname>Fundulaki</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Kozaki</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Garijo</surname></string-name>,
          <string-name><given-names>J. M.</given-names> <surname>Gómez-Pérez</surname></string-name>
          (Eds.),
          <source>Proceedings of the ISWC 2023 Posters, Demos and Industry Tracks: From Novel Ideas to Industrial Practice Co-Located with 22nd International Semantic Web Conference (ISWC 2023), Athens, Greece, November 6-10, 2023</source>,
          volume <volume>3632</volume> of CEUR Workshop Proceedings, CEUR-WS.org, <year>2023</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>H.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Gao</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Horrocks</surname></string-name>,
          <article-title>A Language Model Based Framework for New Concept Placement in Ontologies</article-title>,
          in:
          <string-name><given-names>A.</given-names> <surname>Meroño Peñuela</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Dimou</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Troncy</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Hartig</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Acosta</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Alam</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Paulheim</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Lisena</surname></string-name>
          (Eds.),
          <source>The Semantic Web</source>, Springer Nature Switzerland, Cham, <year>2024</year>,
          pp. <fpage>79</fpage>-<lpage>99</lpage>. doi:10.1007/978-3-031-60626-7_5.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Antonyrajah</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Horrocks</surname></string-name>,
          <article-title>BERTMap: A BERT-Based Ontology Alignment System</article-title>,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>36</volume> (<year>2022</year>) <fpage>5684</fpage>-<lpage>5691</lpage>. doi:10.1609/aaai.v36i5.20510.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Geng</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Jiménez-Ruiz</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Horrocks</surname></string-name>,
          <article-title>Contextual semantic embeddings for ontology subsumption prediction</article-title>,
          <source>World Wide Web</source>
          <volume>26</volume> (<year>2023</year>) <fpage>2569</fpage>-<lpage>2591</lpage>. doi:10.1007/s11280-023-01169-9.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Jimenez-Ruiz</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Dong</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Horrocks</surname></string-name>,
          <article-title>Language Model Analysis for Ontology Subsumption Inference</article-title>,
          in:
          <string-name><given-names>A.</given-names> <surname>Rogers</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Boyd-Graber</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Okazaki</surname></string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2023</source>,
          Association for Computational Linguistics, Toronto, Canada, <year>2023</year>,
          pp. <fpage>3439</fpage>-<lpage>3453</lpage>. doi:10.18653/v1/2023.findings-acl.213.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>H. B.</given-names> <surname>Giglou</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>D'Souza</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Engel</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Auer</surname></string-name>,
          <source>LLMs4OM: Matching Ontologies with Large Language Models</source>,
          <year>2024</year>. arXiv:2404.10317.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>M.</given-names> <surname>Bombieri</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Fiorini</surname></string-name>,
          <string-name><given-names>S. P.</given-names> <surname>Ponzetto</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Rospocher</surname></string-name>,
          <source>Do LLMs Dream of Ontologies?</source>,
          <year>2024</year>. doi:10.48550/arXiv.2401.14931. arXiv:2401.14931.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>K.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Qi</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Zhai</surname></string-name>,
          <source>Can Large Language Models Understand DL-Lite Ontologies? An Empirical Study</source>,
          <year>2024</year>. doi:10.48550/arXiv.2406.17532. arXiv:2406.17532.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>R. M.</given-names> <surname>Bakker</surname></string-name>,
          <string-name><given-names>D. L. D.</given-names> <surname>Scala</surname></string-name>,
          <article-title>From Text to Knowledge Graph: Comparing Relation Extraction Methods in a Practical Context</article-title>
          (????).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name><given-names>J.</given-names> <surname>Devlin</surname></string-name>,
          <string-name><given-names>M.-W.</given-names> <surname>Chang</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Toutanova</surname></string-name>,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>,
          in:
          <string-name><given-names>J.</given-names> <surname>Burstein</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Doran</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Solorio</surname></string-name>
          (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>