<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Ontology Construction with Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maurice Funk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Hosemann</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean Christoph Jung</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Lutz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Scalable Data Analytics and Artificial Intelligence</institution>
          ,
          <addr-line>ScaDS.AI</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leipzig University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Embedded Composite Artificial Intelligence</institution>
          ,
          <addr-line>SECAI</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>TU Dortmund University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present a method for automatically constructing a concept hierarchy for a given domain by querying a large language model. We apply this method to various domains using OpenAI's GPT 3.5. Our experiments indicate that LLMs can be of considerable help for constructing concept hierarchies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Ontologies are formal representations of the concepts in a domain and their relations, and thus represent highly structured knowledge. However, their manual construction and curation is a difficult engineering task that is both time-consuming and costly. This has led to the proposal of various approaches to (semi-)automatic ontology construction; see, e.g., the surveys [1, 2]. A particular challenge is that expertise in ontology engineering and domain knowledge are typically not in the same hands. This has been addressed by the design of algorithms that systematically ask questions to a domain expert and construct the ontology based on the answers given. Notable examples include exact learning of ontologies in the style of Angluin [3] and the use of algorithms from formal concept analysis [4, 5]. While such approaches look good on paper, we are not aware of their application in practice. An obvious problem is that the domain expert is forced into the monotonous task of answering uninteresting questions without knowing their exact purpose. Moreover, the expert still needs to invest considerable time. One may argue, however, that with the advent of large language models (LLMs) trained on huge corpora, such as OpenAI's GPT [6, 7, 8, 9], we have available 'experts' on many domains that do not easily tire of answering questions and that are rather affordable. In fact, LLMs latently contain a significant body of knowledge, and starting from [10], there has been a quickly growing literature on exploiting this fact: LLMs have been used directly as knowledge bases, for general question answering, and to complete knowledge graphs such as Wikidata. To the best of our knowledge, however, none of the existing studies considers ontology construction.</p>
      <p>The aim of this paper is to take a first step towards the (semi-)automatic construction of
ontologies based on LLMs. Our approach is not based on existing methodologies such as exact
learning or formal concept analysis, but specifically tailored towards LLMs. One main reason is
that existing methods assume that the schema of the ontology (the set of concept and property
names to be used) is chosen in advance and then provided as an input to the methodology.
This, however, does not appear to be a good choice when working with LLMs, for at least two
reasons. First, designing a schema for an entire domain is a non-trivial task itself that requires a
domain expert and involves many design decisions. In fact, designing a schema and a concept
hierarchy are closely entangled. And second, a main strength of LLMs is to generate keywords
and phrases in a context provided by a user and thus they are a perfect tool for proposing
concept and property names for a given domain. It seems very natural to take advantage of this
powerful feature.</p>
      <p>The more expressive the ontology language, the more design decisions have to be taken
during ontology construction. This leads us to start, in this initial paper, with a very simple
‘backbone’ of ontological representation: we only aim to construct a concept hierarchy for a
given domain, that is, we only consider the subconcept/is-a relation, but no other relations. Our
algorithm takes a seed concept C₀ (e.g., Animals) that determines the domain in the sense that
all concepts in the generated hierarchy will be subconcepts of C₀ (e.g., Mammals, Fish, Lion, ...).
We then ‘crawl’ the hierarchy by repeatedly asking the LLM to provide relevant subconcepts
of concepts that are already in the hierarchy and use an established traversal algorithm to
place the new concepts—note that each concept may have more than one superconcept and the
ultimately constructed hierarchy does not take the form of a tree, but that of a directed acyclic
graph (DAG). We also implement a mechanism for verifying the output of the LLM by posing
additional queries to the LLM. Further, we ask the LLM to provide a textual description of each
concept that we make available for inspection.</p>
      <p>To test the feasibility of our method, we apply it to various domains such as Animals, Drinks,
Music, and Plants. As the LLM, we use GPT 3.5. A metric evaluation of the precision and
recall of the constructed ontologies is difficult because there is no ground truth. For the time
being, we thus confine ourselves to a purely subjective evaluation based on manual inspection
of the constructed ontologies. We believe that they are quite reasonable and demonstrate
the utility of LLMs for constructing ontologies. Hallucinations and errors occur, but they can
be significantly reduced by verification and careful prompt engineering. Incompleteness also
occurs, but seems to be outweighed by the fact that our approach is able to suggest a wealth of
classes relevant to a domain as well as their interrelationship in terms of the is-a relation. We
make our ontologies publicly available (without any manual post-processing) and the reader is
invited to take a look.</p>
      <p>In the form presented here, our method is fully automatic. We do not claim, though, that
a fully automatic approach is the solution to ontology construction in practice. Quite to the
contrary, it seems natural and useful to also include interactions with a human domain expert
to guide the construction process. We believe that our method can easily be extended in
this direction. We also believe that our experiments indicate that involving LLMs in ontology
construction can bring about significant benefits compared to a purely manual approach, in a
similar way in which using ChatGPT can bring significant benefits for writing text. In particular,
the LLM can propose relevant concept names with ease and also make useful suggestions
regarding their position in the hierarchy.</p>
      <p>Algorithm 1: Concept Hierarchy Construction</p>
      <preformat>Input: seed concept C₀
H = concept hierarchy that only contains C₀
while there is an unexplored concept C in H do
    ask LLM whether C has subconcepts                          (“existence”)
    if yes then
        ask LLM to provide list L of subconcepts of C          (“listing”)
        ask LLM to provide descriptions of the concepts in L   (“description”)
        forall D ∈ L do
            ask LLM to verify that D is a subconcept of C      (“verification”)
            if successful then
                insert D into H                                (“insertion”)
return H</preformat>
      <p>
        Related Work. As LLMs grew stronger, it became evident that they (latently)
store a massive amount of knowledge, which suggests their use as a knowledge source in
applications such as knowledge graph completion, ontology completion, and open domain
question answering. Starting from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], there is a quickly developing line of work that explores
the use of LLMs for open domain question answering, typically using ‘fill-in-the-blank’ cloze
statements, often in the form of ‘subject-relation-object’ triples with a blank [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">11, 12, 13, 14, 15</xref>
        ].
In the same spirit but closer to our paper is [16], which uses a crawling approach to extract a
knowledge graph from an LLM using the same kind of statements. In contrast to our work,
however, there is no special focus on concept hierarchies. There seems to be little work
on using LLMs for completing knowledge graphs or ontologies [17]. Notable exceptions are
[18] and [19], which use fine-tuned BERT models for subsumption prediction with the aim of
completing ontologies. This is similar to the insertion of newly discovered concepts into the
hierarchy in our approach, but it lacks the concept discovery / crawling aspect of our work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The Algorithm</title>
      <p>A concept hierarchy is a preordered set, that is, a pair (S, ⊑) with S a set and ⊑ a reflexive
and transitive relation on S. The relation ‘⊑’ is also called the is-a relation or the subsumption
relation. If C ⊑ D, we call C a subconcept of D and D a superconcept of C. Note that we do
not demand antisymmetry, and thus for distinct C, D ∈ S it is possible that C ⊑ D ⊑ C. We
then call C and D synonyms. One may equip concept hierarchies with a set-theoretic semantics
as used, for example, in description logics and in OWL, but this is not necessary for the purposes
of the current paper. It often makes sense to think of the subsumption relation in terms of its
transitive reduction (also called the Hasse diagram), which, once synonyms are merged, is a directed acyclic graph (DAG).</p>
      <p>The general strategy that we use to construct concept hierarchies from LLMs is displayed as
Algorithm 1. The algorithm takes as input a seed concept C₀ that determines the domain for
which we want to construct a concept hierarchy. For example, one might use Animals,
Activities, Artists, Music, or even Things. The algorithm then explores every concept C that was
placed in the concept hierarchy H, starting with C₀, by identifying subconcepts and inserting
them into the hierarchy. We also ask the LLM to provide a textual description of each concept
and use a verification step to filter out erroneous answers. All this is described in detail below.
Note that we do not (additionally) traverse the concept hierarchy upwards by also asking for
superconcepts when exploring a concept C from H. Doing so bears the risk of leaving the
domain, though one might devise measures to prevent this. The algorithm may terminate
naturally if at some point the LLM proposes no more new concepts, but termination is not
guaranteed.</p>
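      <p>The crawling loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the implementation used for our experiments. The LLM is replaced by a hypothetical lookup table (STUB_KNOWLEDGE) and the description step is omitted. Insertion only records the edge to the explored concept instead of running the KRIS algorithm of Section 4. The hypothetical max_depth argument mimics the exploration-depth cutoff discussed in Section 5, so that the loop always terminates.</p>
      <preformat><![CDATA[
```python
from collections import deque

# Hypothetical stand-in for the LLM: a fixed table of subcategories.
# A real implementation would send the prompts of Section 3 to GPT instead.
STUB_KNOWLEDGE = {
    "Animals": ["Mammals", "Fish"],
    "Mammals": ["Cats", "Dogs"],
    "Fish": ["Sharks"],
}


def ask_existence(concept):
    """'Existence': does the concept have any subcategories?"""
    return bool(STUB_KNOWLEDGE.get(concept))


def ask_listing(concept):
    """'Listing': the subcategories proposed for the concept."""
    return list(STUB_KNOWLEDGE.get(concept, []))


def ask_verification(sub, sup):
    """'Verification': is `sub` really a subcategory of `sup`?"""
    return sub in STUB_KNOWLEDGE.get(sup, [])


def build_hierarchy(seed, max_depth=3):
    """Crawl downwards from `seed`; return a map concept -> direct superconcepts."""
    parents = {seed: set()}
    queue = deque([(seed, 0)])  # unexplored concepts together with their depth
    explored = set()
    while queue:
        concept, depth = queue.popleft()
        if concept in explored or depth >= max_depth:
            continue
        explored.add(concept)
        if not ask_existence(concept):
            continue
        for sub in ask_listing(concept):
            if ask_verification(sub, concept):
                is_new = sub not in parents
                # Simplified 'insertion': only the edge to the explored concept.
                parents.setdefault(sub, set()).add(concept)
                if is_new:
                    queue.append((sub, depth + 1))
    return parents


hierarchy = build_hierarchy("Animals")
```
]]></preformat>
      <p>Breadth-first exploration processes concepts in order of discovery depth. A depth cutoff therefore removes whole levels of the hierarchy rather than arbitrary branches.</p>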
    </sec>
    <sec id="sec-3">
      <title>3. Existence / Listing / Description / Verification</title>
      <p>We describe our implementation of existence, listing, description, and verification. Insertion
is discussed in Section 4. To give a first impression, here are the central phrases used in the
prompts for existence, listing, and description, where C denotes the concept being explored:
• Subconcept existence: “Are there any generally accepted subcategories of C? Answer only
with yes or no.”
• Subconcept listing: “List all of the most important subcategories of C. Skip explanations
and use a comma-separated format like this: important subcategory, another important
subcategory, another important subcategory, etc.”
• Concept description: “Give a brief description of every term on the list, considered as a
subcategory of C, without the use of examples, in the following form: List element 1: brief
description for list element 1. List element 2: brief description for list element 2. ...”
Of course, there are many natural variations of these phrases. In particular, there are obvious
alternatives for the word ‘subcategory’ such as subconcept, subclass, type, and so on. Changing
the phrase has an impact on the results (as almost every reformulation of a prompt does), and based
on sampling various examples, we decided that subcategory gave the most convincing results.</p>
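      <p>For illustration, the three central phrases can be wrapped in small Python helpers, together with a parser for the comma-separated listing format. This is a sketch under our own assumptions. The function names are hypothetical. The way the list of terms is attached to the description prompt is not specified above, so the final line of description_prompt is a guess.</p>
      <preformat><![CDATA[
```python
def existence_prompt(concept):
    return (f"Are there any generally accepted subcategories of {concept}? "
            "Answer only with yes or no.")


def listing_prompt(concept):
    return (f"List all of the most important subcategories of {concept}. "
            "Skip explanations and use a comma-separated format like this: "
            "important subcategory, another important subcategory, "
            "another important subcategory, etc.")


def description_prompt(concept, terms):
    # Assumption: the terms to be described are appended as a final line.
    return ("Give a brief description of every term on the list, considered as a "
            f"subcategory of {concept}, without the use of examples, in the "
            "following form: List element 1: brief description for list element 1. "
            "List element 2: brief description for list element 2. ...\n"
            "List: " + ", ".join(terms))


def parse_listing(answer):
    """Split a comma-separated listing answer into concept names."""
    names = []
    for item in answer.split(","):
        name = item.strip().rstrip(".").strip()
        if name and name.lower() != "etc":  # drop a trailing 'etc.' copied from the template
            names.append(name)
    return names


print(parse_listing("Dairy Goats, Meat Goats, Fiber Goats, etc."))
# ['Dairy Goats', 'Meat Goats', 'Fiber Goats']
```
]]></preformat>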
      <p>To increase the completeness of the constructed hierarchies, our approach to concept listing
is actually more intricate than just using the prompt given above. Ideally, we would like to
consider all (or at least a large number of) answers to that prompt and then include the ones
with the highest probabilities, up to a certain threshold. While LLMs in principle provide
this information, it is not accessible via the GPT API that we use in our implementation. We
therefore resort to a frequency analysis, meaning that we pose the above prompt to GPT many
times and then take all answers that are returned with a certain minimum frequency. As this is
potentially quite costly, we implement it in a slightly different way. We set the max_tokens
parameter to 1, meaning that we only ask for the first token of an answer to the above prompt
to be returned.1 We then pose the prompt many times (we choose 100) and take all tokens that
are returned with a certain minimum frequency (we choose the frequency threshold between
5 and 20, out of 100). For each token t that surpasses the threshold, we once more
ask the subconcept listing prompt from above, extended with the sentence “Start your answer
with ‘t’.” The list of subconcepts is then taken to be the union of the lists returned to these
prompts. More information, especially on how to set other parameters (which is crucial), is
given in Section 5.</p>
      <p>1Note that GPT cost depends on the number of tokens in the input and output.</p>
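      <p>The frequency analysis can be sketched as follows. The callable passed to frequent_first_tokens is a hypothetical stand-in for one API request with max_tokens set to 1 and a high temperature. Here it simply replays a made-up, deterministic sequence of 100 first tokens, so that the effect of the threshold is easy to see. Straight quotes are used in the follow-up prompt for simplicity.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def frequent_first_tokens(sample_first_token, n_samples=100, threshold=20):
    """Tokens returned at least `threshold` times in `n_samples` one-token samples."""
    counts = Counter(sample_first_token() for _ in range(n_samples))
    # most_common() lists tokens by decreasing frequency.
    return [tok for tok, n in counts.most_common() if n >= threshold]


def follow_up_prompts(listing_prompt, tokens):
    """One extended listing prompt per frequent first token."""
    return [f'{listing_prompt} Start your answer with "{tok}".' for tok in tokens]


# Made-up first tokens for a listing prompt about Goats (100 samples in total).
samples = ["Dairy"] * 40 + ["Meat"] * 30 + ["Fiber"] * 20 + ["Show"] * 7 + ["Pygmy"] * 3

strict = frequent_first_tokens(iter(samples).__next__, threshold=20)
lenient = frequent_first_tokens(iter(samples).__next__, threshold=5)
print(strict)   # ['Dairy', 'Meat', 'Fiber']
print(lenient)  # ['Dairy', 'Meat', 'Fiber', 'Show']
```
]]></preformat>
      <p>Lowering the threshold admits more starting tokens and hence more follow-up listing prompts. This trades soundness for completeness, as discussed in Section 5.</p>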
      <p>Let us discuss the role of the textual descriptions that we request from the LLM. On the one
hand, we provide these as additional context in further prompts, as described below. On the
other hand, the descriptions can also be very useful for a human user to interpret the concepts
proposed by the LLM. With the seed concept Drinks, for example, GPT identified (among many
others) the concepts chocolate porters and chocolaty porters. While these concepts may look
like synonyms, the descriptions produced by GPT reveal that chocolate porters are porters to
which some form of chocolate or cocoa has been added during the brewing process while this
is not true for chocolaty porters which only exhibit an aroma that is reminiscent of chocolate.
A user may then decide whether this distinction is really needed and whether both classes are
relevant and should be kept.2</p>
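      <p>Turning the description answer into a map from concepts to descriptions requires some parsing. The following sketch assumes, as a simplification, that the LLM repeats each term verbatim followed by a colon. The function name is hypothetical, and terms that the LLM skips are simply missing from the result. A term that is a suffix of another term (such as Porters and Chocolate Porters) can confuse this naive search.</p>
      <preformat><![CDATA[
```python
def parse_descriptions(answer, terms):
    """Map each term to the text between 'term:' and the next term's label."""
    found = []
    for term in terms:
        idx = answer.find(term + ":")
        if idx != -1:
            found.append((idx, term))
    found.sort()  # order labels by their position in the answer
    result = {}
    for i, (idx, term) in enumerate(found):
        start = idx + len(term) + 1  # skip the label and its colon
        end = found[i + 1][0] if i + 1 < len(found) else len(answer)
        result[term] = answer[start:end].strip()
    return result


answer = ("Chocolate Porters: porters brewed with added chocolate or cocoa. "
          "Chocolaty Porters: porters with an aroma reminiscent of chocolate.")
descriptions = parse_descriptions(answer, ["Chocolate Porters", "Chocolaty Porters"])
```
]]></preformat>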
      <p>When using basic forms of prompting for subconcept existence and listing, a number of
issues arise. In the following, we try to categorize the most important types of errors:
• Sloppiness / Domain Switches.</p>
      <p>The generated concept names are too abbreviated. While such a short name makes sense
in the context of the concept for which it was returned as a subconcept, it does not
contain enough information to stand by itself. When retrieving subconcepts based on
short names, this often results in a departure from the domain set by the seed concept.
Examples include Tree ⊒ Apple ⊒ IPad, Reusable Bottle ⊒ Glass ⊒ Tempered Glass,
and Tree ⊒ Olive ⊒ Stuffed Olives.</p>
      <p>In rare cases, there are also domain switches that are unrelated to sloppiness. An example
is Drink ⊒ Water ⊒ River.
• Attribute Inflation.</p>
      <p>Attributes are added to generate subconcepts, over and over again. This leads to concepts
that, although not outright wrong and sometimes amusing, are irrelevant. Examples
include Underwater Resource Management Games and Customer-driven
Scalability-focused Profit-driven Action-oriented Closing Keynote Speeches.
• Hallucination.</p>
      <p>The term hallucination is commonly used to refer to the tendency of LLMs to invent
facts [20]. Here, it occurs in the specific form of irrelevant concepts, mostly by
attribute inflation, as well as erroneous subconcept relations. Examples for the latter
include Non-flowering Plant ⊒ Fungi, Moon ⊒ Solar Eclipse, and Propositional Logic ⊒
Normal Forms.
• Wrong Relation.</p>
      <p>Sometimes the subconcept/subcategory relation is confused with other relations, in
particular with the ‘specific instance of’ and ‘part of’ relations. Examples for the former
include Ivy League University ⊒ Yale University and Word Game ⊒ Scrabble.
Examples for the latter are Feet ⊒ Toes and Legs ⊒ Knees. This may be viewed as a specific
form of hallucination.</p>
      <p>2All our examples are “real” in the sense that they occurred during interactions with GPT. They are, however,
not necessarily part of the concept hierarchies that we provide along with this paper as they might have been
encountered when running earlier versions of our algorithm.</p>
      <p>In the list above, the error types are given roughly in decreasing order of the frequency with which
they occurred. In fact, sloppiness and the resulting domain switches had a drastic negative effect on
the quality of the constructed hierarchies in early versions of our algorithm. After addressing
them, attribute inflation and hallucination were the most common error types. For some
domains such as Bodypart, ‘part of’ occurred very often as a wrong relation.</p>
      <p>The central phrases for existence and listing given at the beginning of this section have
already been designed to address some error types. In particular, the expressions “generally
accepted subcategories” and “most important subcategories” address attribute inflation. This,
however, is not sufficient. To address errors more thoroughly, we use two measures: (i) further
improve the prompts for subconcept existence and listing and (ii) concept verification. We start
with describing the former.</p>
      <p>To address sloppiness and domain switches when asking for existence and listing the
subconcepts of some concept C, we add to the prompts the seed concept C₀ and the superconcept
E of C from which C was first discovered. This provides additional context and can be seen as
an instance of few-shot learning. For example, the exact prompt for existence is:
“E is a subcategory of C₀. C is a subcategory of E. Are there any generally accepted
subcategories of C? Answer only with yes or no.”
We have also experimented with adding information about the entire ancestry of C, that is, a
complete path from C₀ to C in the hierarchy. It seemed, however, that this increased attribute
inflation without much benefit regarding sloppiness and domain switches. To further reduce domain
switches and to improve the quality of existence and listing, we add to each prompt the textual
concept description of every concept that occurs in it.</p>
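      <p>A possible way to assemble the contextualized existence prompt is sketched below. The exact layout of the prompts is not spelled out above. Prepending each description as a 'Name: description' line is an assumption of this sketch, and the function name is hypothetical. For concepts discovered directly below the seed concept, the sketch omits the then redundant first sentence.</p>
      <preformat><![CDATA[
```python
def contextual_existence_prompt(seed, parent, concept, descriptions):
    """Existence prompt with the seed concept, the superconcept through which
    `concept` was first discovered, and the descriptions of all concepts used."""
    names = list(dict.fromkeys([seed, parent, concept]))  # de-duplicate, keep order
    # Assumed layout: one 'Name: description' line per concept with a description.
    context = [f"{name}: {descriptions[name]}" for name in names if name in descriptions]
    facts = [] if parent == seed else [f"{parent} is a subcategory of {seed}."]
    facts.append(f"{concept} is a subcategory of {parent}.")
    question = (f"Are there any generally accepted subcategories of {concept}? "
                "Answer only with yes or no.")
    return "\n".join(context + [" ".join(facts + [question])])


prompt = contextual_existence_prompt(
    "Drinks", "Beers", "Porters",
    {"Beers": "Alcoholic drinks brewed from cereal grains.",
     "Porters": "Dark beers made with roasted malt."})
print(prompt)
```
]]></preformat>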
      <p>We next describe the verification step, which is intended to address attribute inflation,
instances as concepts, and domain switches. Suppose we want to verify that D is a subconcept
of C. Verification consists of four steps:</p>
      <sec id="sec-3-1">
        <title>1. Check that D is not an instance.</title>
        <p>We use the prompt “Is D a specific instance or a subcategory of the category C₀? Answer
only with Instance or Subcategory.”</p>
        <p>2. Check that D is not a mereological part. We use the prompt “Is D a part or a subcategory
of the category C₀? Answer only with Part or Subcategory.”</p>
        <p>3. Check that D is a subcategory of the seed concept C₀. We use the prompt “Can D be
considered a subcategory of C₀? Answer only with yes or no.”</p>
        <p>4. Check that D is a subcategory of C. We use the prompt “C is a subcategory of C₀. Is D
typically understood as a subcategory of C? Answer only with yes or no.”</p>
        <p>Again, we add to each of the prompts the descriptions of all concepts that occur in them. The
query in Point 1 turns out to be very effective in dealing with instances as concepts, and the
query in Point 2 is also effective, albeit less so than the first one. If any of the queries in Points 3
and 4 returns “no”, the concept name may be too abbreviated, and we make
an attempt to find a better name for the concept. This is done using the prompt
“C is a subcategory of C₀. The following description outlines the characteristics of a
subcategory of C. Provide a concise and unambiguous name for it. Provide only the
name without any explanation.”
followed by the description of D. If the LLM returns a better name that passes the
verification step, we keep the concept under that name; otherwise we drop D.</p>
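        <p>The four verification points and the renaming fallback can be sketched as follows. The argument ask is a hypothetical function that sends one prompt to the LLM and returns its answer. The concept descriptions added to every prompt are left out for brevity. Accepting a renamed concept only if it passes all four points again is one reading of the procedure described above. The canned fake_ask answers are invented for the example.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Optional

Ask = Callable[[str], str]  # hypothetical: one prompt in, the LLM's answer out


def _starts(answer: str, word: str) -> bool:
    return answer.strip().lower().startswith(word)


def is_concept(ask: Ask, seed: str, name: str) -> bool:
    """Points 1 and 2: reject instances and mereological parts."""
    if _starts(ask(f"Is {name} a specific instance or a subcategory of the category "
                   f"{seed}? Answer only with Instance or Subcategory."), "instance"):
        return False
    return not _starts(ask(f"Is {name} a part or a subcategory of the category "
                           f"{seed}? Answer only with Part or Subcategory."), "part")


def is_placed(ask: Ask, seed: str, sup: str, name: str) -> bool:
    """Points 3 and 4: subcategory of the seed concept and of `sup`."""
    return (_starts(ask(f"Can {name} be considered a subcategory of {seed}? "
                        "Answer only with yes or no."), "yes")
            and _starts(ask(f"{sup} is a subcategory of {seed}. Is {name} typically "
                            f"understood as a subcategory of {sup}? "
                            "Answer only with yes or no."), "yes"))


def verify(ask: Ask, seed: str, sup: str, sub: str, description: str) -> Optional[str]:
    """Return the accepted name of `sub` (possibly renamed) or None if it is dropped."""
    if not is_concept(ask, seed, sub):
        return None
    if is_placed(ask, seed, sup, sub):
        return sub
    # Points 3/4 failed: the name may be too abbreviated, so ask for a better one.
    new_name = ask(f"{sup} is a subcategory of {seed}. The following description "
                   f"outlines the characteristics of a subcategory of {sup}. Provide a "
                   "concise and unambiguous name for it. Provide only the name "
                   f"without any explanation. {description}").strip()
    if (new_name and new_name != sub and is_concept(ask, seed, new_name)
            and is_placed(ask, seed, sup, new_name)):
        return new_name
    return None


def fake_ask(prompt: str) -> str:
    """Canned answers mimicking a sloppy name ('Olive') and an instance."""
    if "Provide a concise" in prompt:
        return "Olive Trees"
    if "Yale University a specific instance" in prompt:
        return "Instance"
    if "Answer only with Instance" in prompt or "Answer only with Part" in prompt:
        return "Subcategory"
    return "no" if ("Can Olive be" in prompt or "Is Olive typically" in prompt) else "yes"


print(verify(fake_ask, "Plants", "Fruit Trees", "Olive", "A tree that bears olives."))
# Olive Trees
print(verify(fake_ask, "Universities", "Ivy League Universities", "Yale University", ""))
# None
```
]]></preformat>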
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Insertion</title>
      <p>When we retrieve a new concept D in the listing step, we already know one of its
superconcepts. To properly insert D into the concept hierarchy constructed so far, however, we
must know all its super- and subconcepts among the existing concepts. We identify those by
additional queries to the LLM. In principle, this can be done in a brute-force way by asking, for
every existing concept E, whether D ⊑ E and whether E ⊑ D, that is, with 2n queries per inserted concept
when the hierarchy contains n concepts. However, this is not practical
as it easily leads to a huge number of queries to the LLM: queries to GPT 3.5 via the
OpenAI API are slow and, when asked in large quantities, also expensive in a monetary sense.3</p>
      <p>This parallels the situation of classifying a given ontology when only a computationally
expensive reasoner for deciding single subsumption tests is available. A fundamental algorithm
for this task that aims to minimize the number of subsumption tests has been proposed in
[21], often called the KRIS algorithm; see also [22] for improved versions. The setup in [21]
assumes that all concepts ever to be inserted into the hierarchy are known in advance, but the
algorithm also works in our case where concept discovery and insertion alternate. We use the
original KRIS algorithm, called the enhanced traversal method in [21], but parallelize some
subsumption tests (that is: queries to the LLM) for improved performance.</p>
      <p>The basic idea of the KRIS algorithm for inserting a new concept D is to use a top search
phase to identify all superconcepts of D and a bottom search phase to identify all subconcepts
of D. Both phases crucially exploit the transitivity of the subsumption relation. The top search
phase proceeds top down, meaning that it starts at the most general concepts E to check
whether D ⊑ E and then proceeds towards more specific E. The rationale is that once a
subsumption test D ⊑ E fails, we do not need to test whether D ⊑ E′ for any E′ with
E′ ⊑ E, since D ⊑ E′ would imply D ⊑ E by transitivity. The bottom search phase is symmetric,
proceeding bottom up. There are some additional optimizations that we do not describe here in
full detail; see [21]. The prompt that we use for testing whether D ⊑ E is the same as for
Query 4 in concept verification, again providing all relevant concept descriptions.</p>
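      <p>The pruning in the top search phase can be illustrated with the following simplified sketch. It covers only the top search and stores the current hierarchy as hypothetical parent and child maps. A stand-in function subsumes takes the place of the Query 4 prompt that the actual procedure poses to the LLM. The additional optimizations of [21] are not reproduced. In the spirit of the enhanced traversal, a concept is only tested if all of its direct superconcepts were found to subsume the new concept. The small hierarchy and its ground truth are made up.</p>
      <preformat><![CDATA[
```python
def top_search(new, top, parents, children, subsumes):
    """Most specific existing concepts that subsume `new`, searched top down.

    `top` is assumed to subsume `new` (it is the seed concept). `subsumes(d, e)`
    is a hypothetical stand-in for an LLM query asking whether d is a subcategory of e.
    """
    known = {top: True}  # cache of answered (or inferred) subsumption tests

    def holds(e):
        if e not in known:
            # If some direct superconcept of e does not subsume `new`, neither does e
            # (by transitivity), so the LLM query for e can be skipped.
            if all(holds(p) for p in parents.get(e, ())):
                known[e] = subsumes(new, e)
            else:
                known[e] = False
        return known[e]

    most_specific, visited = set(), set()

    def visit(e):  # precondition: holds(e) is True
        if e in visited:
            return
        visited.add(e)
        positive = [c for c in children.get(e, ()) if holds(c)]
        if not positive:
            most_specific.add(e)
        for c in positive:
            visit(c)

    visit(top)
    return most_specific


children = {"Animals": ["Mammals", "Fish"], "Mammals": ["Felines", "Canines"],
            "Fish": ["Sharks"]}
parents = {c: [p] for p, cs in children.items() for c in cs}
truth = {"Animals", "Mammals", "Felines"}  # made-up ground truth for "Big Cats"
queried = []


def oracle(d, e):
    queried.append(e)
    return e in truth


print(top_search("Big Cats", "Animals", parents, children, oracle))  # {'Felines'}
print(queried)  # ['Mammals', 'Fish', 'Felines', 'Canines']
```
]]></preformat>
      <p>Sharks is never queried: once the test for Fish fails, transitivity already rules it out.</p>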
      <p>However, inserting concepts this way can introduce errors into the hierarchy. We discuss
the two most important issues. First, querying GPT 3.5 for subcategories does not result in a
transitive subsumption relation. This is quite interesting as one might argue that this relation,
based on a language model, is indeed not transitive. For example, GPT 3.5 provided us with the
following relations:</p>
      <p>Commercial Building ⊒ Healthcare Facilities ⊒ Hospitals</p>
      <p>Commercial Building ⋣ Hospitals.</p>
      <p>3The cost and speed depend on the size of the prompt and on the size of the answer. In our experiments, the
average cost per request was $0.0002 and each request took at least 0.3s, with requests that generate long answers
taking several seconds.</p>
      <p>One explanation is that the subsumption between healthcare facilities and commercial buildings
is plain wrong. A more interesting explanation is that GPT is US-centric and from a US
perspective, this subsumption is actually reasonable; at the same time, it is reasonable to say
that hospitals are healthcare facilities and also that hospitals are not (primarily) commercial
buildings. The point here seems to be that we are dealing with a language model, and language
is vague and underspecified. Another example is Hot Beverages ⊒ Coffee ⊒ Iced Coffee, even though iced coffee is hardly a hot beverage.</p>
      <p>Obviously, accepting non-transitivity of the subsumption relation leads the entire idea of a
concept hierarchy ad absurdum: in which sense would it still be a hierarchy? We thus deal with
this issue in a pragmatic way, essentially imposing that the subsumption relation is transitive.
When discovering a concept D as a subconcept of a concept C, we take it for granted that
D ⊑ E for all concepts E with C ⊑ E, without verifying this using the LLM. We also do not
depart from using the KRIS algorithm, which assumes the subsumption relation to be transitive.
If the answers given by the LLM ‘are not transitive’, then this may result in missing sub- and
superconcept relations. In theory, it may also lead to cycles in the subsumption relation without
all concepts on the cycle being synonyms. In our experiments, however, these effects seemed
to show up only rarely (for the cycles: not at all).</p>
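      <p>Imposing transitivity amounts to treating every ancestor of a concept in the constructed hierarchy as one of its superconcepts, whatever the LLM would answer when asked directly. The following is a small sketch over a hypothetical parent map. It also shows how a cycle, the rare failure mode mentioned above, could be detected. The function names are illustrative.</p>
      <preformat><![CDATA[
```python
def superconcepts(concept, parents):
    """All concepts reachable via direct-superconcept edges (transitive closure)."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        e = stack.pop()
        if e not in seen:
            seen.add(e)
            stack.extend(parents.get(e, ()))
    return seen


def on_cycle(concept, parents):
    """True if the concept is (transitively) a superconcept of itself."""
    return concept in superconcepts(concept, parents)


parents = {"Hospitals": ["Healthcare Facilities"],
           "Healthcare Facilities": ["Commercial Building"]}
# Commercial Building is implied as a superconcept of Hospitals, even though
# GPT 3.5 answered the direct question negatively in the example above.
print(sorted(superconcepts("Hospitals", parents)))
# ['Commercial Building', 'Healthcare Facilities']
```
]]></preformat>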
      <p>The second important issue is related to the treatment of synonyms. When inserting a
concept D, it may happen that D is classified both as a subconcept and as a superconcept of
an existing concept E. The KRIS algorithm then simply classifies D and E as synonyms. In
our algorithm, synonym detection is rather important because the LLM may produce many
small variations of the same concept name, such as singular vs. plural and writing in one vs.
two words (“board game” vs. “boardgames”), especially when rediscovering the same concept
multiple times as a subconcept of different superconcepts. But it also often occurs that the
answers of the LLM wrongly identify two concepts as synonyms. When we find two concepts
C₁ and C₂ as candidates to be synonyms, we thus have to analyze the situation further. We
use the following prompt:
“In the context of C₀, are C₁ and C₂ typically used interchangeably? Answer only
with yes or no.”
If the answer is “yes”, we accept that C₁ and C₂ are synonyms. If the answer is “no”, we assume
that one of C₁ ⊑ C₂ and C₂ ⊑ C₁ was hallucinated and ask:
“Consider the terms C₁ and C₂. Which of the terms is a subcategory of the other
one? Answer in the following scheme: [[X]] is a subcategory of [[Y]].”
We then use the answer to resolve the situation. We catch a significant number of hallucinated
synonyms in this way, but not all. We also mention again at this point that the concept
descriptions provided by the LLM are very helpful for understanding whether two concepts are
synonyms or not; recall the example of chocolate porters and chocolaty porters.</p>
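      <p>The two-stage synonym check can be sketched as follows. As before, ask is a hypothetical prompt-to-answer function, and goat_llm returns invented canned answers. Two details are assumptions of this sketch: the regular expression that extracts the [[X]] is a subcategory of [[Y]] scheme, and the choice to leave a pair unresolved when the answer cannot be parsed.</p>
      <preformat><![CDATA[
```python
import re

ORDER_PATTERN = re.compile(r"\[\[(.+?)\]\] is a subcategory of \[\[(.+?)\]\]")


def resolve_synonym_candidates(ask, seed, c1, c2):
    """Return ('synonyms', c1, c2), ('subconcept', sub, sup), or None if unparsable."""
    same = ask(f"In the context of {seed}, are {c1} and {c2} typically used "
               "interchangeably? Answer only with yes or no.")
    if same.strip().lower().startswith("yes"):
        return ("synonyms", c1, c2)
    order = ask(f"Consider the terms {c1} and {c2}. Which of the terms is a "
                "subcategory of the other one? Answer in the following scheme: "
                "[[X]] is a subcategory of [[Y]].")
    match = ORDER_PATTERN.search(order)
    if match is None:
        return None
    sub, sup = match.group(1).strip(), match.group(2).strip()
    if {sub, sup} != {c1, c2}:
        return None  # the answer names other terms; leave the pair unresolved
    return ("subconcept", sub, sup)


def goat_llm(prompt):
    # Canned answers: not interchangeable, and Boer is the more specific term.
    if prompt.startswith("In the context"):
        return "no"
    return "[[Boer]] is a subcategory of [[Meat Goats]]."


print(resolve_synonym_candidates(goat_llm, "Goats", "Meat Goats", "Boer"))
# ('subconcept', 'Boer', 'Meat Goats')
```
]]></preformat>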
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We have implemented our algorithm in Python based on GPT 3.5 turbo. We do not use the
familiar chat interface to ask our queries as a continuous conversation, but instead pose
each query independently via the chat completion endpoint (API) V1. Whenever possible we
parallelize calls to the API to improve performance.</p>
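      <p>Because many of the queries are independent, they can be issued concurrently. The following is a minimal sketch using Python's standard concurrent.futures module. Here ask is a hypothetical blocking function that performs one chat-completion request, and the value of max_workers is an arbitrary choice for illustration. Threads are adequate because the work is dominated by waiting for network responses.</p>
      <preformat><![CDATA[
```python
from concurrent.futures import ThreadPoolExecutor


def ask_all(ask, prompts, max_workers=8):
    """Send independent prompts concurrently; answers come back in prompt order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of its inputs.
        return list(pool.map(ask, prompts))


answers = ask_all(lambda prompt: prompt.upper(), ["dairy goats", "meat goats"])
print(answers)  # ['DAIRY GOATS', 'MEAT GOATS']
```
]]></preformat>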
      <p>We have used our approach to construct concept hierarchies for the following seed concepts:</p>
      <p>Activities, Animals, Buildings, Diseases, Drinks, Fuels, Goats, Music, Plants</p>
      <p>Under https://www.informatik.uni-leipzig.de/kr/onto-llm/, we provide visual representations
of the hierarchies as vector graphics for manual inspection. We also provide them as OWL
ontologies in the RDF/XML format for use in ontology editors and offer a web interface for
browsing. The OWL ontologies also contain the textual descriptions of the concepts provided
by GPT 3.5.</p>
      <p>The constructed hierarchies are not perfect, but we believe that they are quite reasonable
and demonstrate the utility of LLMs for ontology construction. While hallucinations and errors
still occur, verification and prompt engineering have reduced them considerably. Most of the
concept names in the hierarchy are meaningful and belong to the domain. Also the structure
of the hierarchies seems to make sense. As a concrete example, Figure 1 shows an excerpt of
the hierarchy for the seed concept Goats. Interestingly, different ways to categorize
goats play a role: by use (dairy, meat, fiber), by breed (Nigerian Dwarf, Saanen, Boer), and by
other aspects (miniature, show). As an example for an error, note that our approach has failed
to identify Nigerian Dwarf and Dwarf Nigerian as synonyms. The hierarchy is also incomplete.
While the concept Miniature Nubian was discovered and correctly placed under Miniature
Goats, the (arguably more important) concept Nubian, a milk goat, was never discovered.</p>
      <p>
        We next discuss some important parameter settings. The temperature parameter controls
how strongly the LLM favors its most likely predictions when choosing the next
token of an answer. It takes values from the interval [0, 2], the default being 1. A value of
2 means that the probability distribution will be very ‘flat’ in the sense that many tokens
get similar probabilities. At the other extreme, a value of 0 means that the most likely token
will always be chosen, resulting in almost deterministic behavior. The top_p parameter is
from the interval [0, 1], the default being 1, and it controls (on the level of tokens) which
answers are considered at all [23]. If set, for example, to 0.5, then, roughly speaking, only
the smallest set of most probable tokens whose probabilities sum up to 0.5 is considered. The
interplay of temperature and top_p provides a powerful way to control GPT. In our algorithm,
we additionally have at our disposal the frequency threshold (see Section 3).
      </p>
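      <p>The effect of the two parameters can be illustrated with the textbook formulation of temperature scaling followed by nucleus (top_p) filtering. This sketch only mirrors the qualitative behavior described above. OpenAI does not document its exact implementation, and the logits below are made up. Temperature 0 is treated as greedy decoding.</p>
      <preformat><![CDATA[
```python
import math


def token_distribution(logits, temperature=1.0, top_p=1.0):
    """Next-token probabilities after temperature scaling and top_p filtering."""
    if temperature == 0:
        # Greedy decoding: the most likely token is always chosen.
        return {max(logits, key=logits.get): 1.0}
    scaled = {tok: value / temperature for tok, value in logits.items()}
    shift = max(scaled.values())  # subtract the maximum for numerical stability
    weights = {tok: math.exp(v - shift) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)
    # Nucleus filtering: keep the smallest prefix of the ranking whose mass reaches top_p.
    kept, mass = [], 0.0
    for prob, tok in ranked:
        kept.append((tok, prob))
        mass += prob
        if mass >= top_p:
            break
    norm = sum(prob for _, prob in kept)  # renormalize over the kept tokens
    return {tok: prob / norm for tok, prob in kept}


logits = {"Dairy": 3.0, "Meat": 2.0, "Fiber": 1.0, "Xylophone": -5.0}
print(sorted(token_distribution(logits, temperature=1.0, top_p=0.99)))
# ['Dairy', 'Fiber', 'Meat']
```
]]></preformat>
      <p>Raising the temperature flattens the distribution. Even a generous-sounding top_p of 0.99 still removes the long tail of improbable tokens, here the token Xylophone.</p>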
      <p>We generally set top_p to 0.99. Note that while this sounds generous, it actually reduces the
number of admitted tokens from a hundred thousand down to (mostly) fewer than a hundred,
often fewer than 10. As the temperature, we choose 0 in all prompts except in the sampling phase
of concept listing, where we set it to 2. The rationale is that for sampling we want GPT to
produce as many (reasonable, hence the top_p value) answers as possible for concept listing,
while we want to eliminate randomness as much as possible for all other prompts. The
frequency threshold is also an interesting parameter. In our experiments, we observed a difference
between domains that have a highly structured and commonly accepted conceptualization,
such as animals, and domains in which conceptualizations are less structured and more fluid,
such as activities. We refer to these as strongly structured and weakly structured domains,
respectively. As a rule of thumb, lower values for the frequency threshold such as 5 seem
Dairy Goats</p>
      <p>Show Goats</p>
      <p>Mini. Goats</p>
      <p>Meat Goats</p>
      <p>Fiber Goats
Saanen</p>
      <p>Nigerian Dwarf</p>
      <p>Mini. Nubian</p>
      <p>Cashmere
Toggenburg</p>
      <p>Dwarf Nigerian</p>
      <p>Boer</p>
      <p>Nigora
to work better for strongly structured domains while higher values such as 20 seem more
appropriate for weakly structured domains. This can be seen as a trade-of between increased
soundness (achieved by higher values of the parameter) and increased completeness (achieved
by lower values). In the goats ontology, for instance, with the frequency threshold of 20 that we
use, no subconcepts are discovered below Show Goats. With threshold 10, subconcepts such
as Nigerian Dwarf Show Goats and Toggenburg Show Goats appear. With threshold 5, even
those have subconcepts such as Show Quality Nigerian Dwarf Goats and the hallucinated Coat
color/pattern. We provide the Goats ontology with all three thresholds (our favorite choice
being 20) so that the reader can get a sense for the efect of this parameter.</p>
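      <p>The soundness/completeness trade-off of the frequency threshold can be illustrated with a short Python sketch. It assumes, as a simplification that may differ from the precise definition in Section 3, that the listing prompt is sampled several times and that a candidate subconcept is kept only if it occurs in at least ft of the sampled answers. The function name and the answer counts are hypothetical; they are not data from the experiments reported here.</p>

```python
from collections import Counter


def frequent_candidates(samples: list[list[str]], threshold: int) -> list[str]:
    """Return candidate names that occur in at least `threshold` sampled answers.

    Simplifying assumption: each sample is one parsed answer to the listing prompt,
    and a name counts at most once per answer (case and surrounding spaces ignored).
    """
    counts = Counter()
    for answer in samples:
        counts.update({name.strip().lower() for name in answer})
    return sorted(name for name, c in counts.items() if c >= threshold)


# Made-up answers to a listing prompt for Show Goats (not experimental data).
samples = (
    [["Nigerian Dwarf Show Goats", "Toggenburg Show Goats"]] * 11
    + [["Nigerian Dwarf Show Goats", "Coat color/pattern"]] * 6
)
for ft in (20, 10, 5):
    print(ft, frequent_candidates(samples, ft))
```

      <p>With these made-up counts, threshold 20 admits nothing below Show Goats, threshold 10 admits the two breed-specific show goats, and threshold 5 additionally admits Coat color/pattern, qualitatively mirroring the behavior described above.</p>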
      <p>Regarding termination, we choose an individual exploration depth for each domain. By
the depth of a concept, we mean the length of the shortest path from the seed concept to that
concept in the transitive reduction of the subsumption relation. For a given exploration depth,
existence and listing are only applied to concepts whose depth is smaller than the exploration
depth. The cutoff serves two purposes. On the one hand, some of the domains have very large
concept hierarchies, and we want to avoid excessively large ontologies. On the other hand, with
increasing depth (and thus increasing specificity of the concepts) there is a tendency towards
esoteric concepts. By an esoteric concept, we mean one that makes sense in principle but has
too few instances and is too far from usual concerns to be included in the hierarchy (this often
happens via attribute inflation). Such concepts seem to appear already at lower depths in weakly
structured domains, but they also appear in strongly structured domains as the depth increases.
We could not identify a verification or prompting strategy that stops at esoteric concepts
without also removing many non-esoteric ones. However, esoteric concepts can, of course, easily
be removed manually after the automatic extraction.</p>
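      <p>Depths in this sense can be computed by a breadth-first search from the seed concept over the direct subsumptions, since BFS reaches every concept first along a shortest path. A minimal Python sketch, assuming the hierarchy is given as a dictionary from each concept to its direct subconcepts; the function names and the goat fragment are hypothetical and illustrative only, not output of the method.</p>

```python
from collections import deque


def concept_depths(direct_sub: dict[str, list[str]], seed: str) -> dict[str, int]:
    """Depth of every concept reachable from the seed via direct subsumptions.

    `direct_sub` maps a concept to its direct subconcepts, i.e. it is assumed to be
    the transitive reduction of the subsumption relation (edges point downwards).
    """
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        concept = queue.popleft()
        for child in direct_sub.get(concept, []):
            if child not in depth:  # BFS reaches each concept first via a shortest path
                depth[child] = depth[concept] + 1
                queue.append(child)
    return depth


def to_expand(depth: dict[str, int], exploration_depth: int) -> list[str]:
    """Concepts on which existence and listing would still run: depth smaller than the cutoff."""
    return sorted(c for c, d in depth.items() if exploration_depth > d)


# Illustrative fragment only; not an ontology produced by the method.
hierarchy = {
    "Goats": ["Dairy Goats", "Meat Goats", "Show Goats"],
    "Dairy Goats": ["Saanen", "Toggenburg"],
    "Meat Goats": ["Boer"],
    "Show Goats": ["Nigerian Dwarf Show Goats"],
    "Nigerian Dwarf Show Goats": ["Show Quality Nigerian Dwarf Goats"],
}
depths = concept_depths(hierarchy, "Goats")
print(depths["Show Quality Nigerian Dwarf Goats"])  # 3
print(to_expand(depths, 2))
```

      <p>With an exploration depth of 2, only the seed and its direct subconcepts would still be passed to existence and listing in this fragment.</p>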
      <p>Table 1 provides statistics for the constructed hierarchies. Column co lists the chosen
exploration depth, with ‘none’ meaning that we ran the algorithm until no more concepts were
found, and ft is the chosen frequency threshold. The table further lists the total number of
concepts and the number of concepts that were discovered but dismissed by verification;
⊑ is the total number of direct subsumptions and ′⊑ is the number of direct subsumptions that
were discovered in the insertion phase. The table also gives the average number of prompts per
concept, and cost denotes the overall cost of all API calls made for constructing the hierarchy.
Under ≤ co we list the number of concepts whose depth is not larger than the exploration depth,
and under &gt; co the number of concepts whose depth is larger. Concepts of the latter kind can
arise because the insertion phase may place a concept below a concept whose depth already
reaches (or exceeds) the exploration depth. A more detailed breakdown of concept depths can
be found in Table 2 in the appendix, where we also provide information about the outdegrees.</p>
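      <p>The direct subsumptions counted in Table 1 are the edges of the transitive reduction, and the ≤ co and &gt; co columns split the concepts by depth. A minimal Python sketch of both computations, assuming an acyclic subsumption relation given as (subconcept, superconcept) pairs; the function names and the example pairs are hypothetical, and this is not the code used to produce the table.</p>

```python
def transitive_reduction(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Keep only direct subsumptions: drop (sub, sup) if sup is reachable from sub via
    another superconcept of sub. Assumes the relation is acyclic."""
    succ: dict[str, set[str]] = {}
    for sub, sup in edges:
        succ.setdefault(sub, set()).add(sup)

    def reachable(start: str) -> set[str]:
        # All strict superconcepts of `start`, found by depth-first search.
        seen: set[str] = set()
        stack = [start]
        while stack:
            for nxt in succ.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {
        (sub, sup)
        for sub, sup in edges
        if not any(sup in reachable(mid) for mid in succ[sub] if mid != sup)
    }


def depth_split(depth: dict[str, int], co: int) -> tuple[int, int]:
    """Numbers of concepts with depth at most co and with depth greater than co."""
    deeper = sum(1 for d in depth.values() if d > co)
    return len(depth) - deeper, deeper


edges = {("Saanen", "Dairy Goats"), ("Dairy Goats", "Goats"),
         ("Saanen", "Goats"), ("Boer", "Goats")}
print(len(transitive_reduction(edges)))  # 3: (Saanen, Goats) is implied by the other two
```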
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>We have presented an approach for constructing ontologies, which for now take the form of
concept hierarchies, from large language models such as GPT 3.5. To the best of our knowledge,
we are the first to do so. We believe that our work raises many interesting follow-up questions,
which we discuss in the following.</p>
      <sec id="sec-6-1">
        <title>Interaction with Human Domain Expert.</title>
        <sec id="sec-6-1-1">
          <p>As discussed in the introduction, it seems natural to add interaction with a human domain
expert to the methodology. After all, an ontology is the result of a conscious design process.
For example, reconsider the ontology for the seed concept Goats depicted in Figure 1. In our view,
whether the intended direct subconcepts of Goats are breeds such as Saanen and Nigerian Dwarf,
or use-related concepts such as Dairy Goats and Fiber Goats (potentially with the breeds as
subconcepts below them), depends on the use case. We believe that LLMs cannot be assumed to
make such design decisions ‘correctly’ and that human intervention is required. Another useful
input from a human user would be to control the introduction of ‘esoteric’ concepts.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>Evaluation of Constructed Ontologies.</title>
        <p>As there is no ground truth, even the precision
of the constructed ontologies is difficult to evaluate, and recall is harder still. In this paper,
all evaluation was purely manual and subjective. One may think of more systematic but
still manual evaluation strategies, e.g., via crowdsourcing. One may also try to use existing
taxonomies provided by knowledge bases such as Wikidata and YAGO. An obvious challenge is
then that the concept names used by these knowledge sources will diverge from those proposed
by the LLM. It thus seems necessary to include some form of ontology matching, which is
error-prone. In this context, it is also interesting to note that the hierarchies constructed by our
approach for the strongly structured domains of animals and plants correspond more to
an ‘everyday conceptualization’ of these domains than to scientific taxonomies.
A related but different idea is to use LLMs not for constructing ontologies, but for verifying
the correctness of existing ontologies. We are somewhat sceptical that this will yield
good results, because most ontologies include quite a few concepts whose names
are not generally understandable. In the medical ontology SNOMED CT, for example, there are
concepts such as “Parameter (observable entity)”, “Number of pieces in fragmented specimen”,
and “Counseling procedure with explicit context”.</p>
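        <p>Why ontology matching is the crux of an automatic evaluation can be seen in the following deliberately naive Python sketch. It compares (subconcept, superconcept) pairs of a constructed hierarchy with those of a reference taxonomy after a crude name normalisation (lower-casing, dropping punctuation, stripping a plural s) and reports precision and recall. The normalisation rule, the function names and the toy reference pairs are assumptions for illustration only; they are not data from Wikidata or YAGO.</p>

```python
import re


def normalise(name: str) -> str:
    """Crude stand-in for ontology matching: lower-case, drop punctuation, strip plural s."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def precision_recall(constructed: set[tuple[str, str]],
                     reference: set[tuple[str, str]]) -> tuple[float, float]:
    """Precision and recall of (subconcept, superconcept) pairs after normalisation."""
    c = {(normalise(a), normalise(b)) for a, b in constructed}
    r = {(normalise(a), normalise(b)) for a, b in reference}
    hits = len(c.intersection(r))
    return (hits / len(c) if c else 0.0, hits / len(r) if r else 0.0)


# Toy data: a constructed fragment and a hypothetical reference taxonomy.
constructed = {("Saanen", "Dairy Goats"), ("Boer", "Meat Goats"),
               ("Coat color/pattern", "Show Goats")}
reference = {("Saanen", "dairy goat"), ("Boer", "meat goat"),
             ("Alpine", "dairy goat"), ("Nubian", "dairy goat")}
print(precision_recall(constructed, reference))
```

        <p>Such string-level matching is easily fooled: word-order variants such as Nigerian Dwarf and Dwarf Nigerian, or genuine synonyms, are counted as mismatches, which lowers both precision and recall.</p>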
        <p>Philosophical Musings. We believe that the constructed ontologies also raise interesting
questions from the perspective of the social sciences. The main one is: What do these ontologies
represent? Since GPT 3.5 was trained on a large fraction of human knowledge (or at least of
internet knowledge), one might ask whether the constructed ontologies represent or
approximate, at least in part, the common conceptualization of the world shared by humanity. Note
that anthropological research has found considerable evidence that independent populations
consistently arrive at highly similar category systems across a range of basic topics, so it is
not absurd to assume that (at least to some extent) such common conceptualizations exist; see,
for example, [24] and references therein. At the same time, the constructed ontologies show a
cultural bias towards the western world and, most strongly, towards the US. For example, Chai
Tea appears as a synonym of Spiced Tea, which may be accepted in western countries, whereas in
many other countries chai simply means tea. It might thus also be interesting to construct
ontologies in different languages and to compare the outcomes in order to highlight and analyze
the cultural impact on conceptualizations.</p>
        <p>
          Querying GPT. It is well known that the performance of LLMs in knowledge acquisition tasks
heavily depends on engineering good prompts, and that small changes to the prompts can
result in drastic changes of the output [25, 26]. While we have put effort into careful prompt
engineering, there is certainly room for improvement and experimentation. Although this is
worthwhile, it is not clear whether general lessons can be learned from it: Will the prompts
also work for other LLMs, or even for the next version of GPT? In the following, we discuss a
few aspects. Our prompts are mostly based on zero-shot learning, meaning that we directly
pose to the LLM the questions that we want answered. Only for existence and listing
do we use a mild form of few-shot learning. It is well known that the newest generation of
LLMs is very good at few-shot learning [<xref ref-type="bibr" rid="ref6">6</xref>]. Going beyond that, one could try to fine-tune
the LLM towards ontology construction, hoping to then get by with simpler prompts. It
is also an interesting question whether one should do prompt engineering and fine-tuning
for domain-specific ontology construction rather than in a domain-independent way. For
example, when using the seed concept Animal, we might want to ask for subspecies rather
than for subcategories. We mention that ontology-dependent fine-tuning is used in BERT-based
ontology completion [18, 19].
        </p>
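        <p>The difference between zero-shot and few-shot prompting, and the domain-specific variant mentioned for Animal, can be sketched in Python as plain prompt construction. The prompt wording, the parameter relation and the example answers are hypothetical; they are not the prompts used in this work.</p>

```python
def zero_shot_listing_prompt(concept: str, relation: str = "subcategories") -> str:
    """Zero-shot: pose the question directly, without worked examples."""
    return f"List the most important {relation} of {concept}, one per line."


def few_shot_listing_prompt(concept: str,
                            examples: list[tuple[str, list[str]]],
                            relation: str = "subcategories") -> str:
    """Few-shot: prepend a handful of solved examples before the actual question."""
    parts = []
    for example_concept, answers in examples:
        parts.append(zero_shot_listing_prompt(example_concept, relation))
        parts.append("\n".join(answers))
    parts.append(zero_shot_listing_prompt(concept, relation))
    return "\n\n".join(parts)


print(few_shot_listing_prompt("Goats", [("Tea", ["Green Tea", "Black Tea"])]))
# A domain-specific variant would only swap the relation, e.g. for the seed concept Animal:
print(zero_shot_listing_prompt("Animal", relation="subspecies"))
```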
        <p>Querying GPT? One may also question whether it is a good choice in the first place to use
‘general-purpose’ LLMs such as GPT as an ‘all-domain domain expert’. To construct a
high-quality ontology for a specific domain, one might instead first train an LLM specifically
on selected high-quality texts from that domain and then extract an ontology from the
resulting domain-specific LLM.</p>
        <p>Expressive Ontologies. An important direction for future work is to construct ontologies
that are more expressive than concept hierarchies. There are many possible directions. For
a start, one could add disjointness constraints between concepts. One can also extract and
add instances of concepts, which brings us closer to knowledge graph construction from
LLMs, see the related work section. Being more adventurous, one could try to construct
ontologies formulated in RDF Schema, in OWL 2 DL, or in OWL 2 QL. In all these cases, one
also needs to (use LLMs to) identify property names that are relevant for the domain under
consideration. Increasing the expressive power leads to more modeling decisions. For
example, should a red car be modeled as a concept RedCar, as a conjunction Red ⊓ Car, or even
as Car ⊓ ∃hasColor.Red? It is far from clear how such modeling decisions should be made.</p>
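        <p>In description logic notation, the three options differ in what has to be stated explicitly. As an illustration (not taken from the source): with an atomic concept RedCar, its subsumption by Car must be asserted as an axiom, whereas for the two complex concepts it follows from the semantics of ⊓ alone. In the third option, Red denotes a class of colours rather than a class of red things.</p>

```latex
\begin{gather*}
\mathsf{RedCar} \sqsubseteq \mathsf{Car}
  \quad \text{(atomic concept: must be asserted as an axiom)} \\
\mathsf{Red} \sqcap \mathsf{Car} \sqsubseteq \mathsf{Car}
  \quad \text{(conjunction: valid by the semantics of } \sqcap \text{)} \\
\mathsf{Car} \sqcap \exists \mathsf{hasColor}.\mathsf{Red} \sqsubseteq \mathsf{Car}
  \quad \text{(role restriction: also valid; here } \mathsf{Red} \text{ is a class of colours)}
\end{gather*}
```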
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work is partly supported by BMBF (Federal Ministry of Education and Research) in DAAD
project 57616814 (SECAI, School of Embedded Composite AI) as part of the program Konrad
Zuse Schools of Excellence in Artificial Intelligence.</p>
      <p>Prompting as probing: Using language models for knowledge base construction, CoRR abs/2208.11057 (2022). URL: https://doi.org/10.48550/arXiv.2208.11057. doi:10.48550/arXiv.2208.11057. arXiv:2208.11057.</p>
      <p>[15] A. Haviv, J. Berant, A. Globerson, BERTese: Learning to speak to BERT, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, EACL 2021, Online, April 19–23, 2021, Association for Computational Linguistics, 2021, pp. 3618–3623. URL: https://doi.org/10.18653/v1/2021.eacl-main.316. doi:10.18653/v1/2021.eacl-main.316.</p>
      <p>[16] R. Cohen, M. Geva, J. Berant, A. Globerson, Crawling the internal knowledge-base of language models, in: Findings of EACL, Association for Computational Linguistics, 2023, pp. 1811–1824. URL: https://aclanthology.org/2023.findings-eacl.139.</p>
      <p>[17] B. Veseli, S. Singhania, S. Razniewski, G. Weikum, Evaluating language models for knowledge base completion, in: Proc. of ESWC, volume 13870 of LNCS, Springer, 2023, pp. 227–243. URL: https://doi.org/10.1007/978-3-031-33455-9_14. doi:10.1007/978-3-031-33455-9_14.</p>
      <p>[18] H. Liu, Y. Perl, J. Geller, Concept placement using BERT trained by transforming and summarizing biomedical ontology structure, Journal of Biomedical Informatics 112 (2020) 103607. doi:10.1016/j.jbi.2020.103607.</p>
      <p>[19] J. Chen, Y. He, Y. Geng, E. Jiménez-Ruiz, H. Dong, I. Horrocks, Contextual semantic embeddings for ontology subsumption prediction, 2023. arXiv:2202.09791.</p>
      <p>[20] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, P. Fung, Survey of hallucination in natural language generation, ACM Comput. Surv. 55 (2023). URL: https://doi.org/10.1145/3571730. doi:10.1145/3571730.</p>
      <p>[21] F. Baader, B. Hollunder, B. Nebel, H. Profitlich, E. Franconi, An empirical analysis of optimization techniques for terminological representation systems, or: Making KRIS get a move on, in: Proc. of KR, 1992, pp. 270–281.</p>
      <p>[22] B. Glimm, I. Horrocks, B. Motik, R. D. C. Shearer, G. Stoilos, A novel approach to ontology classification, J. Web Semant. 14 (2012) 84–101. URL: https://doi.org/10.1016/j.websem.2011.12.007. doi:10.1016/j.websem.2011.12.007.</p>
      <p>[23] A. Holtzman, J. Buys, M. Forbes, Y. Choi, The curious case of neural text degeneration, CoRR abs/1904.09751 (2019). URL: http://arxiv.org/abs/1904.09751.</p>
      <p>[24] D. Guilbeault, A. Baronchelli, D. Centola, Experimental evidence for scale-induced category convergence across populations, Nature Communications 12 (2021) 327. URL: https://doi.org/10.1038/s41467-020-20037-y. doi:10.1038/s41467-020-20037-y.</p>
      <p>[25] Z. Zhao, E. Wallace, S. Feng, D. Klein, S. Singh, Calibrate before use: Improving few-shot performance of language models, in: M. Meila, T. Zhang (Eds.), Proc. of ICML, volume 139 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 12697–12706. URL: http://proceedings.mlr.press/v139/zhao21c.html.</p>
      <p>[26] A. Holtzman, P. West, V. Shwartz, Y. Choi, L. Zettlemoyer, Surface form competition: Why the highest probability answer isn’t always right, in: M. Moens, X. Huang, L. Specia, S. W. Yih (Eds.), Proc. of EMNLP, Association for Computational Linguistics, 2021, pp. 7038–7051. URL: https://doi.org/10.18653/v1/2021.emnlp-main.564. doi:10.18653/v1/2021.emnlp-main.564.</p>
      <p>Bold values mark the exploration depth of the respective ontology.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ozaki</surname>
          </string-name>
          ,
          <article-title>Learning description logic ontologies: Five approaches. Where do they stand?</article-title>
          ,
          <source>Künstliche Intell</source>
          .
          <volume>34</volume>
          (
          <year>2020</year>
          )
          <fpage>317</fpage>
          -
          <lpage>327</lpage>
          . URL: https://doi.org/10.1007/s13218-020-00656-9. doi:10.1007/s13218-020-00656-9.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vrolijk</surname>
          </string-name>
          , I. Reklos, M. Vafaie, A. Massari,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          ,
          <article-title>Toward a comparison framework for interactive ontology enrichment methodologies</article-title>
          ,
          <source>in: Proc. of VOILA@ISWC</source>
          , volume
          <volume>3253</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Konev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ozaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wolter</surname>
          </string-name>
          ,
          <article-title>Exact learning of lightweight description logic ontologies</article-title>
          ,
          <source>J. Mach. Learn. Res</source>
          .
          <volume>18</volume>
          (
          <year>2017</year>
          )
          <fpage>201:1</fpage>
          -
          <lpage>201:63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ganter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sertkaya</surname>
          </string-name>
          , U. Sattler,
          <article-title>Completing description logic knowledge bases using formal concept analysis</article-title>
          ,
          <source>in: Proc. of IJCAI</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>230</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rudolph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Völker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>Supporting lexical ontology learning by relational exploration</article-title>
          ,
          <source>in: Proc. of ICCS</source>
          , volume
          <volume>4604</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2007</year>
          , pp.
          <fpage>488</fpage>
          -
          <lpage>491</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , E. Sigler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>in: Proc. of NeurIPS</source>
          , volume
          <volume>33</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Rae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Borgeaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Millican</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aslanides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Young</surname>
          </string-name>
          , E. Rutherford,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hennigan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Menick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cassirer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Powell</surname>
          </string-name>
          , G. van den Driessche,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Hendricks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rauh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Glaese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Welbl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dathathri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uesato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mellor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Higgins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Creswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>McAleese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Elsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jayakumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Buchatskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Budden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sutherland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kuncoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nematzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gribovskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Donato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lazaridou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lespiau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tsimpoukelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Grigorev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fritz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sottiaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pajarskas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pohlen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Toyama</surname>
          </string-name>
          , C. de Masson d'Autume,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Terzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mikulik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Babuschkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>de Las Casas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Hechtman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Weidinger</surname>
          </string-name>
          , I. Gabriel, W. Isaac,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lockhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Osindero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rimell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ayoub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stanway</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hassabis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          , G. Irving,
          <article-title>Scaling language models: Methods, analysis &amp; insights from training Gopher</article-title>
          ,
          <source>CoRR</source> <volume>abs/2112.11446</volume> (<year>2021</year>). URL: https://arxiv.org/abs/2112.11446. arXiv:2112.11446.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schuh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tsvyashchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maynez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Prabhakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Reif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hutchinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Austin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gur-Ari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Duke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Levskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Michalewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ippolito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spiridonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sepassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Omernick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Pillai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pellat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lewkowycz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Moreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Polozov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Saeta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Diaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Firat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Catasta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Meier-Hellstern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fiedel</surname>
          </string-name>
          ,
          <article-title>PaLM: Scaling language modeling with pathways</article-title>
          ,
          <source>CoRR</source> <volume>abs/2204.02311</volume> (<year>2022</year>). URL: https://doi.org/10.48550/arXiv.2204.02311. doi:10.48550/arXiv.2204.02311. arXiv:2204.02311.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Patwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Norick</surname>
          </string-name>
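          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>LeGresley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajbhandari</surname>
          </string-name>
          ,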
          <string-name>
            <given-names>J.</given-names>
            <surname>Casper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Prabhumoye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zerveas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Korthikanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Y.</given-names>
            <surname>Aminabadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bernauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shoeybi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Houston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Catanzaro</surname>
          </string-name>
          ,
          <article-title>Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model</article-title>
          ,
          <source>CoRR</source> <volume>abs/2201.11990</volume> (<year>2022</year>). URL: https://arxiv.org/abs/2201.11990. arXiv:2201.11990.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S. H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Language models as knowledge bases?</article-title>
          ,
          <source>in: Proc. of EMNLP-IJCNLP, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2463</fpage>
          -
          <lpage>2473</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Razeghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Logan IV</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>AutoPrompt: Eliciting knowledge from language models with automatically generated prompts</article-title>
          ,
          <source>in: Proc. of EMNLP, Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4222</fpage>
          -
          <lpage>4235</lpage>
          . URL: https://doi.org/10.18653/v1/2020.emnlp-main.346. doi:10.18653/v1/2020.emnlp-main.346.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <article-title>How much knowledge can you pack into the parameters of a language model?</article-title>
          ,
          <source>in: Proc. of EMNLP, Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>5418</fpage>
          -
          <lpage>5426</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisner</surname>
          </string-name>
          ,
          <article-title>Learning how to ask: Querying LMs with mixtures of soft prompts</article-title>
          ,
          <source>in: Proc. of NAACL-HLT, Association for Computational Linguistics</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>5203</fpage>
          -
          <lpage>5212</lpage>
          . URL: https://doi.org/10.18653/v1/2021.naacl-main.410. doi:10.18653/v1/2021.naacl-main.410.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Alivanistos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Santamaría</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalo</surname>
          </string-name>
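          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>van Krieken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Thanapalasingam</surname>
          </string-name>
          ,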
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>