<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge-centric Prompt Composition for Knowledge Base Construction from Pre-trained Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xue Li</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anthony Hughes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Majlinda Llugiqi</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fina Polat</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Groth</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fajar J. Ekaputra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Language</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Wien</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Wolverhampton</institution>
          ,
          <addr-line>Wolverhampton</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Vienna University of Economics and Business</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Pretrained language models (PLMs), exemplified by the GPT family of models, have exhibited remarkable proficiency across a spectrum of natural language processing tasks and have displayed potential for extracting knowledge from within the model itself. While numerous endeavors have delved into this capability through probing or prompting methodologies, the potential for constructing comprehensive knowledge bases from PLMs remains relatively uncharted. The Knowledge Base Construction from Pre-trained Language Model Challenge (LM-KBC) [1] looks to bridge this gap. This paper presents the system implementation from team thames for Track 2 of LM-KBC. Our methodology achieves a 67% F1 score on the test set provided by the organisers, outperforming the baseline by over 40 points and ranking 2nd place for Track 2. It does so through the use of additional prompt context derived from both the training data and the constraints and descriptions of the relations. All code and results can be found on GitHub.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The field of Artificial Intelligence (AI) has seen huge improvements in tasks related to language
due to Pre-trained Language Models (PLMs)[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and the computational efficiency introduced
by transformers [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This significant improvement can be seen in areas such as translation,
summarisation, and classification [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
      </p>
      <p>
        Given their effectiveness in many information extraction tasks [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ], there has been a
movement by the community to study their use in tasks focused specifically on knowledge
base construction [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. As part of that larger interest, the Knowledge Base Construction from
Pre-trained Language Model Challenge (LM-KBC) was launched in 2022 to better understand
the role that PLMs can play as a source of knowledge themselves [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Essentially, providing a
framework to study how one can construct a knowledge base directly from a PLM.
      </p>
      <p>This report presents our approach and results for the second edition of the LM-KBC challenge
at the 22nd International Semantic Web Conference (ISWC 2023). The challenge is to predict
objects for a given subject-predicate pair. For example, given the subject Matt Damon and the
target relationship person has number of children, the task is to retrieve the object for that pair,
in this case a number. The model may also need to predict existing Wikidata objects: for
example, the subject country Fiji has associated geographical states, and the objects we
wish to retrieve are the Wikidata entities for those states.</p>
      <p>We propose a pipeline for knowledge base construction by prompting large language models,
specifically, GPT-3.5 and GPT-4. We explore different setups with in-context learning by utilizing
an example selector and knowledge-enriched prompts to provide more contextually relevant
prompts. Our results show rule-based example selectors considering cardinality per relation
exhibit significant performance on the task. Furthermore, enriching entities and relations with
additional properties obtained from GPT-4 helps boost the performance even further.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The notion of using a language model as a source of knowledge itself was brought to the fore
by the LAMA paper in 2019 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This can be seen as one part of the larger move towards
prompting PLMs to solve NLP tasks. We refer the reader to the survey by Min et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for
a deeper dive into prompting and associated architectures for NLP. Here, we focus on work
directly related to the LM-KBC challenge. An overview of the various approaches can be found
in the 2022 challenge introduction [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Specifically, our work follows on from the winner of task 2 of last year’s challenge, “Prompting
as Probing”[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In their work, they prompt GPT-3 with manually curated prompt templates,
including 4 examples from the training set in their prompts. These are then updated with the
specific subject entity of interest during the prompting workflow. Additionally, they include a
post-processing step called “fact-probing” in which the PLM is asked to judge whether a given
result produced by PLM is indeed true. This helps improve the precision of the model. The
authors went on to perform an ablation study outside of the challenge in which they utilised
Wikidata to improve entity disambiguation during the post-processing step: Wikidata
information, such as the type of the hypothesised concept in relation to the relationship
used during prediction, was used to validate the prediction. This study showed slight performance
gains; however, such retrieval augmentation was not permitted in the 2022 reporting, whereas in
2023 it is allowed. We employ a similar approach here.
      </p>
      <p>
        Our approach differs because we focus on dynamically selecting examples from the training
set to include in the prompt. Additionally, our prompts provide more context than those used
by Alivanistos et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Also, we note that we use a newer version of GPT.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a benchmark is provided to establish the ability of models to construct knowledge
graphs from text. The authors provide an ontology description as part of their prompts. The
prompt consistently employs the relation(subject, object) format to represent relationships and
expects the model’s output to adhere to this notation. We also employ ontology descriptions
(i.e. extra knowledge base context) in our prompts.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. LM-KBC Challenge Definition</title>
      <p>The Language Model Knowledge Base Construction (LM-KBC) challenge task is defined as
follows: take a set of subject-predicate pairs (⟨s, p⟩) and predict a set of objects
(o1, o2, ...) in relation to those pairs. The target set of objects can be: (1) Wikidata entities, (2)
a numerical value, or (3) empty.</p>
      <p>The LM-KBC Challenge provides two distinct tracks for participants. The first, known as
the "Small-model Track," restricts participants to pre-trained language models with
no more than 1 billion parameters and excludes the use of contextual information. The second,
termed the "Open Track," imposes no limitations on the model size and permits the inclusion of
contextual data. For the purposes of our research, we tackle the Open Track.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The dataset available for LM-KBC comprises 5820 samples (i.e. triples) evenly divided over
training, validation and test set. The objects within the test set were withheld during the
time period when our system was developed. The dataset encompasses an array of 21 distinct
relations, where a diverse range of subject-entities are provided for each relation. Each triple
contains both the Wikidata identifiers and also lexicalizations of each element of the triple as
English text.</p>
        <p>Each relation in the train and validation sets is accompanied by a set of ground-truth
object-entities, curated to align with specific subject-relation pairs. It is noteworthy that the number of
object-entities affiliated with a given subject-relation pairing varies. This means that, on
the test set, the implementation must correctly predict both sets of objects and empty sets.</p>
        <p>We note that there are four relations, e.g. PersonHasPlaceOfDeath, where there are potentially
zero objects for the subjects in the available sets. In comparison, CountryHasStates requires
between one and twenty objects to be predicted for the related subjects. Furthermore,
objects are not limited to other Wikidata entries; they can also be numerical, e.g.
SeriesHasNumberOfEpisodes.</p>
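<p>As a minimal illustration of these dataset statistics, the per-relation cardinality range can be computed from the training rows. The field names below follow the dataset layout described above, and the rows themselves are hypothetical:</p>

```python
from collections import defaultdict

def cardinality_stats(rows):
    """Map each relation to the (min, max) number of ground-truth objects."""
    counts = defaultdict(list)
    for row in rows:
        counts[row["Relation"]].append(len(row["ObjectEntities"]))
    return {rel: (min(c), max(c)) for rel, c in counts.items()}

# Hypothetical rows mirroring the train-set layout described above.
rows = [
    {"SubjectEntity": "Fiji", "Relation": "CountryHasStates",
     "ObjectEntities": ["Ba Province", "Bua Province"]},
    {"SubjectEntity": "Alan Turing", "Relation": "PersonHasPlaceOfDeath",
     "ObjectEntities": ["Wilmslow"]},
    {"SubjectEntity": "Ada Lovelace", "Relation": "PersonHasPlaceOfDeath",
     "ObjectEntities": []},
]
print(cardinality_stats(rows))
```

<p>Cardinality statistics of this kind inform the example selector design described in Section 4.</p>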
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>
        In-context learning is a fundamental capability in many language modelling approaches. The
approach is prominent in GPT models particularly starting from GPT-3 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Our approach
centers around in-context learning (i.e. prompting, few-shot learning), which combines the
capabilities of pre-trained language models with the contextual information available in the text
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Specifically, we focus on the design and utilization of few-shot prompts. Prompting refers
to the use of specific instructions and/or statements to induce the model to complete certain
tasks. Few-shot learning is an approach where the language model learns how to perform a
task from minimal data points, i.e., learning the task from only a few examples (few shots).
Considering prompting and few-shot learning principles, we carefully designed our prompts to
integrate both static and dynamic elements.
      </p>
      <p>An overview of our method is provided in Figure 1.
From the training dataset, we derive a list of Wikidata relations, along with their cardinality
distribution. We use the set of relations for two purposes: example selector design and Wikidata
context extraction, which is prompted from GPT models. The prompt template is subsequently
defined by these two components, followed by a refinement of hyperparameters, such
as the number of examples selected, based on the final model performance on the evaluation
set. We then execute the prompts with test data through GPT models. Finally, the generations
are post-processed and connected to Wikidata IDs. We will explain the components of our
approach in the following sub-sections.</p>
      <sec id="sec-4-1">
        <title>4.1. Prompt Template Definition</title>
        <p>In compiling our prompts, we employ static prefix and suffix components while dynamically
selecting examples in between based on the given subject-predicate pair. We incorporate static
elements at the beginning and end of each prompt to provide a consistent context for the
language model. The prefix scopes the prompt, guiding the model to understand the relevant
parameter space, while the suffix ensures structure and uniformity. The crux of our methodology
lies in selecting and integrating dynamic examples from the training dataset into few-shot
prompts. This process is facilitated by the use of two distinct example selectors, designed to
guide the language model’s comprehension of the extraction task at hand.</p>
        <p>
          GPT-3.5 and GPT-4 have been fine-tuned utilizing dialogue and instruction datasets [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. On
top of this fine-tuning, they are both optimized for dialogue and instruction following using
Reinforcement Learning with Human Feedback (RLHF) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. RLHF is a machine-learning approach
that involves mapping out optimal strategies based on human responses. This technique allows
the language model to learn more complex behaviours and concepts that are difficult to define
or specify explicitly in a traditional reinforcement learning setup. By incorporating humans in
training, the model can inherit a more nuanced understanding of several tasks [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Leveraging
this background knowledge, we carefully phrase the static components of the prompts.
        </p>
        <p>Initially, the prefix assigns a role to the LLM, i.e. "Act as a knowledge base", "Imagine that
you are Wikidata", etc. Then, a brief task description is given, followed by an explicit statement
indicating that the prompt will continue with examples. A fixed example template has been
devised to be populated by the example selectors. The suffix then delineates the conclusion of
the selected examples, stating that it is now the LLM’s turn for prediction. The prompt ends
with a template to be filled with the input subject-predicate pair and a signal of continuation,
i.e. ":", "[", etc.</p>
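<p>The assembly described above can be sketched as follows. The wording of the prefix and suffix is illustrative only (the actual templates are in our repository), and the field layout of the examples is a hypothetical simplification:</p>

```python
def build_prompt(subject, relation, examples):
    """Assemble a few-shot prompt: static prefix, dynamic examples, static suffix."""
    # Static prefix: assign a role and describe the task.
    prefix = (
        "Act as a knowledge base. Your task is to predict objects "
        "for the given subject and relation. Here are some examples:\n"
    )
    # Dynamic middle: examples chosen by an example selector.
    body = "\n".join(
        f"Subject: {ex['subject']} | Relation: {relation} | Objects: {ex['objects']}"
        for ex in examples
    )
    # Static suffix: hand over to the model with a continuation signal.
    suffix = (f"\nNow it is your turn.\nSubject: {subject} "
              f"| Relation: {relation} | Objects:")
    return prefix + body + suffix

prompt = build_prompt("Fiji", "CountryHasStates",
                      [{"subject": "Austria", "objects": ["Vienna", "Tyrol"]}])
```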
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Example Selectors for In-Context Learning</title>
        <p>
          Context sensitivity is a recognized phenomenon in in-context learning [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The immediate
textual content that appears prior to the prediction point is the sole form of the input. Everything
the model generates from that point is a continuation of the prompted input. This sensitivity
can be both beneficial and problematic. An advantage is that it enables the model to adapt
rapidly to changing task requirements and examples. As a drawback, the high sensitivity can
lead to issues with model consistency and predictability, resulting in a hallucinatory generation.
Therefore, prompts play a crucial role when it comes to extracting knowledge from the language
model.
        </p>
        <p>We account for context sensitivity in the selection of both static and dynamic components
of the prompt. However, the dynamic selection of the most relevant examples is specifically
designed to leverage context sensitivity. Our example selectors pick out the most relevant
instances from the training set. The rule-based selector follows certain rules for the selection
while the similarity-based selector leverages cosine similarity. The selectors are detailed in the
following subsections.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Rule-based Example Selector</title>
          <p>The rule-based example selector is designed as a systematic approach to sample examples from
the training set. Given that instances may have zero or more objects, this approach ensures that
the diverse nature of examples is taken into account. To instil this understanding, we enriched
our prompts with five specific examples for each instance. The selection criteria for these five
examples are as follows:
• Minimum Object Example: We selected one example with the fewest number of objects
for a given relation.
• Maximum Object Example: We selected one example with the highest number of objects
for a given relation.
• Random Selection: To add an element of variability and ensure broader coverage, we
incorporated three additional examples. These were chosen at random from the training
set.</p>
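<p>The selection criteria above can be sketched as follows, assuming each training instance carries an ObjectEntities list as in the challenge dataset; the function name and fixed seed are our own illustrative choices:</p>

```python
import random

def rule_based_examples(pool, n_random=3, seed=0):
    """Select five examples for one relation: the instance with the fewest
    objects, the instance with the most, and three random additions."""
    rng = random.Random(seed)
    by_count = sorted(pool, key=lambda ex: len(ex["ObjectEntities"]))
    chosen = [by_count[0], by_count[-1]]  # min- and max-object examples
    rest = [ex for ex in pool if ex not in chosen]
    chosen.extend(rng.sample(rest, min(n_random, len(rest))))  # random fill
    return chosen

# Hypothetical pool: subjects with 1, 4, 2, 0 and 3 objects respectively.
pool = [{"SubjectEntity": s, "ObjectEntities": ["x"] * n}
        for s, n in [("a", 1), ("b", 4), ("c", 2), ("d", 0), ("e", 3)]]
selected = rule_based_examples(pool)
```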
          <p>This strategy helps in achieving a balanced representation of the data, ensuring that the
model does not develop a bias towards any particular pattern.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Similarity-based Example Selector</title>
          <p>The similarity-based example selector operates by using semantic similarity measures to identify
instances that are akin to the input instance. This approach allows for the dynamic selection of
examples that are contextually compatible with the input text. The functioning of this system
relies on embeddings and necessitates a list of vectorized or embedded examples, to which
the given input can be compared. Furthermore, it computes a semantic similarity score, such
as cosine similarity or dot product, in order to select the closest examples from the pool of
embedded examples.</p>
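<p>A minimal sketch of such a selector, using plain cosine similarity over pre-computed embeddings; the toy two-dimensional vectors stand in for real sentence-encoder output:</p>

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_examples(query_vec, pool, k=4):
    """Return the k pool entries whose embedding is closest to the query."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex["vec"]),
                    reverse=True)
    return ranked[:k]

# Toy embedded example pool.
pool = [{"id": "a", "vec": [1.0, 0.0]},
        {"id": "b", "vec": [0.0, 1.0]},
        {"id": "c", "vec": [1.0, 1.0]}]
closest = similar_examples([1.0, 0.0], pool, k=2)
```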
          <p>As far as performance is concerned when applied to GPT-3.5, this selector is noticeably
slower than the rule-based selector, which is expected given its operation at the embedding
level. Semantic similarity-based selection methods are more suited to tasks that harbour a high
degree of variation and ambiguity. However, the task at hand in this case, shows a lower degree
of variation as it is limited to 21 relations.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Prompt Improvement through Wikidata Context Extraction</title>
        <p>We hypothesise that, given a subject-predicate pair, we can achieve greater accuracy when
predicting the object if the model is given the correct context for that pair. We extend this further
by utilising the schema and knowledge base from which the subject and predicate came.
Specifically for this task, we state that, for a subject-predicate pair from Wikidata, it is possible
to use the qualities from that particular knowledge base to enhance the prompt. To this end,
we prompt GPT-3.5 to provide a set of relevant contexts related to the given properties. The
prompt that we use to extract these contexts is available in our GitHub repository. An excerpt
of the result is available in Listing 1.
Listing 1: An excerpt of the extracted Wikidata context on the given properties from GPT-3.5
"CompanyHasParentOrganisation": {
"value": "P749",
"wikidata_id": "P749",
"wikidata_label": "parent organization",
"domain": "organization",
"range": "organization",
"explanation": "This property is used to indicate the parent organization
of a company."</p>
        <p>An example context usage is provided in Listing 2. For this example, the subject, AT&amp;T,
the Wikidata ID Q35476 and relation, CompanyHasParentOrganisation, are provided by the
competition dataset. Using the Wikidata context generated through prompting, we are able to enhance
the context by finding related information to the given relation. These contexts include subject
and object class type, domain and range information, the label of the given relation, and a full
description of that label. This context information is injected into the relevant sections of the
prompt.</p>
        <p>Listing 2: An example wikidata context usage within the prompt
Your task is to predict objects based on the given subject and relation.
- Given Subject: ('AT&amp;T','Q35476')
- Subject Type: 'organization'
- Object Type: 'organization'
- Relation: 'CompanyHasParentOrganisation'
- Relation Wikidata ID: 'P749'
- Relation Label (Wikidata): 'parent organization'
- Relation Explanation (Wikidata): 'This property is used to indicate
the parent organization of a company.'
==&gt;
Predicted Objects:</p>
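<p>A sketch of how such a context entry can be rendered into the prompt; the dictionary keys follow the Listing 1 excerpt, while the subject here is a hypothetical placeholder:</p>

```python
def render_context(subject, subject_qid, relation, ctx):
    """Render the Wikidata context section of a prompt, in the style of Listing 2."""
    lines = [
        "Your task is to predict objects based on the given subject and relation.",
        f"- Given Subject: ('{subject}','{subject_qid}')",
        f"- Subject Type: '{ctx['domain']}'",
        f"- Object Type: '{ctx['range']}'",
        f"- Relation: '{relation}'",
        f"- Relation Wikidata ID: '{ctx['wikidata_id']}'",
        f"- Relation Label (Wikidata): '{ctx['wikidata_label']}'",
        f"- Relation Explanation (Wikidata): '{ctx['explanation']}'",
        "==>",  # signal of continuation
        "Predicted Objects:",
    ]
    return "\n".join(lines)

# Context entry shaped like the Listing 1 excerpt; subject is a placeholder.
ctx = {"wikidata_id": "P749", "wikidata_label": "parent organization",
       "domain": "organization", "range": "organization",
       "explanation": "This property is used to indicate the parent "
                      "organization of a company."}
block = render_context("ExampleCo", "Q0", "CompanyHasParentOrganisation", ctx)
```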
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Prompt Execution and Post-processing</title>
        <p>We adapted prompt execution and post-processing parts of the baseline code provided by the
challenge. The adaptations are mostly to cater for the needs of debugging and testing. One
exception, however, pertains to post-processing entities with a title and subtitle in the results
(i.e., results containing the character ":"). We noticed that some results in the validation set only
matched when the subtitle part was omitted. Therefore, we added a slight modification to the Wikidata
disambiguation to check for only the main title in case the full string did not return Wikidata
IDs. This update helps to improve the results, especially for the PersonHasAutobiography
relation. Additionally, with our post-processor, we noticed that the model tends to generate duplicated
results, and therefore we added a de-duplication step.</p>
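<p>The two post-processing steps described above can be sketched as follows; lookup is a stand-in for the actual Wikidata label-to-ID search used by the baseline code, and the QID below is illustrative:</p>

```python
def main_title(label):
    """Keep only the part before ':' as a disambiguation fallback."""
    return label.split(":", 1)[0].strip()

def deduplicate(labels):
    """Order-preserving de-duplication of generated objects."""
    seen, unique = set(), []
    for label in labels:
        if label not in seen:
            seen.add(label)
            unique.append(label)
    return unique

def resolve(labels, lookup):
    """Try the full label first; fall back to the main title if the
    full string does not return a Wikidata ID."""
    ids = []
    for label in deduplicate(labels):
        qid = lookup(label) or lookup(main_title(label))
        if qid:
            ids.append(qid)
    return ids

# Hypothetical lookup table standing in for a Wikidata search.
lookup = {"Night": "Q319591"}.get
qids = resolve(["Night: A Memoir", "Night: A Memoir"], lookup)
```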
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We summarise the results of our system implementation and experiments in Table 1. We then
present a more detailed comparison between GPT-3.5 and GPT-4 with the highest performing
example selection methodology utilised when prompting each model; this comparison is shown
in Table 2. Also discussed are the results for zero-object cases, where the system should
correctly not predict an object for some subject-predicate pairs.</p>
      <sec id="sec-5-1">
        <title>5.1. Overview</title>
        <p>In our methodology, we discuss two potential example selection mechanisms, where the selected
examples are injected into various prompts. We first performed experiments with GPT-3.5 and
the similarity-based selection methodology; we then used our proposed rule-based methodology
for both models. From our experiments, we find that a rule-based approach to prompt creation
yields greater scores in recall and F1 regardless of the underlying model. The use of GPT-4 in
combination with the rule-based approach gave the best results overall for all F1 metrics.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Rule-based prompts</title>
        <p>Given that rule-based prompts offer the best results overall in our experiments, we present a
more detailed comparison between GPT-3.5 and GPT-4 where a full breakdown of all predicate
scores is available in Table 2.
Table 2 demonstrates the efficacy of GPT-4 over GPT-3.5. GPT-4 outperforms GPT-3.5 in 18 of
the 21 relations that require predictions. The two relations where GPT-3.5 outperforms are
CountryBordersCountry and PersonHasEmployer, while for one relation PersonHasSpouse, the
two models are tied. We look to further break down the relations in GPT-4 to identify any
patterns that may emerge. The three lowest performing classes are PersonHasAutobiography,
PersonHasEmployer, and PersonHasProfession. All three of these relations have a subject of type
Person. This pattern of poorer performance on Person-related relations is common to both
models.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Zero-object cases</title>
        <p>As discussed in Section 3.1, it is possible that, for a given subject-predicate pair, the
target object to be predicted is empty. In Table 5.3, we present the F1 scores for this
specific case.</p>
        <p>Overall, GPT-4 is the better-performing model for this specific task when using rule-based
example selection for prompt construction.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The findings and observations from our study have shed light on a few perspectives when using
PLMs for KBC, including how contexts affect in-context learning performance, the impact of
post-processing and what the limitations of GPT-family models are for this task.</p>
      <p>Contextual Relevance in In-Context Learning: In our experiments, we observe that both
demonstrated examples and additional knowledge of the entities play a crucial role in enhancing
a model’s understanding and generation. This aligns with the fundamental idea that a richer
context helps produce a more coherent response. To improve contextual relevance, one can select
more relevant demonstrations given relations and entities to predict. Additionally, providing
extra knowledge for relations and entities can help generate more accurate responses.</p>
      <p>Impact of Post-Processing: PLMs do not always follow the given instructions. Because
PLMs are often fine-tuned on natural question-answering style tasks, their answers
often come in a natural-language style. Hence, being able to unify the
answers and evaluate them quantitatively is a challenge. It follows that effective
post-processing strategies are necessary for the generation of quality results.</p>
      <p>Performance Enhancement of GPT-4: Our results corroborate the general consensus that
GPT-4 improves performance compared to its predecessors, such as GPT-3.5.</p>
      <p>Hallucinations on Relation Types: Although GPT-4 has shown significant ability for predicting
the objects given subject and relation, the model still shows signs of hallucination. The model
especially struggles with specific types of relations such as PersonHasProfession and
PersonHasEmployer. When allowed to generate multiple answers, the model tends to hallucinate after
the first correct answer, generating related professions that are not factually correct. This might
be improved if the model could fact-check every answer it produces.</p>
      <p>Temporal misalignment between Wikidata and GPT-family Models: The dataset from the
organizers is from Wikidata, which contains up-to-date knowledge, while GPT-family models
were only trained on text up to September 2021, resulting in a performance bottleneck due to
the nature of the dataset.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion &amp; Future work</title>
      <p>We proposed a PLM-based pipeline centered on in-context learning for performing knowledge
base construction, specifically, for the task of predicting objects given a subject and relation.
We explored different approaches to prompting, including the use of contextual information
from training and an associated knowledge graph. Our results indicate that providing examples
with higher contextual relevance, including the type of relations, and the possible cardinality of
the objects, can help with knowledge base construction.</p>
      <p>Our results show that PLMs have great potential to perform KBC tasks when prompted
effectively. However, we still observe a number of limitations: (1) The temporal
information gap within the GPT-family of models may result in inaccurate responses.
(2) The free-form generation of generative PLMs makes the evaluation of the model’s true
capacity challenging. (3) Models struggle with the actual number of answers for relations such
as "PersonPlaysInstrument" and may hallucinate by returning answers that
should not be returned. (4) Humans are required to design the prompt template, which hence
needs to be re-designed when adapting to other tasks. In the future, different paths can be
followed to address the current limitations.</p>
      <p>1. Utilizing automatic prompt optimization techniques such as in [19]. Instead of human
modifying prompts, we can learn the most optimal prompts automatically.
2. Chaining large language model prompts [20] to iteratively feed the output of the previous
response to the next, aiming to amplify the advantages at each step and provide a more
structured interaction with the model. Given this technique, we might be able to address
the hallucination problem to some extent. A chain-of-thought prompt provides internal
validation and improves the models’ robustness in responses.
3. Exploring the effects of example selectors with different attributes. Currently, the example
selector only considers the types of relations and the possible number of objects. Another
avenue to explore is to select examples based on the properties of each relation type.
Being able to understand how exactly example selectors affect the response could help to
generalize to other tasks.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We thank the organizers of the 2023 Knowledge Prompting Hackathon
(https://king-s-knowledge-graph-lab.github.io/knowledge-prompting-hackathon/), where this research
was started. We also thank Data Language (www.datalanguage.com) for providing resources for our experiments. This
research is supported in part by Dutch Research Council (NWO) through grant MVI.19.032, the
European Union’s Horizon 2020 research and innovation programme within the OntoTrans
project (No. 862136) and the ENEXA project (grant agreement no. 101070305), as well as the
Austrian Science Fund (FWF) within the HOnEst project (No. V 754-N).</p>
    </sec>
    <sec id="sec-9">
      <title>Authors’ contribution</title>
      <p>Authors’ contribution according to CRediT (https://credit.niso.org/): Investigation, Methodology,
Conceptualization (XL, AH, ML, FP, FE); Software (XL, AH, ML, FP, FE); Supervision (PG, FE);
Writing (XL, AH, ML, FP, PG, FE).
[19] T. Shin, Y. Razeghi, R. L. L. IV, E. Wallace, S. Singh, Autoprompt: Eliciting knowledge from
language models with automatically generated prompts, CoRR abs/2010.15980 (2020). URL:
https://arxiv.org/abs/2010.15980. arXiv:2010.15980.
[20] T. Wu, M. Terry, C. J. Cai, Ai chains: Transparent and controllable human-ai interaction
by chaining large language model prompts, in: Proceedings of the 2022 CHI conference
on human factors in computing systems, 2022, pp. 1–22.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>LM-KBC: Knowledge base construction from pre-trained language models</article-title>
          , Semantic Web Challenge @ ISWC, CEUR-WS (
          <year>2023</year>
          ). URL: https://lm-kbc.github.io/challenge2023/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Language models: Past, present, and future</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>65</volume>
          (
          <year>2022</year>
          )
          <fpage>56</fpage>
          -
          <lpage>63</lpage>
          . URL: https://doi.org/10.1145/3490443. doi:10.1145/3490443.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          , et al.,
          <article-title>Palm: Scaling language modeling with pathways</article-title>
          ,
          <source>arXiv preprint arXiv:2204.02311</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Beeching</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fourrier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Habib</surname>
          </string-name>
          , S. Han,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lambert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sanseviero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tunstall</surname>
          </string-name>
          , T. Wolf, Open LLM Leaderboard, https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] OpenAI,
          <article-title>GPT-4 technical report</article-title>
          ,
          <year>2023</year>
          . arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.-L.</given-names>
            <surname>Huguet Cabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <article-title>REBEL: Relation Extraction By End-to-end Language generation</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP 2021</source>
          , Association for Computational Linguistics, Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>2370</fpage>
          -
          <lpage>2381</lpage>
          . URL: https://aclanthology.org/2021.findings-emnlp.204. doi:10.18653/v1/2021.findings-emnlp.204.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Malaviya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Celikyilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          , Comet:
          <article-title>Commonsense transformers for automatic knowledge graph construction</article-title>
          ,
          <source>in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4762</fpage>
          -
          <lpage>4779</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Melnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dognin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Grapher: Multi-stage knowledge graph construction using pretrained language models</article-title>
          ,
          <source>in: NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <article-title>LM-KBC: Knowledge Base Construction from Pre-trained Language Models</article-title>
          , in: S. Singhania,
          <string-name>
            <given-names>T.-P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , S. Razniewski (Eds.),
          <source>Proceedings of the Semantic Web Challenge on Knowledge Base Construction from Pre-trained Language Models</source>
          <year>2022</year>
          , volume
          <volume>3274</volume>
          <source>of CEUR Workshop Proceedings</source>
          , CEUR, Virtual Event, Hangzhou,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . URL: https://ceur-ws.org/Vol-3274/#paper0.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Language models as knowledge bases?</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          Association for Computational Linguistics
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>2463</fpage>
          -
          <lpage>2473</lpage>
          . URL: https://aclanthology.org/D19-1250. doi:10.18653/v1/D19-1250.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P. B.</given-names>
            <surname>Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sainz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Heintz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Recent advances in natural language processing via large pre-trained language models: A survey</article-title>
          ,
          <source>ACM Comput. Surv</source>
          . (
          <year>2023</year>
          ). URL: https://doi.org/10.1145/3605943. doi:10.1145/3605943. Just Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Alivanistos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Santamaria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          , E. v. Krieken, T. Thanapalasingam,
          <article-title>Prompting as Probing: Using Language Models for Knowledge Base Construction</article-title>
          , in: S. Singhania,
          <string-name>
            <given-names>T.-P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , S. Razniewski (Eds.),
          <source>Proceedings of the Semantic Web Challenge on Knowledge Base Construction from Pre-trained Language Models</source>
          <year>2022</year>
          , volume
          <volume>3274</volume>
          <source>of CEUR Workshop Proceedings</source>
          , CEUR, Virtual Event, Hangzhou,
          <year>2022</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>34</lpage>
          . URL: https://ceur-ws.org/Vol-3274/#paper2.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lata</surname>
          </string-name>
          ,
          <article-title>Text2kgbench: A benchmark for ontology-driven knowledge graph generation from text</article-title>
          ,
          <year>2023</year>
          . arXiv:2308.02357.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , E. Sigler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          Curran Associates, Inc.,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          , G. Neubig,
          <article-title>Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Wainwright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Agarwal,
          <string-name>
            <given-names>K.</given-names>
            <surname>Slama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schulman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kelton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Welinder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lowe</surname>
          </string-name>
          ,
          <article-title>Training language models to follow instructions with human feedback</article-title>
          ,
          <year>2022</year>
          . arXiv:2203.02155.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leike</surname>
          </string-name>
          , T. Brown, M. Martic,
          <string-name>
            <given-names>S.</given-names>
            <surname>Legg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning from human preferences</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Razeghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Logan IV</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>AutoPrompt: Eliciting knowledge from language models with automatically generated prompts</article-title>
          ,
          <source>CoRR abs/2010.15980</source>
          (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2010.15980. arXiv:2010.15980.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Terry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <article-title>AI Chains: Transparent and controllable human-AI interaction by chaining large language model prompts</article-title>
          ,
          <source>in: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>