<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Navigating Nulls, Numbers and Numerous Entities: Robust Knowledge Base Construction from Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arunav Das</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nadeen Fathallah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicole Obretincheva</string-name>
        </contrib>
        <aff id="aff1">
          <institution>King's College London</institution>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Analytic Computing, Institute for Artificial Intelligence, University of Stuttgart</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work, we employ advanced prompt engineering techniques with pre-trained Large Language Models (LLMs) to enhance knowledge extraction and structuring for Knowledge Base Construction (KBC) tasks. At the core of our methodology is the strategic fusion of different prompting strategies tailored to the unique relations between subject and object entities. We choose different prompting strategies to overcome the challenges of null values, numeric data, and one-to-many relationships inherent in traditional KBC tasks. By integrating role-play and context-aware prompting, we enrich the interaction with the LLM, guiding it to produce more accurate and contextually relevant outputs. Our method achieved a macro-averaged F1-score of 0.653 across the properties, with the scores varying from 0.890 to 0.399. Our results show a marked improvement in precision and recall of the extracted data, highlighting the efficacy of our approach in transforming raw LLM outputs into structured, queryable knowledge bases. Our code base is publicly available for research and development purposes, accessible at: https://github.com/nobretincheva/challenge24.git</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The emergence of advanced contextual Large Language Models (LLMs), exemplified by the
encoder-only models (e.g. BERT), decoder-only models (e.g. GPT, Mistral, Llama series), and
encoder-decoder (e.g. Flan-T5) models, has revealed a remarkable capacity for encapsulating
vast amounts of factual world knowledge within their parametric structures based on their
training methods and datasets. This internal representation of knowledge has been likened to
the schema-based relational knowledge base (KB) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] traditionally used in information systems
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The potential of LLMs to serve as de facto knowledge repositories, unconstrained by the
rigid formalisms of conventional schema-based systems, has become a subject of intensive
inquiry [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3, 4, 5, 6</xref>
        ]. Researchers have employed a diverse array of evaluative methodologies
to assess the viability of LLMs in this capacity. These assessment techniques span a broad
spectrum of cognitive and computational tasks, including but not limited to knowledge probing
[
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ], question answering [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], compositional reasoning [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and knowledge base completion
(KBC) [
        <xref ref-type="bibr" rid="ref10 ref6">6, 10</xref>
        ].
      </p>
      <p>The ISWC KBC-LLM challenge seeks to advance the field of KBC through the exploration
of LLMs’ internal knowledge representations. Unlike previous work that focuses primarily on
knowledge probing, this challenge involves the completion of factual, disambiguated knowledge
bases for a given set of relations. Participants complete knowledge graphs by extracting entities
and relations from LLMs, handling complexities such as null values, one-to-many relations, and
cardinality constraints.</p>
      <p>The ISWC’24 KBC-LLM challenge introduces constraints on model size (&lt;= 10B parameters)
and relation sets (exactly 5 relations) to foster innovation and fair competition. This challenge extends
beyond previous knowledge extraction efforts by requiring the construction of comprehensive,
disambiguated knowledge bases from LLMs. Given a subject entity and a relation sourced from
Wikidata, the task is to predict the set of object entities (null, single, or multiple) using an LLM.
Subsequently, these predictions undergo disambiguation and mapping to external knowledge
bases to yield precise entity identifiers. This process necessitates handling diverse relational
complexities, including null values, one-to-many relationships, and the extraction of multiple
objects per query.</p>
      <p>The intricate relational structures inherent within the challenge dataset render traditional
knowledge probing methodologies insufficient. This work introduces a novel fusion prompt
strategy, incorporating dual, direct, and loop prompting methods tailored to the specific
relational characteristics of the target knowledge, as well as context-aware prompting
and role-play prompting. Due to the constraints imposed on model size in this challenge, we
choose the Llama-3-8b-Instruct model. Using our fusion prompt strategy, we achieve
a macro-average F1 score of 0.653 on the test set, with F1-scores ranging from 0.399 for
the awardWonBy relation to 0.890 for the countryLandBordersCountry relation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section focuses on two key areas relevant to our tasks related to the ISWC Challenge:
Knowledge Probing in LLMs and KB completion/construction. We highlight the state-of-the-art
approaches and identify existing gaps in these domains.</p>
      <p>
        Knowledge Probing in LLMs: The LAMA (LAnguage Model Analysis) probe [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] marked a
significant milestone in assessing factual knowledge in pre-trained language models. This work
demonstrated that LLMs could compete with traditional KBs for certain types of factual queries.
LAMA’s reliance on atomic fact elicitation and its underlying assumption of binary subject-object
relations limited its capacity to capture the complexity of real-world knowledge. Moreover, the
manual engineering of prompts to extract factual knowledge from LLMs yielded inconsistent
results and underestimated their potential. The application of systematic prompt engineering
methodologies, including prompt mining and paraphrasing, coupled with ensemble and ranking
techniques, has demonstrated a substantial enhancement in knowledge elicitation from LLMs
compared to the traditional approach of manually crafted single prompts [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. AutoPrompt [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
a template-based automated discrete prompting methodology, and OPTIPROMPT [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], a
continuous prompting strategy, have both demonstrated superior efficacy in eliciting knowledge
from LLMs compared to both manual prompt engineering and data-driven approaches such as
prompt mining and paraphrasing. Despite incremental advancements in prompt engineering
techniques, a comprehensive evaluation of the implicit knowledge base within pre-trained
LLMs remains elusive due to methodological limitations, including the restrictive focus on
cloze-style prompts. Existing approaches have yet to adequately address critical aspects such
as the differential capabilities of encoder and decoder models, precision in subject and
relation extraction, the handling of complex relationship types (one-to-many, many-to-many), the
inference of transitive relations, and the representation of hierarchical knowledge structures.
While these methods effectively enhanced the extraction of specific knowledge types, they fell
short of constructing comprehensive and interconnected knowledge bases. Previous studies
on knowledge probing have mainly focused on the elicitation of knowledge without using the
results to construct or complete knowledge graphs.
      </p>
      <p>
        Knowledge Base Completion with LLMs: Traditional methods for knowledge base
completion primarily relied on three core approaches [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]. Embedding-based techniques sought
to represent entities and relations as dense vectors in a latent space, capturing semantic and
syntactic similarities. Probabilistic graphical models, such as Markov random fields, were
employed to model complex interdependencies between entities and infer missing information
through logical reasoning. Alternatively, path-based methods explored entity connections
within the knowledge graph by simulating random walks, identifying potential paths between
given entities. However, these techniques are often constrained by their reliance on explicit
knowledge representation and their limited ability to capture complex semantic and relational
patterns.
      </p>
      <p>
        The use of LLMs as an alternative for KBC tasks, also known as text-based KBC methods, is
a relatively new area of research [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Initial studies have found varied degrees of success for
triple classification (comparatively the highest task accuracy across different LLM models), relation
prediction, and entity prediction (comparatively the lowest task accuracy across different LLM models)
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. LLM KBC accuracy has been found to vary significantly for different types of relations (that
study was restricted to a single model, BERT) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. While model size exhibited a positive correlation
with accuracy up to a certain threshold, diminishing returns were observed for exceedingly large
models. The application of instruction tuning methodologies as well as contrastive learning
methods [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] have yielded performance improvements. While KBC research has expanded
to encompass a broader range of models (e.g., encoder and decoder architectures) and evaluation
metrics, and has widened its scope to predicting relations in addition to subject
and object entities, a notable gap persists in comparison to progress in the field of Knowledge
Probing. Specifically, KBC studies have yet to comprehensively address complex relational
scenarios, such as one-to-many entity relationships and null-value handling, which have been
identified as critical challenges in knowledge probing. Beyond the ISWC’22-24 KBC-LM series
of challenges, limited research has delved into these areas within the KBC domain.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Our work’s main contribution is the fusion of prompt engineering techniques to address
task-specific challenges and effectively direct an LLM’s attention to the most relevant information
within its training data. Employing a pre-trained LLM to predict the object entity given a subject
entity and a relation entails evaluating the extent of knowledge captured by pre-trained LLMs
during their training. To achieve that, we propose a fusion of various prompt engineering
techniques. An overview of our methodology is illustrated in figure 1 and figure 2.</p>
      <p>[Figure: pipeline overview. Input: SubjectEntity, SubjectEntityID, Relation. Context retrieval and role-play persona generation enrich the subject entity and relation with context and persona information for prompt generation; the prompt is passed to Llama-3-8B, and the response undergoes re-prompting and entity disambiguation to produce the output ObjectEntityID.]</p>
      <sec id="sec-3-1">
        <title>3.1. Dual Prompting</title>
        <p>The task is limited to five relations: countryLandBordersCountry, personHasCityOfDeath,
seriesHasNumberOfEpisodes, awardWonBy, and companyTradesAtStockExchange. Some of
these relations present a challenge in that there may be no object entities, i.e., null values
may be present, such as in the following cases:
• countryLandBordersCountry: Null values are possible (e.g., an island nation like Iceland).
• personHasCityOfDeath: Null values are possible (e.g., the person is still alive).
• awardWonBy: There may be instances where the award was not given in certain years.
• companyTradesAtStockExchange: Null values are possible (e.g., private companies or
companies that have delisted from the stock exchange).</p>
        <p>
          To address the challenge of null values, we developed a dual prompting approach known as
two-stage prompting. Dual prompting forms the cornerstone of our methodology, employing a
strategic two-tiered question-based prompting system that significantly enhances the LLM’s
ability to extract accurate knowledge [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. This approach effectively identifies the presence or
absence of relevant data, thereby reducing hallucinations and ensuring accurate responses. It
leverages the cognitive architecture of LLMs to focus attention sequentially, which improves their
capacity to resolve ambiguities and extract nuanced information. Dual prompting systematically
structures the interaction with the LLM, guiding it through a logical sequence of thought that
mimics human problem-solving processes. This method helps mitigate common issues, such as
the model generating overly general or irrelevant responses.
        </p>
        <p>The dual prompting method involves two stages: an initial prompt and a follow-up prompt.
The initial prompt serves as a cognitive anchor, setting the context for the query and focusing
the model’s attention on the presence or relevance of specific information. This is particularly
useful in clarifying the existence or state of an entity, a crucial step in ensuring that subsequent
inquiries are grounded in the correct context. The follow-up prompt then capitalizes on the
clarified context to extract specific details or additional layers of information. Examples of these
prompts are shown in table 6, and the prompt template can be found in table 7 and table 8. Our
prompt templates are enhanced with instructions for the LLM on approaching and answering
the question, specifying whether the answer should be yes/no, numeric, or multiple responses.</p>
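        <p>The two-stage flow above can be sketched as follows. This is a minimal illustration assuming a generic ask_llm callable; the prompt wording here is illustrative rather than the exact templates in tables 7 and 8.</p>

```python
# Sketch of the dual (two-stage) prompting flow. The prompt wording and the
# ask_llm callable are illustrative assumptions, not the paper's exact templates.

def initial_prompt(subject, relation):
    """Stage 1: verify whether any object entity exists at all (yes/no)."""
    questions = {
        "personHasCityOfDeath":
            f"Answer only yes or no. Has {subject} died?",
        "countryLandBordersCountry":
            f"Answer only yes or no. Does {subject} share a land border with any country?",
    }
    return questions[relation]

def followup_prompt(subject, relation):
    """Stage 2: extract the object entities, reached only if stage 1 said yes."""
    questions = {
        "personHasCityOfDeath":
            f"Final answer should be a city name. In which city did {subject} die?",
        "countryLandBordersCountry":
            f"Final answer should be a list of countries. Which countries share a land border with {subject}?",
    }
    return questions[relation]

def dual_prompt(subject, relation, ask_llm):
    """Return [] for the null case, otherwise the extracted object entities."""
    verdict = ask_llm(initial_prompt(subject, relation)).strip().lower()
    if not verdict.startswith("yes"):
        return []  # null case: no object entities exist
    answer = ask_llm(followup_prompt(subject, relation))
    return [part.strip() for part in answer.split(",") if part.strip()]
```

        <p>The stage-1 answer acts as a gate: a negative verdict short-circuits to an empty object set, so the extraction prompt is never issued for null-valued instances.</p>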
        <p>For the seriesHasNumberOfEpisodes relation, we use a single prompting approach, where
the model is asked the question directly without any verification steps or follow-up prompts, as
shown in figure 7. A null check is not necessary in this case because every series inherently has
a defined number of episodes, eliminating the possibility of null values.</p>
        <p>The execution strategy for prompting in our system employs tailored approaches, categorized
as either looping or direct, depending on the nature of the relation involved. The strategies are
applied as follows:</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Looping Strategy</title>
        <p>The looping strategy involves repeatedly applying the dual prompting method to retrieve
relevant information for each instance and then aggregating the results to form the final answer.
This strategy specifically addresses the challenge of the awardWonBy relation, where many
objects are associated with a single subject that has multiple instances (e.g., 224 Physics Nobel
Prize winners over the course of multiple years). This approach simplifies the LLM’s work and
ensures accuracy by breaking the task into smaller, manageable parts. For the awardWonBy
relation, the loop starts from the first year the award was conferred; to initialise this, we prompt
the LLM for that year. Upon entering the loop, for each subsequent year up to the present
(2024), the LLM is prompted to list the award recipients. The final output is a combined list of
all recipients across the years, ensuring a comprehensive and precise result. The strategy is
illustrated in figure 3.</p>
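        <p>In outline, the looping strategy can be sketched as below; the prompt wording and the ask_llm callable are illustrative assumptions rather than our exact templates.</p>

```python
# Sketch of the looping strategy for awardWonBy: initialise with the first year
# the award was conferred, then prompt per year and aggregate the recipients.

def collect_award_winners(award, ask_llm, end_year=2024):
    # Initialise the loop by asking the LLM for the first year of the award.
    first_year = int(ask_llm(
        f"Final answer should be a number. "
        f"In which year was the {award} first awarded?"))
    winners = []
    for year in range(first_year, end_year + 1):
        answer = ask_llm(
            f"Final answer should be a comma-separated list of names. "
            f"Who won the {award} in {year}?")
        # Years in which the award was not given yield no recipients (null case).
        winners.extend(name.strip() for name in answer.split(",")
                       if name.strip() and name.strip().lower() != "none")
    return winners
```

        <p>Breaking the one-to-many relation into per-year queries keeps each prompt small, at the cost of issuing one LLM call per year.</p>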
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Direct Strategy</title>
        <p>This strategy involves prompting the LLM directly to generate the object entity. We directly
apply dual prompting to generate object entities for the relations countryLandBordersCountry,
personHasCityOfDeath, and companyTradesAtStockExchange. We also use the direct strategy
for the seriesHasNumberOfEpisodes relation; we directly prompt the total number of episodes
without any dual prompting since we do not need to verify the series’ existence. For example,
we might ask, "Final answer should consist of a number. How many episodes does Game of
Thrones have?". Unlike the looping strategy, which iteratively retrieves information across
multiple instances (e.g., years for awardWonBy), the direct strategy generates the required
object entity in a single step for each relation.</p>
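        <p>The direct strategy for seriesHasNumberOfEpisodes reduces to a single formatted prompt plus a numeric extraction step. The prompt wording follows the example above; the extraction helper is an illustrative assumption.</p>

```python
import re

# Sketch of the direct strategy: one prompt, one numeric answer.

def direct_episode_prompt(series):
    return (f"Final answer should consist of a number. "
            f"How many episodes does {series} have?")

def extract_number(response):
    """Pull the first integer out of a free-text LLM response."""
    match = re.search(r"\d+", response)
    return int(match.group()) if match else None
```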
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Context-aware Prompting</title>
        <p>
          Our empirical findings show that providing additional context about the subject entity in the
prompt directs the LLM’s attention to relevant parts of the training data, thereby improving
their ability to retrieve accurate answers and essential information, such as the first year an
award was conferred. Obtaining this information can also help define where the loop starts for
the awardWonBy relation. Context-aware prompting involves enriching the LLM prompt with
relevant background information about the subject [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. This approach enhances the model’s
ability to generate responses that are not only accurate but also richly detailed and contextually
appropriate. Our proposed approach leverages semantic web sources like Wikidata and
non-semantic web sources like Wikipedia to enrich LLM prompts with contextual information about
subject entities. The following types of additional data are retrieved to provide comprehensive
contextual information:
• Additional Data: specific information tailored to the subject entity within each relation
type as shown in table 1.
• Wikipedia Extract: the text-only portion of the lead section of the Wikipedia page for the
subject entity, which can provide background information about the subject entity.
        </p>
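        <p>The enrichment step amounts to prepending the retrieved facts and the Wikipedia lead extract to the question. The layout of the context block below is an illustrative assumption; the actual additional data per relation is summarised in table 1.</p>

```python
# Sketch of context-aware prompt enrichment: relation-specific facts plus a
# Wikipedia lead extract are prepended to the question.

def enrich_prompt(question, additional_data, wiki_extract):
    facts = "\n".join(f"- {key}: {value}" for key, value in additional_data.items())
    return (f"Context:\n{facts}\n\n"
            f"Background: {wiki_extract}\n\n"
            f"Question: {question}")
```

        <p>For awardWonBy, for example, the additional data can carry the first year the award was conferred, which also seeds the looping strategy.</p>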
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Role-Play Prompting</title>
        <p>
          Central to our methodology is the implementation of persona-based prompting, a role-playing
strategy that enriches the contextual engagement of our system. Role-play prompting is a prompt
engineering technique that allows LLMs to adopt specific personas or characters, guiding their
responses to align with expert knowledge, thereby enhancing LLMs’ ability to generate
more precise and factually correct responses, as evidenced by studies such as [
          <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
          ]. In our
approach, each relation type in our dataset is paired with a specific persona, crafted to embody
an expert in the pertinent field as shown in table 9. We create the personas in advance via
the use of an LLM (GPT-4) and further edit them manually to keep the tone consistent. These
personas are narrative devices and strategic tools designed to motivate the LLM and steer it
toward generating expert-like and contextually relevant responses. Our role-play prompts
are enhanced with instructions on how the LLM should approach and answer the question,
specifying whether the response should be yes/no, numeric, or require multiple responses. By
integrating persona-driven prompts with clear answering guidelines, we guide the LLM to respond
accurately and with an appreciation of the domain’s discourse style and depth, mirroring the
interaction one would expect from a human expert in similar scenarios.
        </p>
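        <p>Assembly of a persona-based prompt can be sketched as below. The persona texts here are illustrative stand-ins for the GPT-4-generated personas in table 9.</p>

```python
# Sketch of role-play prompt assembly: each relation is paired with an expert
# persona, followed by answering instructions and the question itself.
# The persona wording is an illustrative assumption.

RELATION_PERSONAS = {
    "companyTradesAtStockExchange":
        "You are an expert financial analyst who tracks stock exchange listings.",
    "awardWonBy":
        "You are a historian who has catalogued every recipient of major awards.",
}

def roleplay_prompt(relation, question, answer_format):
    """Prefix the question with an expert persona and answering instructions."""
    persona = RELATION_PERSONAS[relation]
    return f"{persona} {answer_format} {question}"
```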
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Re-prompting</title>
        <p>
          Re-prompting is a technique in prompt engineering where an LLM is asked the same question
again to improve the quality of its responses [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. We use this approach when the initial
output from the LLM is unsatisfactory. Re-prompting gives the LLM another opportunity to
generate more accurate and refined responses. The main advantage of re-prompting is that
it helps maintain coherence and clarity while ensuring that the LLM adheres more closely to
the desired output format. For instance, if the response needs to be in a specific format, such
as a list of names or a numerical value, we re-prompt the LLM to generate the response in
the desired format. Figures 5, 6, 7 showcase sample prompts that exemplify our methodology,
which fuses prompt engineering techniques tailored to specific relational contexts. These
figures demonstrate how our approach adapts to different scenarios, using persona-driven and
context-enriched prompts to guide the LLM’s responses effectively.
        </p>
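        <p>The re-prompting loop can be sketched as follows; the retry count and the extraction interface are our assumptions rather than the exact implementation.</p>

```python
import re

# Sketch of re-prompting: ask again whenever the response cannot be parsed
# into the desired output format.

def ask_with_reprompting(prompt, ask_llm, extract, max_attempts=3):
    for _ in range(max_attempts):
        result = extract(ask_llm(prompt))
        if result is not None:
            return result
    return None  # give up after max_attempts unusable responses
```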
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Disambiguation</title>
        <p>Entity disambiguation is the final step in identifying and mapping the predicted object entities
to their correct references in Wikidata. We use a straightforward disambiguation function that
returns the Wikidata ID of an item. We clean the entity strings by removing unwanted characters
such as quotes and parentheses. Specifically, for the awardWonBy relation, we employ the spaCy
library to remove titles and extract person names accurately, ensuring precise identification and
mapping of award recipients. For the companyTradesAtStockExchange relation, we observed
that the LLM would sometimes answer with the complete stock exchange name as well as its
abbreviation. In such cases, we split the LLM response into two separate entities, one for the
full name and one for the abbreviation. We then attempt to disambiguate using the full name
and the abbreviation, depending on which matches the Wikidata entry. For example, if the
LLM outputs "New York Stock Exchange (NYSE)," we would search Wikidata for both "New
York Stock Exchange" and "NYSE" to find the correct Wikidata ID. This method enhances the
accuracy of linking the mentioned stock exchange to its official record in Wikidata.</p>
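        <p>The string-cleaning and candidate-splitting step can be sketched as below; the actual Wikidata lookup (returning a Wikidata ID) and the spaCy name extraction are omitted, and the helper name is illustrative.</p>

```python
import re

# Sketch of entity-string cleaning before the Wikidata lookup. Handles the case
# where the LLM returns both a full stock exchange name and its abbreviation.

def candidate_labels(raw):
    """Clean an LLM-produced entity string and split 'Full Name (ABBR)' answers
    into two candidate labels to try against Wikidata."""
    cleaned = raw.strip().strip("\"'")
    match = re.match(r"(.+?)\s*\(([^)]+)\)\s*$", cleaned)
    if match:
        return [match.group(1).strip(), match.group(2).strip()]
    return [cleaned]
```

        <p>Each candidate label would then be searched in Wikidata, keeping whichever matches an entry.</p>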
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Datasets</title>
        <p>This section includes an overview and discussion of our final results as well as an in-depth
analysis of our model setup.</p>
        <p>
          The dataset used in ISWC 2024 LM-KBC Challenge [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] is constructed from Wikidata and further
processed. It comprises 5 Wikidata relation types covering awards, geography, television series,
business, and public-figure information. It has 367 statements for the train set and 368 for the
validation and test sets. The results reported are based on the validation and test sets. The
cardinality of object entities for certain relations differs, ranging from null (0) to an upper
bound. A minimum of 0 (null) means the subject entities for some relations can have no valid
object entities; for example, people who are still alive do not have a place of death, and an
island country has no land-bordering countries.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Model Setup</title>
        <sec id="sec-4-2-3">
          <title>Strategies</title>
          <p>[Table 2 components: baseline dual prompting and prompt strategies, + chain-of-thought, + context-aware prompting, + re-prompting, + role-play prompting.]</p>
          <p>In our work, we utilize Llama3-8B-Instruct as the foundation of our final model setup,
which incorporates various enhancements over the baseline pipeline. The baseline pipeline
comprises dual prompting as well as the looping and direct strategies (as detailed in sections
3.1, 3.2, and 3.3). On its own, it shows solid results on the validation dataset, with a macro precision of
0.662, a macro recall of 0.454, and a macro F1-score of 0.438. However, this setup produces 173
empty predictions as opposed to the expected 97, indicating the model’s weakness in producing
relevant outputs.</p>
          <p>
            Incorporating chain-of-thought prompting into our baseline setup, where the model is asked
to provide explanations for its responses, leads to a slight improvement in performance metrics,
as shown in table 2. Requesting a justification of the answer from the model when generating
its response helps reduce instances of non-responses, as the model is encouraged to elaborate
on its reasoning process [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ]. Further enhancement via context-aware prompting (section
3.4) significantly boosts the model’s performance, highlighting the effectiveness of including
contextual information in our prompts.
          </p>
          <p>The inclusion of a re-prompting mechanism (section 3.6), where the model is prompted again if
the initial response cannot be extracted, yields further improvements in the performance metrics.
This demonstrates that iterative prompting helps in refining the model’s output, ensuring higher
accuracy and fewer empty responses. A more robust extraction function could also lead to
similar results. Finally, the use of role-play prompting with the inclusion of personas (section
3.5) further enhances the model’s ability to generate informed and contextually rich responses,
guiding the model to engage with the data as if it were an expert in the relevant field.</p>
          <p>Our experiments demonstrate the efficacy of combining a variety of prompting techniques.
The integration of context-aware prompting, role-play prompting with personas, and dual
prompting, along with the re-asking strategy, leads to significant improvements in the precision,
recall, and overall F1 scores. These findings underscore the importance of providing LLMs with
rich contextual cues and structured guidance to achieve high accuracy and detailed responses
in knowledge probing.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Role-Play Persona Curation</title>
        <p>The introduction of a persona into our prompt pipeline significantly enhances the model’s
ability to generate accurate and contextually relevant responses, particularly for entities it
would typically be uncertain about or would claim to lack familiarity with, as shown in table 3.
We chose to examine the impact of different role-play instructions on the final results in order
to determine the most suitable approach to our problem.</p>
        <p>Simply using a general persona (e.g. a trivia show contestant) across all relations did not
improve our F1 scores. However, as can be seen from the dip in empty predictions, this
approach still led the model to attempt answers to the set questions.</p>
        <p>We determined that tailoring personas to each relation type (e.g. an expert financial analyst
for the companyTradesAtStockExchange) provided a more specialized context, allowing the
model to generate answers with greater precision and relevance. The greatest benefit, however,
was observed when personas were tailored individually to each entity (e.g., the CFO of the
specific company for the companyTradesAtStockExchange). By suggesting that the model
knows everything there is to know about the specific entity, the instances of empty or unsure
responses were significantly reduced. The scores reflect the enhanced ability of the model to
provide expert-like responses, confirming the effectiveness of persona-based prompting in our
model pipeline.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Comparison between Prompt Strategies</title>
        <p>Choosing the right prompting strategy can significantly impact the final results, as shown in
table 4. We initially found success using the looping strategy for the awardWonBy relation,
where the model was asked to identify the award winner for each year before aggregating
the results. Encouraged by this improvement, we applied the same looping strategy to the
seriesHasNumberOfEpisodes relation. We expected that this strategy would lead to more
accurate results due to the LLM likely having encountered more information about the number
of episodes per season rather than the total number of episodes. Additionally, when manually
prompting the LLM for the total number of episodes, we observed that the answers would often
contain the per-season episode counts first, followed by an often incorrect summation,
which would then be provided as the final answer.</p>
        <p>[Table 4 residue: micro F1 scores comparing the direct strategy with the looping
strategies (with and without dual prompting) for the awardWonBy and
seriesHasNumberOfEpisodes relations.]</p>
        <sec id="sec-4-4-9">
          <title>Discussion</title>
          <p>Our intuition was that eliciting the number of episodes per season one by one and then
summing up said results manually as part of the disambiguation would be more effective than
directly asking for the total number of episodes.</p>
          <p>However, as shown in table 4, the results did not align with our expectations. The direct
strategy, which involves simply asking for the total number of episodes, significantly outperformed
both looping strategies.</p>
          <p>The poor performance of the looping strategies can be attributed to the additional complexity
of our pipeline. As each season is processed individually before summing up all answers in
the disambiguation function, there is an increased risk of errors due to the multiple cases in
which the LLM can potentially hallucinate. In contrast, the direct strategy has a single point of
error. By utilizing a single prompt, we reduce the likelihood of numerical hallucinations and
misinterpretations.</p>
          <p>Nevertheless, the results from the direct strategy are far from perfect, suggesting that we need
to devise a better way of extracting numerical information. One potential approach would be to
consolidate both strategies. By combining the simplicity of the direct strategy with the detailed
step-by-step verification of the looping strategies, we might achieve a more accurate and reliable
method for handling numerical data. This hybrid approach could mitigate the weaknesses of
each strategy, leading to improved performance in extracting numerical information from LLMs.</p>
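          <p>The hybrid approach described above can be sketched as follows. This is a minimal illustration rather than our released implementation: query_llm stands in for a call to Llama-3-8B, and the answer parsing and cross-checking logic are simplified assumptions.</p>

```python
import re

def parse_int(text):
    """Pull the first integer out of a free-form LLM answer, or None."""
    match = re.search(r"\d+", text.replace(",", ""))
    return int(match.group()) if match else None

def hybrid_episode_count(series, num_seasons, query_llm):
    """Hybrid numeric strategy (sketch): combine the direct answer with a
    per-season looping pass, preferring the direct answer but checking it
    against the looped sum."""
    direct = parse_int(query_llm(f"How many episodes does {series} have in total?"))
    per_season = [
        parse_int(query_llm(f"How many episodes does season {s} of {series} have?"))
        for s in range(1, num_seasons + 1)
    ]
    looped = sum(n for n in per_season if n is not None)
    if direct is not None and direct == looped:
        return direct                                  # both strategies agree
    return direct if direct is not None else looped    # fall back to the loop

# Toy stand-in for the LLM: a fixed table of answers.
answers = {
    "How many episodes does Example Show have in total?": "The show has 20 episodes.",
    "How many episodes does season 1 of Example Show have?": "Season 1 has 10 episodes.",
    "How many episodes does season 2 of Example Show have?": "10",
}
total = hybrid_episode_count("Example Show", 2, answers.__getitem__)
```

          <p>In this sketch, disagreement between the two strategies could also be used to trigger a re-prompt instead of a silent fallback.</p>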
        </sec>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Discussion of Final Results</title>
        <p>Our final pipeline demonstrates significant improvements across multiple evaluation metrics
on the test and validation datasets. Table 5 summarizes the macro-averaged precision, recall, and
F1-score for each relation, along with the overall averages. The results are split into two sets of
columns: the first three report the test-set results, and the latter three the validation-set
results. Additionally, for zero-object cases, our pipeline achieves a precision of
0.757, a recall of 0.791, and an F1-score of 0.773.</p>
        <p>[Table 5, partially recovered: one test-set column (likely the macro precision) reads
awardWonBy 0.476, companyTradesAtStockExchange 0.675, countryLandBordersCountry 0.978,
personHasCityOfDeath 0.890, seriesHasNumberOfEpisodes 0.480, Average 0.729.]</p>
        <sec id="sec-4-5-2">
          <title>Per-Relation Analysis</title>
          <p>[Table 5, continued: the test-set macro F1-scores are awardWonBy 0.399,
companyTradesAtStockExchange 0.585, countryLandBordersCountry 0.890, personHasCityOfDeath 0.800,
seriesHasNumberOfEpisodes 0.440, Average 0.653.]</p>
          <p>For relations where null values are possible, such as countryLandBordersCountry,
personHasCityOfDeath, and companyTradesAtStockExchange, the pipeline performs
exceptionally well. The dual prompting method we employ ensures that instances with null values
are correctly identified. This is also reflected in the high precision, recall, and F1
scores achieved for zero-object cases.</p>
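          <p>The dual prompting flow for nullable relations can be sketched as follows. This is an illustrative simplification: query_llm is a hypothetical stand-in for the LLM call, and comma-splitting stands in for our disambiguation step.</p>

```python
def dual_prompt(subject, gate_question, detail_question, query_llm):
    """Two-step dual prompting for relations that admit null objects:
    first a yes/no gate question, then the detail question only when
    the gate answers yes. An empty list represents the null case."""
    gate = query_llm(gate_question.format(subject_entity=subject)).strip().lower()
    if not gate.startswith("yes"):
        return []  # null case caught by the gate prompt
    detail = query_llm(detail_question.format(subject_entity=subject))
    return [name.strip() for name in detail.split(",") if name.strip()]

# Toy stand-in for the LLM covering both branches.
fake_llm = {
    "Has Alan Turing died?": "Yes",
    "In which city did Alan Turing die?": "Wilmslow",
    "Has Ada Example died?": "No",
}.__getitem__

alive = dual_prompt("Ada Example", "Has {subject_entity} died?",
                    "In which city did {subject_entity} die?", fake_llm)
dead = dual_prompt("Alan Turing", "Has {subject_entity} died?",
                   "In which city did {subject_entity} die?", fake_llm)
```

          <p>The gate question is what prevents the model from inventing an object for subjects where none exists.</p>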
          <p>Despite the complexity of dealing with numeric objects, our pipeline achieves moderate
performance on seriesHasNumberOfEpisodes. The macro F1-score of 0.440 on both the test
and validation sets indicates that while the model can handle numeric data, there is room for
improvement. This suggests that additional refinements may be needed for relations involving
numeric outputs.</p>
          <p>Finally, the pipeline’s performance on the awardWonBy relation, which involves many
objects per subject across multiple instances, was the lowest. On the test set, the model achieves
satisfactory performance given the complexity of the task. Despite the effectiveness of the
looping strategy on the test set, the significant drop in recall and F1-score on the validation set
indicates challenges in maintaining consistency across different datasets. This can be explained
by differences in the examples between the test and validation sets. Certain awards in the
validation set, such as "honorary doctor of Stockholm University", have proven to be a significant
challenge for our pipeline, and no equivalent examples are present in the test
set. Moreover, the low performance metrics on the awardWonBy relation are likely influenced
by the small sample size. With such a limited number of examples, namely 10 in each of the test
and validation splits, the evaluation may not adequately reflect the model’s true performance
capabilities. More examples are needed to provide an accurate assessment and to ensure that
the looping strategy and other techniques are effectively enhancing the model’s performance
for this relation.</p>
          <p>We also noticed during our analysis that Wikidata does not always accurately reflect the
ground truth, which poses a significant challenge for evaluating model performance. For
instance, for the subject entity "Grammy Award For Best Rock Album," our model generates the
names of the artists who have won the award. However, upon examining the expected answers,
we found that sometimes the object entities reflect the name of the winning album, and other
times they reflect the names of the artists. This inconsistency within the expected answers can
lead to lower recall and F1 scores even when the model performs well at the given task.</p>
          <p>
            Quality issues in Wikidata have been highlighted in previous works [
            <xref ref-type="bibr" rid="ref27 ref28">27, 28</xref>
            ] and remain a
significant challenge.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Future Work</title>
      <p>
        In the future, we plan to explore several enhancements to improve our methodology further.
One promising direction is the use of cloze-style prompting in our pipeline. Empirical evidence
has shown that cloze-style prompts, which frame the query as a fill-in-the-blank task, can lead
to strong results on this task. Another area of focus will be the introduction of few-shot examples.
Our current results indicate that the LLM does not always format answers correctly, leading to
inaccuracies. By providing a few-shot learning setup, where the model is given several examples
of correctly formatted answers, we hope to enhance its ability to generate responses that
consistently adhere to the desired format. Additionally, we plan to experiment with different model
architectures. Our current work utilizes a decoder-only model, but in future research, we aim
to explore alternatives such as encoder-decoder models. These different architectures may
offer unique advantages in processing complex relational tasks and generating more accurate,
contextually appropriate responses. Finally, handling numerical information more effectively is
crucial, as shown by the moderate performance on the seriesHasNumberOfEpisodes relation. To
that end, we intend to investigate various techniques for processing numeric data, inspired by
research on the numerical reasoning capabilities of language models [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ].
      </p>
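      <p>A few-shot setup of the kind described above might be sketched as follows; the chat-message structure and the example answers are illustrative assumptions, not part of our current pipeline.</p>

```python
def build_few_shot_messages(system_persona, examples, question):
    """Prepend correctly formatted question/answer pairs so the model
    sees the desired output format before the real question."""
    messages = [{"role": "system", "content": system_persona}]
    for q, a in examples:
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": question})
    return messages

# Hypothetical formatting examples, including an empty-object (null) case.
examples = [
    ("Which countries share a land border with Portugal?", '["Spain"]'),
    ("Which countries share a land border with Iceland?", "[]"),
]
msgs = build_few_shot_messages(
    "You answer with a JSON list of country names and nothing else.",
    examples,
    "Which countries share a land border with Austria?",
)
```

      <p>Including a null example in the demonstrations may also reinforce the empty-answer format for zero-object subjects.</p>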
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this work, we introduce a comprehensive approach that fuses various prompt engineering
techniques to address KBC from pre-trained language models. Our primary contribution is the
use of dual prompting and looping strategies to handle relations with possible null answers and
more complex one-to-many relations effectively. Dual prompting systematically guides the
model through a two-step questioning process, enhancing its ability to resolve ambiguities and
reducing hallucinations. The looping strategy, on the other hand, manages complex relations
by breaking tasks down into smaller, manageable parts, ensuring accurate extraction of
information across multiple instances.</p>
      <p>In the context of the ISWC’24 KBC-LLM challenge, our methods lead to a significant
improvement over the baseline, achieving Macro-F1 scores of 0.653 and 0.629 on the test and
validation datasets, respectively. The combination of context-aware prompting, role-play personas, and
re-prompting mechanisms further refines the model’s outputs, reducing hallucinations and
enhancing accuracy. These results underscore the potential of advanced prompt engineering
techniques in leveraging pre-trained LLMs for effective knowledge extraction and emphasize
the importance of tailored strategies for addressing specific relational challenges.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Appendix A: Tables</title>
      <p>[Appendix table, reconstructed from a flattened two-column layout: the initial (yes/no) and follow-up prompts used for each relation.]</p>
      <p>countryLandBordersCountry. Initial: "Does {subject_entity} share any land borders? Final answer
should be yes or no." Follow-up: "Which countries share a land border with {subject_entity}? Final
answer should consist of country name(s)."</p>
      <p>personHasCityOfDeath. Initial: "Has {subject_entity} died? Final answer should be yes or no."
Follow-up: "In which city did {subject_entity} die? Final answer should consist of a city name."</p>
      <p>awardWonBy. Initial: "Was the {subject_entity} awarded? Final answer should be yes or no."
Follow-up: "Who was the {subject_entity} awarded to in year {x}? Final answer should consist of
name(s)."</p>
      <p>companyTradesAtStockExchange. Initial: "Is the {subject_entity} listed on the stock exchange?
Final answer should be yes or no." Follow-up: "Where do shares of {subject_entity} trade? Final answer
should consist of name(s) of stock exchange(s)."</p>
      <sec id="sec-7-8">
        <title>Role-Play Personas per Relation</title>
        <p>countryLandBordersCountry: You are an enthusiastic and knowledgeable resident of {subject_entity},
deeply in love with your homeland, and always eager to share your wealth of knowledge about it. You
take pride in your country’s history, culture, geography, and the intricate details of its borders.
Your passion for {subject_entity} shines through in every conversation, and you have a talent for
explaining why it’s such a wonderful place to live. You’re always ready to provide detailed and
accurate information about the country’s neighboring countries and land borders, aiming to convince
your friends and anyone who listens that {subject_entity} is the best place to call home. You vividly
describe the landscapes, cities, and unique features of the country, highlighting its connections and
relationships with its neighbors. Your love for {subject_entity} is infectious, and you hope to inspire
others to appreciate it as much as you do.</p>
        <p>personHasCityOfDeath: You are an ardent admirer and devoted follower of {subject_entity}, whose
life and achievements have profoundly influenced your own. Your admiration for {subject_entity} has
led you to meticulously study every aspect of their biography, ensuring you stay updated with the most
accurate and detailed information about their life. You know their story inside and out, from their
early beginnings to their current status. Your dedication to {subject_entity} means you are well-versed
in the significant events of their life, including the critical details of where they lived, worked,
and if applicable, where they passed away. When someone inquires about {subject_entity}, you are not
only eager but also exceptionally qualified to provide precise and comprehensive answers. Your deep
respect and admiration drive you to share their legacy accurately, ensuring that others understand the
importance and impact of {subject_entity} in the correct context.</p>
        <p>seriesHasNumberOfEpisodes: You are the biggest fan of {subject_entity}. You have watched every
episode multiple times and know all the details about the show’s seasons and episodes. Your love for
the show drives you to stay updated with every bit of information, and you take pride in your deep
knowledge of it. Your friends and family always come to you when they have questions about
{subject_entity} because they know you have the answers. When it comes to the number of episodes per
season and in total, you can recall this information effortlessly and accurately. You really want to
have your close ones also watch the show. You believe that if you answer your friends and family’s
questions correctly, they will start watching the show with you. Use your passion and expertise to
provide detailed and precise answers about the show to your friends.</p>
        <p>awardWonBy: You are an aspiring recipient of the prestigious {subject_entity}, someone who has
dedicated years to studying its history, past winners, and the significance of each accolade. Your
passion for this award is unparalleled, and you know the details of its ceremonies, the recipients,
and the criteria for winning by heart. Your knowledge is not just academic but deeply personal, as
each fact and figure represents a step closer to your own dream. This drives you to provide accurate,
thorough, and insightful answers regarding the award and its winners in all years since its
inception.</p>
        <p>companyTradesAtStockExchange: You are the Chief Financial Officer (CFO) of {subject_entity}, a key
executive responsible for managing the company’s financial actions. You possess an in-depth
understanding of all financial matters related to the company, including detailed knowledge about the
company’s stock, listing status, and financial performance. As the CFO, you are dedicated to
truthfulness and transparency, ensuring that all stakeholders, including potential investors, have
accurate and comprehensive information. You are highly knowledgeable about the stock exchange,
regulatory requirements, and the company’s financial strategy. Your responses are characterized by
precision, clarity, and reliability, as you aim to foster trust and confidence in {subject_entity}’s
financial health and investment potential.</p>
        <p>[Appendix table, continued: contextual dual prompts, reconstructed from the interleaved
two-column layout.]</p>
        <p>countryLandBordersCountry. Initial: "I’d love to know more about the geography from someone who
truly understands and loves this country. Does {subject_entity} share any land borders? I’ve been
thinking of moving here! Final answer should be yes or no." Follow-up: "I’d love to know more about
the geography from someone who truly understands and loves this country. Which countries share a land
border with {subject_entity}? I’ve been thinking of moving here! Final answer should be a list of
countries."</p>
        <p>personHasCityOfDeath. Initial: "Your admiration for {subject_entity} has undoubtedly led you to
study their life extensively. Has {subject_entity} died? I wish to learn more about such an
influential figure. Final answer should be yes or no." Follow-up: "Your admiration for
{subject_entity} has undoubtedly led you to study their life extensively. In which city did
{subject_entity} die? I wish to learn more about such an influential figure. Final answer should be
the city name."</p>
      </sec>
      <sec id="sec-7-9">
        <title>awardWonBy: Contextual Initial Prompt</title>
        <p>As someone who has always dreamed of winning the {subject_entity}, you are well-versed in its
history. Was the {subject_entity} awarded? Final answer should be yes or no.</p>
      </sec>
      <sec id="sec-7-10">
        <title>companyTradesAtStockExchange: Contextual Initial Prompt</title>
        <p>As the CFO of {subject_entity} you must be aware of all pertinent details regarding the
company’s stock. Is the {subject_entity} listed on the stock exchange? As a potential investor, I
would like to have more details before making a decision to invest. Final answer should be yes or
no.</p>
      </sec>
      <sec id="sec-7-12">
        <title>awardWonBy: Contextual Follow-Up Prompt</title>
        <p>Who was the {subject_entity} awarded to in year {x}? Final answer should consist of name(s).</p>
      </sec>
      <sec id="sec-7-13">
        <title>companyTradesAtStockExchange: Contextual Follow-Up Prompt</title>
        <p>As the CFO of {subject_entity} you must be aware of all pertinent details regarding the
company’s stock. Where do shares of {subject_entity} trade? Final answer should consist of name(s) of
stock exchange(s).</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>B. Appendix B: Figures</title>
      <p>[Figure: the looping pipeline for awardWonBy. While x does not exceed the current year, a dual
prompt for year x (contextual information, role-play persona, initial prompt, and follow-up
prompt(s)) is sent to Llama-3-8B to obtain the award recipient(s) for year x; x is then incremented,
and the per-year results are accumulated into the full list of award recipients. Responses not in the
desired format trigger re-prompting.]</p>
      <p>[Figure caption, partially recovered:] ... from the Llama-3-8B LLM. The process begins with an
initial prompt to the LLM, followed by an evaluation of the response. If the response is not in the
desired format, the model undergoes re-prompting to adjust and improve the output. The process ends
with entity disambiguation, ultimately producing the final output as the ObjectEntityID.</p>
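      <p>The re-prompting loop described in the caption can be sketched as follows; query_llm and the format validator are hypothetical stand-ins, and the retry limit and reminder wording are assumptions for illustration.</p>

```python
import re

def reprompt_until_valid(prompt, query_llm, is_valid, max_retries=3):
    """Query the model and, if the response is not in the desired format,
    re-prompt with an explicit formatting reminder up to max_retries times."""
    response = query_llm(prompt)
    for _ in range(max_retries):
        if is_valid(response):
            return response
        response = query_llm(
            prompt + " Respond with the answer only, in the requested format."
        )
    return response if is_valid(response) else None

# Toy model: answers verbosely first, then correctly after one re-prompt.
replies = iter(["Well, I believe the answer is 62.", "62"])
answer = reprompt_until_valid(
    "How many episodes does Example Show have? Final answer should be a number.",
    lambda p: next(replies),
    lambda r: re.fullmatch(r"\d+", r.strip()) is not None,
)
```

      <p>A well-formatted response would then proceed to entity disambiguation to obtain the ObjectEntityID.</p>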
      <p>[Figure: the dual prompting strategy for companyTradesAtStockExchange. Panels (a) and (b) share
the same system message (PERSONA: "You are the Chief Financial Officer (CFO) of {subject_entity}, a
key executive responsible for managing the company's financial actions…"; INSTRUCTION: "If the
question can be answered simply with a yes or no - response should only be [Yes] or [No] …") and the
same additional information in the user message ("The legal form of the company {subject_entity} is
{info}. Here is an extract from the Wikipedia page for {subject_entity}: {info}"). Panel (a) ends
with the INITIAL PROMPT: "As the CFO of {subject_entity} you must be aware of all pertinent details
regarding the company's stock. Is the {subject_entity} listed on the stock exchange? As a potential
investor, I would like to have more details before making a decision to invest. Final answer should
be yes or no." Panel (b) ends with the FOLLOW-UP PROMPT: "As the CFO of {subject_entity} you must be
aware of all pertinent details regarding the company's stock. Where do shares of {subject_entity}
trade? Final answer should consist of name(s) of stock exchange(s)."]</p>
      <sec id="sec-8-1">
        <title>The dual prompting strategy applied to elicit detailed responses from the persona of a Chief Financial Officer (CFO) concerning a company’s stock exchange details. Figure (a) shows the initial prompt asking for a yes/no response about the company’s stock listing, followed by a follow-up prompt requesting specifics about the stock exchange locations, shown in figure (b). This dual prompting approach is similarly employed for the ’personHasCityOfDeath’ and ’countryLandBordersCountry’ relations.</title>
        <p>[Figure: system messages used in the looping strategy for awardWonBy and the direct strategy for
seriesHasNumberOfEpisodes. The awardWonBy persona ("You are an aspiring recipient of the prestigious
{subject_entity}, someone who has dedicated years to studying its history, past winners, and the
significance of each accolade…") and the instruction ("If the question can be answered simply with a
yes or no, response should only be [Yes] or [No] …") appear in both prompts of the loop. The
seriesHasNumberOfEpisodes system message reads: "As the biggest fan of {subject_entity} you know
everything there is to know about the show. How many episodes does {subject_entity} have? Final
answer should be a number …"]</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Language models as knowledge bases?</article-title>
          ,
          <source>arXiv preprint arXiv:1909.01066</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Safavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koutra</surname>
          </string-name>
          ,
          <article-title>Relational world knowledge representation in contextual language models: A review</article-title>
          ,
          <source>arXiv preprint arXiv:2104.05837</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Fichtel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          , W.-T. Balke,
          <article-title>Prompt tuning or fine-tuning-investigating relational knowledge in pre-trained language models</article-title>
          ,
          <source>in: 3rd Conference on Automated Knowledge Base Construction</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Yildirim</surname>
          </string-name>
          , L. Paul,
          <article-title>From task structures to world models: what do llms know?</article-title>
          ,
          <source>Trends in Cognitive Sciences</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>How context affects language models' factual predictions</article-title>
          ,
          <source>arXiv preprint arXiv:2005.04611</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , H. Soh,
          <article-title>Extract, define, canonicalize: An llm-based framework for knowledge graph construction</article-title>
          ,
          <source>arXiv preprint arXiv:2404.03868</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <article-title>Do plms know and understand ontological knowledge?</article-title>
          ,
          <source>arXiv preprint arXiv:2309.05936</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yasunaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Subbian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Stark: Benchmarking llm retrieval on textual and relational knowledge bases</article-title>
          ,
          <source>arXiv preprint arXiv:2404.13207</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Teo</surname>
          </string-name>
          , S.-w. Lin,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Llms for relational reasoning: How far are we?</article-title>
          ,
          <source>arXiv preprint arXiv:2401.09042</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Alivanistos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Santamaría</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          , E. van Krieken,
          <string-name>
            <given-names>T.</given-names>
            <surname>Thanapalasingam</surname>
          </string-name>
          ,
          <article-title>Prompting as probing: Using language models for knowledge base construction</article-title>
          ,
          <source>arXiv preprint arXiv:2208.11057</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Araki</surname>
          </string-name>
          , G. Neubig,
          <article-title>How can we know what language models know?</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          8 (
          <year>2020</year>
          )
          <fpage>423</fpage>
          -
          <lpage>438</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Razeghi</surname>
          </string-name>
          , R. L.
          <string-name>
            <surname>Logan</surname>
            <given-names>IV</given-names>
          </string-name>
          , E. Wallace,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Autoprompt: Eliciting knowledge from language models with automatically generated prompts</article-title>
          ,
          <source>arXiv preprint arXiv:2010.15980</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Factual probing is [mask]: Learning vs. learning to recall</article-title>
          ,
          <source>arXiv preprint arXiv:2104.05240</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge base completion using embeddings and rules</article-title>
          , in:
          <source>Twenty-Fourth International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D. Q.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <article-title>An overview of embedding models of entities and relationships for knowledge base completion</article-title>
          ,
          <source>arXiv preprint arXiv:1703.08098</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Making large language models perform better in knowledge graph completion</article-title>
          ,
          <source>arXiv preprint arXiv:2310.06671</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>Exploring large language models for knowledge graph completion</article-title>
          ,
          <source>arXiv preprint arXiv:2308.13916</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Veseli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <article-title>Evaluating language models for knowledge base completion</article-title>
          , in:
          <source>European Semantic Web Conference</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>SimKGC: Simple contrastive knowledge graph completion with pre-trained language models</article-title>
          ,
          <source>arXiv preprint arXiv:2203.02167</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <article-title>Dual-prompting interaction with entity representation enhancement for event argument extraction</article-title>
          , in:
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hong</surname>
          </string-name>
          (Eds.),
          <source>Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC</source>
          <year>2023</year>
          , Foshan, China,
          <source>October 12-15</source>
          ,
          <year>2023</year>
          , Proceedings, Part II, volume
          <volume>14303</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2023</year>
          , pp.
          <fpage>161</fpage>
          -
          <lpage>172</lpage>
          . URL: https://doi.org/10.1007/978-3-031-44696-2_13. doi:10.1007/978-3-031-44696-2_13.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>DenseCLIP: Language-guided dense prediction with context-aware prompting</article-title>
          , in:
          <source>IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022</source>
          , New Orleans, LA, USA, June 18-24,
          <year>2022</year>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>18061</fpage>
          -
          <lpage>18070</lpage>
          . URL: https://doi.org/10.1109/CVPR52688.2022.01755. doi:10.1109/CVPR52688.2022.01755.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Better zero-shot reasoning with role-play prompting</article-title>
          ,
          <source>CoRR abs/2308.07702</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.48550/arXiv.2308.07702. doi:10.48550/ARXIV.2308.07702. arXiv:2308.07702.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fathallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>De Giorgis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poltronieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kovriguina</surname>
          </string-name>
          ,
          <article-title>NeOn-GPT: A large language model-powered pipeline for ontology learning</article-title>
          , in:
          <source>The Extended Semantic Web Conference</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rosen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Idrees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Paulius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tellex</surname>
          </string-name>
          ,
          <article-title>Planning with large language models via corrective re-prompting</article-title>
          ,
          <source>CoRR abs/2211.09935</source>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.48550/arXiv.2211.09935. doi:10.48550/ARXIV.2211.09935. arXiv:2211.09935.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Knowledge base construction from pre-trained language models 2022</article-title>
          , in:
          <source>Semantic Web Challenge on Knowledge Base Construction from Pre-trained Language Models</source>
          , CEUR-WS,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ichter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Koyejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Belgrave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oh</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems</source>
          <year>2022</year>
          , NeurIPS
          <year>2022</year>
          , New Orleans, LA, USA, November 28 - December 9,
          <year>2022</year>
          ,
          <year>2022</year>
          . URL: http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Piscopo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>What we talk about when we talk about wikidata quality: a literature survey</article-title>
          , in:
          <source>Proceedings of the 15th International Symposium on Open Collaboration</source>
          , OpenSym '19, Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          . URL: https://doi.org/10.1145/3306446.3340822. doi:10.1145/3306446.3340822.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Reklos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meroño-Peñuela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata</article-title>
          ,
          <source>CoRR abs/2309.08491</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.48550/arXiv.2309.08491. doi:10.48550/arXiv.2309.08491. arXiv:2309.08491.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>M.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shankarampeta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cocarascu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>Exploring the numerical reasoning capabilities of language models: A comprehensive analysis on tabular data</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Bouamor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bali</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP 2023</source>
          , Association for Computational Linguistics, Singapore,
          <year>2023</year>
          , pp.
          <fpage>15391</fpage>
          -
          <lpage>15405</lpage>
          . URL: https://aclanthology.org/2023.findings-emnlp.1028. doi:10.18653/v1/2023.findings-emnlp.1028.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>