<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jack Boylan</string-name>
          <email>jackboylan@quantexa.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shashank Mangla</string-name>
          <email>shashankmangla@quantexa.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominic Thorn</string-name>
          <email>dominicthorn@quantexa.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Demian Gholipour Ghalandari</string-name>
          <email>demiangholipour@quantexa.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Parsa Ghafari</string-name>
          <email>parsaghaffari@quantexa.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Hokamp</string-name>
          <email>chrishokamp@quantexa.com</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>This study explores the use of Large Language Models (LLMs) for automatic evaluation of knowledge graph (KG) completion models. Historically, validating information in KGs has been a challenging task, requiring large-scale human annotation at prohibitive cost. With the emergence of general-purpose generative AI and LLMs, it is now plausible that human-in-the-loop validation could be replaced by a generative agent. We introduce a framework for consistency and validation when using generative models to validate knowledge graphs. Our framework is based upon recent open-source developments for structural and semantic validation of LLM outputs, and upon flexible approaches to fact checking and verification, supported by the capacity to reference external knowledge sources of any kind. The design is easy to adapt and extend, and can be used to verify any kind of graph-structured data through a combination of model-intrinsic knowledge, user-supplied context, and agents capable of external knowledge retrieval.</p>
      </abstract>
      <kwd-group>
        <kwd>Text2KG</kwd>
        <kwd>Knowledge Graph Evaluation</kwd>
        <kwd>Knowledge Graph Completion</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Evaluating KG completion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Knowledge Graphs (KGs) are flexible data structures used to represent structured information
about the world in diverse settings, including general knowledge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], medical domain models
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], words and lexical semantics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and semantics [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Most KGs are incomplete [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], in the
sense that there is relevant in-domain information that the graph does not contain. Motivated
by this incompleteness, knowledge graph completion research studies methods for augmenting
KGs by predicting missing links [<xref ref-type="bibr" rid="ref6">6</xref>].
      </p>
      <sec id="sec-2-1">
        <title>Evaluating Knowledge Graph Completion</title>
        <p>
          Evaluating KG completion predictions has traditionally required human annotation of
unknown triples, leading to significant time and cost implications. Efforts to improve the
efficiency of human-driven KG evaluation include strategies like cluster sampling, which aims
to reduce costs by modeling annotation efforts more economically [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. An illustration of these
evaluation paradigms is shown in Figure 2.
        </p>
        <p>KGValidator Framework: Motivated by these challenges, we introduce KGValidator as a
flexible framework to evaluate KG Completion using LLMs. At its core, this framework validates
the triples that make up a KG using context. This context can be the inherent knowledge of
the LLM itself, a collection of text documents provided by the user, or an external knowledge
source such as Wikidata or an Internet search (refer to Figure 1 for a high-level overview).
Importantly, our framework does not require any gold references, which are often only available
for popular benchmark datasets. This enables evaluation of a wider range of KGs using the
same framework.</p>
        <p>KGValidator makes use of the Instructor1 library, Pydantic2 classes, and function calling to
control the generation of validation information. This ensures that the LLM follows the correct
guidelines when evaluating properties, and outputs the correct data structures for calculating
evaluation metrics. Our main contributions are:
• A simple and extensible framework based on open-source libraries that can be used to
validate KGs with the use of LLMs3.
• An evaluation of our framework against popular KG completion benchmark datasets to
measure its effectiveness as a KG validator.
• An investigation of the impact of providing additional context to SoTA LLMs in order to
augment evaluation capabilities.
• A straightforward protocol for implementing new validators using any KG alongside any
set of knowledge sources.
1https://github.com/jxnl/instructor
2https://docs.pydantic.dev/
3Unfortunately, IP restrictions currently prevent us from sharing our implementation, but we are happy to directly
correspond with interested researchers who wish to reproduce our results.</p>
        <p>[Figure 2: KG completion evaluation paradigms. A KGC model completes the incomplete triple (James Joyce, author of, ?), producing candidate tails such as Ulysses, Moby Dick, Finnegans Wake, Dubliners, and Eveline, which are checked against a test set by an external annotator or LLM.]</p>
        <p>The rest of the paper is structured as follows: Section 2 discusses key related work, Section 3
covers our approach in detail, Section 4 presents several experiments designed to validate the
framework, and Section 5 discusses results and possible extensions to this work.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Background</title>
      <sec id="sec-3-1">
        <title>2.1. Knowledge Graph Construction</title>
        <p>
          Knowledge Graphs can be represented as multi-relational directed property graphs [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], where
nodes represent entities (under a general definition of entity), and edges are predicates or relations.
Any KG can thus be rendered as a list of (subject, predicate, object) triples4, also called statements5.
        </p>
        <p>
          An early line of work on knowledge graph construction focused on the TAC 2010 Knowledge
Base Population (KBP) shared task [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], which introduced a popular evaluation setting that
separates knowledge base population into Entity Linking and Slot Filling subtasks. Early
methods to address these tasks used pattern learning, distant supervision and hand-coded rules
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>
          Knowledge Graph Completion (KGC) is a KG construction task that has gained popularity
recently. It involves predicting missing links in incomplete knowledge graphs [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The
subtasks include triple classification, where models assess the validity of (head, relation, tail)
triples; link prediction, which proposes subjects or objects for incomplete triples; and relation
prediction [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], identifying relationships between subject and object pairs. Models for these
tasks are frequently benchmarked against subsets of well-established knowledge bases such as
WordNet [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], Freebase [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], and domain-specific KGs like UMLS [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
4Several standards and formats exist for representing triples and optionally including additional metadata, including
RDF, Turtle, N-triples, JSON-LD, and others.
5https://www.wikidata.org/wiki/Help:Statements
        </p>
        <p>
          Evaluation methodologies for KG completion primarily utilize ranking-based metrics. These
include Mean Rank (MR), Mean Reciprocal Rank (MRR), and Hits@K, which gauge a model’s
ability to prioritize correct triples over incorrect ones, offering a quantifiable measure of
performance [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
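<p>For concreteness, these ranking metrics can be computed from the 1-based ranks assigned to the correct triples. The function below is an illustrative sketch, not taken from any cited implementation:</p>

```python
def mrr_and_hits(ranks: list[int], k: int = 3) -> tuple[float, float]:
    """Compute MRR and Hits@k given the 1-based rank of each correct triple.

    ranks[i] is the position of the i-th correct triple among the model's
    scored candidates (rank 1 = top of the list).
    """
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_k = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits_at_k
```

<p>A lower Mean Rank, and higher MRR and Hits@k, indicate that the model places correct triples nearer the top of its ranking.</p>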
        <p>
          Outside these tightly defined tracks, various approaches have been proposed to construct or
populate knowledge graphs. For example, NELL (Never-Ending Language Learner) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] is a
self-supervised system that was designed to interact with the internet over years to populate a
growing knowledge base of topical categories and factual statements.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. LLMs and Knowledge Graphs</title>
        <p>
          Studies have shown that pretrained language models (PLMs) possess factual and relational
knowledge, which makes them effective at downstream knowledge-intensive tasks such as open
question-answering, fact verification, and information extraction [
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ]. KG-BERT [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] uses
PLMs for KG completion by fine-tuning BERT on all KG completion subtasks, treating the
problem as a sequence classification task.
        </p>
        <p>
          Pretrain-KG [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] introduces a framework that enriches knowledge graph embedding (KGE)
models with PLM knowledge during training, which proves to be particularly useful for
low-resource scenarios of link prediction and triple classification.
        </p>
        <p>
          Knowledge Graph Construction Using Generative AI: With the proliferation of
general-purpose LLMs [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], open information extraction (OpenIE) has become one of the most popular
industry applications of generative AI [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. OpenIE is closely related to knowledge graph
construction, and so LLMs have naturally been applied to KG completion tasks such as link
prediction and triple classification, proving to be successful in both fine-tuned [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and zero-shot
settings [
          <xref ref-type="bibr" rid="ref27 ref28">27, 28</xref>
          ]. The dominant paradigm is to include the desired schema of the output in the
user prompt along with the input itself (refer to Figure 3).
        </p>
        <p>
          Khorashadizadeh et al. demonstrate the capabilities of GPT 3.5 in the task of KG construction
using an in-context learning approach [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Emphasis is placed on the importance of good
prompt design under this setting. LLM2KB [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] fine-tunes open-source LLMs to predict tail
entities given a head entity and relation, incorporating context retrieval from Wikipedia to
enhance the relevance and accuracy of the predicted entities. Zhu et al. [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] investigate GPT-4’s
[
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] capabilities for different steps of knowledge graph construction. They show that while
GPT-4 exhibits modest performance on few-shot information extraction tasks, it excels as an
inference assistant due to its strong reasoning capabilities. Their experiments also show that
GPT-4 generalizes well to new knowledge by creating a virtual knowledge extraction task.
        </p>
        <p>
          Complementing these advancements, resources such as the Text2KG Benchmark [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] offer
valuable tools for researchers to develop and test LLM-backed KG completion models. This
benchmark, specifically designed for evaluating knowledge graph generation from unstructured
text using guideline ontologies, marks a significant step towards standardizing and accelerating
research in this field.
        </p>
        <p>[Figure 3: An example OpenIE prompt. The user prompt "Extract all named entities and any relevant properties about them in the following text:" is followed by a short news passage about Tesla, Inc., the Cybertruck launch, and Elon Musk, and is passed to an OpenIE system.]</p>
        <p>
          A comprehensive survey on the unification of LLMs and KGs [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ] highlights the emergence
of KG-enhanced LLMs, LLM-augmented KGs, and Synergized LLMs and KGs. Validation and
evaluation of KGs with LLMs has been less explored, but is also a promising and important
avenue for research.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Structuring and Validating Language Model Output</title>
        <p>
          Constraining language models to produce outputs that conform to specific schemas is
challenging but essential for applications like natural language to SQL (NL2SQL) [
          <xref ref-type="bibr" rid="ref35 ref36">35, 36</xref>
          ]. Recent
developments include tools like Guidance6, Outlines7, JSONFormer8, and Guardrails9, which
facilitate constrained decoding of structured outputs from large language models (LLMs).
Additionally, semantic validation techniques like those enabled by the Instructor library use Pydantic
classes to ensure outputs meet both structural and semantic accuracy. This advancement is
crucial for tasks such as knowledge graph (KG) completion, where precision in data parsing
significantly enhances model utility [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>2.4. Knowledge-Grounded LLMs</title>
        <p>
          The tendency of LLMs to hallucinate poses a significant challenge in their application to
downstream tasks [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. Retrieval-Augmented Generation (RAG) mitigates this by grounding
LLM responses in verified information, significantly enhancing accuracy and reliability [
          <xref ref-type="bibr" rid="ref38 ref39">38,
39</xref>
          ]. RAG integrates a retrieval component that leverages external knowledge during the
generation process, improving performance across various natural language processing tasks
[
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. Additionally, role-playing approaches using LLMs have been developed to create detailed,
organized content similar to Wikipedia articles, drawing on trusted sources for factual grounding
[
          <xref ref-type="bibr" rid="ref41">41</xref>
          ].
6https://github.com/guidance-ai/guidance
7https://github.com/outlines-dev/outlines
8https://github.com/1rgs/jsonformer
9https://github.com/guardrails-ai/guardrails
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>2.5. Knowledge Graph Evaluation</title>
        <p>
          Evaluating automatically constructed knowledge graphs is challenging. Huaman et al. present
a comprehensive evaluation of state-of-the-art validation frameworks, tools, and methods for
KGs [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ]. They highlight the challenges in validating KG assertions against real-world facts
and the need for scalable, efficient, and effective semi-automatic validation approaches. Gao
et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] have highlighted the trade-offs between human annotation cost and meaningful
estimates of accuracy. As discussed above, a common flaw reported in existing KG evaluation
frameworks is use of a closed-world assumption. Specifically, this means treating unknown
predicted triples as false [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ]. Sun et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] find that several recent KG completion techniques
have reported significantly higher performance compared to earlier SoTA methods, in some
cases due to the inappropriate evaluation protocols used. Cao et al. [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ] suggest that triple
classification evaluation under the closed-world assumption leads to trivial results. Additionally,
Cao et al. note that current models lack the capacity to distinguish false triples from unknown
triples. Yang et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] confirm the existing gap between closed and open world settings in the
performance of KG completion models.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Approach</title>
      <p>We assume the existence of a triple-extractor model, which produces a stream of candidate
statements from unstructured data feeds. The triple-extractor model could be implemented by a
KG completion model, one or more LLMs with well-designed prompts, or by a more traditional
information extraction pipeline consisting of several distinct models that perform parsing,
named entity recognition, relationship classification, and other relevant sub-tasks. For each
predicted triple from the stream, we wish to validate whether it is correct in the presence of
context. Once a statement has been validated, it can be written into a knowledge graph or
another data store, and statements that do not pass validation can be flagged for further review.
A high-level overview of the validation stage is illustrated in Figure 4.</p>
      <p>In this work we use existing standard KGC datasets for our experiments, so in practice the
candidate triples are produced by streaming through existing datasets (see Section 4).</p>
      <sec id="sec-4-1">
        <title>Possible sources of context for validation include:</title>
        <p>• Knowledge accrued in the LLM parameters during pretraining.
• User-provided context in the form of document collections or reference KGs represented
in string format.
• Agents that can interact with the world to search and retrieve information in various
ways.</p>
        <p>
          Further detail on use of context in our validator implementations is provided in Section 3.1.
Basic Settings for Validation: The first step is to obtain KG completion predictions in the
format of a list of (ℎ, 𝑟, 𝑡) triples, each consisting of a head entity ℎ, a relation 𝑟, and a tail entity 𝑡.
All validators are instantiated in a zero-shot setting with an LLM backbone; this may be a model
from OpenAI’s model family, such as gpt-3.5-turbo-0125 [
          <xref ref-type="bibr" rid="ref32 ref45">45, 32</xref>
          ], or an open-source model from
the Llama family [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ]. Additionally, validators have access to various tools which allow them
to query external knowledge sources.
        </p>
        <p>Validation via Pydantic Models Pydantic is a data validation and settings management
library which leverages Python type annotations. It allows for the creation of data models,
where each model defines various fields with types and validation requirements. By using
Python’s type hints, Pydantic ensures that the incoming data conforms to the defined model
structure, performing automatic validation at runtime.</p>
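<p>A minimal sketch of this runtime validation, with illustrative field names (the exact classes used in KGValidator are not public):</p>

```python
from pydantic import BaseModel, ValidationError

class Triple(BaseModel):
    head: str
    relation: str
    tail: str

# Well-typed data passes runtime validation and is coerced into the model.
t = Triple(head="Douglas Adams", relation="profession", tail="writer")

# Data violating the declared types raises a ValidationError at runtime.
try:
    Triple(head="Douglas Adams", relation="profession", tail=None)
    raised = False
except ValidationError:
    raised = True
```

<p>Because the schema lives in ordinary Python type annotations, the same class can be reused both to constrain LLM output and to validate triples arriving from any other extractor.</p>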
        <p>KG triples are passed to the validator via the Instructor library, which uses a patched version
of popular LLM API clients. This patch enables the request of structured outputs in the form of
Pydantic classes. It is within these Pydantic classes that we specify the structural and semantic
guidelines that the LLM must follow during validation. An example of this form of prompting is
shown in Figure 7. Specifically, we request that, for every triple (ℎ, 𝑟, 𝑡), the model must provide
values for a number of fields:
1. triple_is_valid: A field indicating whether the proposed triple is generally valid,
judged against any given context. The model can reply with True, False, or "Not enough
information to say".</p>
        <p>2. reason: An open-form string describing why the triple is or is not valid.</p>
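<p>The two fields above can be expressed as a Pydantic response model. The class below is a hypothetical reconstruction, and the Instructor call shown in the comment is assumed usage requiring an OpenAI API key:</p>

```python
from typing import Literal, Union
from pydantic import BaseModel, Field

# Hypothetical response model mirroring the two fields described above;
# the exact class used by KGValidator is not public.
class TripleValidation(BaseModel):
    triple_is_valid: Union[bool, Literal["Not enough information to say"]] = Field(
        description="Whether the proposed triple is valid, judged against any given context."
    )
    reason: str = Field(description="Open-form explanation of the judgement.")

# With Instructor, a patched OpenAI client can be asked to return this class
# directly (assumed usage, requires an API key):
#   client = instructor.from_openai(openai.OpenAI())
#   result = client.chat.completions.create(
#       model="gpt-3.5-turbo-0125",
#       response_model=TripleValidation,
#       messages=[{"role": "user",
#                  "content": "Validate the triple (Douglas Adams, profession, writer)."}],
#   )
result = TripleValidation(triple_is_valid=True,
                          reason="Douglas Adams was an English author.")
```

<p>Because Instructor retries generation until the output parses into the requested class, downstream metric computation can rely on these fields always being present and well-typed.</p>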
        <sec id="sec-4-1-1">
          <title>3.1. Validation Contexts</title>
          <p>
            This section discusses the contextual information that is available to different validator
instantiations. We use context to mean all information that is available to a validator, including the
information stored in trained model parameters.
          </p>
          <p>
            3.1.1. Validating with LLM Knowledge
This is the most straightforward method of triple validation. Given a triple (ℎ, 𝑟, 𝑡), the objective
is to classify the triple using the LLM’s inherent knowledge about the world, learned during the
pretraining stage and stored in the model parameters. The process is illustrated in Figure 4 and
an example can be found in appendix Figure 10. This is a powerful and simple way to verify
triples with no additional data.
          </p>
          <p>
            3.1.2. Validation using Textual Context(s)
Inspired by the success of Retrieval-Augmented Generation (RAG) in knowledge-intensive tasks
such as question-answering [
            <xref ref-type="bibr" rid="ref38">38</xref>
            ], we implement tooling to retrieve relevant information from
a reference text corpus (see Section 2.4). In this instance, the model is prompted with textual
context alongside the candidate triple, as shown in Figure 5. This approach is particularly useful
in a number of scenarios:
• When we wish to verify a set of triples about the same entity or group of entities, and we
have a collection of trustworthy sources within which we assume there will be evidence
for or against the predicted triple, for example a given entity’s Wikipedia page.
• When building KGs using private or domain-specific data feeds.
          </p>
          <p>
            Textual Documents
This provided corpus can be of arbitrary length and can contain a collection of documents.
The corpus will be recursively chunked and encoded by an embedding model from either
the sentence transformers library [
            <xref ref-type="bibr" rid="ref47">47</xref>
            ] or OpenAI’s family of embedding models [
            <xref ref-type="bibr" rid="ref48">48</xref>
            ], and a
searchable index is created. A string representation for each triple is then constructed, and this
is used to query the corpus index, which retrieves the most semantically similar chunks of text,
according to cosine similarity. This forms the context against which the LLM will validate the
given triple.
          </p>
          <p>
            3.1.3. Validation using a Reference KG
We also consider validating proposed KG triples by cross-referencing against established, reliable
KGs. Wikidata, with its expansive and well-structured repository of knowledge, serves as an
ideal reference point for such validations, and will serve as the reference KG in our experiments.
However, we note that any KG can be used as a reference by following the method outlined in
this section.
          </p>
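<p>The chunk-embed-retrieve pipeline of Section 3.1.2 can be sketched as follows. A simple bag-of-words embedding stands in for the sentence-transformers or OpenAI embedding models used in our implementation:</p>

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a corpus into fixed-size word chunks (recursive chunking stand-in)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a sentence-transformers or OpenAI embedding.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(corpus: str, triple: tuple[str, str, str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to a string rendering of the triple."""
    query = embed(" ".join(triple))
    chunks = chunk(corpus)
    return sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)[:k]
```

<p>The retrieved chunks form the context string passed to the validator alongside the candidate triple.</p>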
          <p>The Wikidata knowledge graph is built from two top-level types: Entities and Properties:
Entities: Entities represent all items in the database. An item is a real-world object, concept,
or event, such as “Earth” (Q2), “love” (Q316), or “World War II” (Q362). Items can be linked
to each other to form complex statements via properties. In the context of KG completion, a
statement can be thought of as a triple. Each entity is identified by a unique identifier, which is
a Q-prefix followed by a sequence of numbers, e.g., Q42 for Douglas Adams.</p>
          <p>Properties: Properties in Wikidata define the characteristics or attributes of items and
establish relationships between them. They are the predicates in statements, linking subjects (items)
to their object (value or another item). For example, in the statement “Douglas Adams (Q42)
profession (P106) - writer (Q36180)”,“profession” is the property that describes the relationship
between “Douglas Adams” and “writer”.</p>
          <p>Reference KG Implementation: Our approach to integrating Wikidata as a source of
contextual information is simple. Given a triple, an agent module searches Wikidata using the
string of the subject as a query. The top Wikidata entity from the search API is returned – if no
results are found for the query, a warning is thrown, and the validator will default to using its
inherent knowledge. The Wikidata item is parsed to remove a list of trivial properties. Among
Wikidata’s 11,000 properties, over 7,000 are identifiers for external databases such as
IMDb and Reddit10. In this work, we are not interested in verifying such information, and so
we discard these properties.</p>
          <p>A string representation of the Wikidata page is then passed through the same RAG pipeline as
described in Section 3.1.2, from which relevant sections are retrieved and passed to the validator
as context alongside each predicted triple. This implementation is illustrated in appendix
Figure 9.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>3.2. Validation using Web Search</title>
          <p>In some cases, the triples we wish to validate cannot be captured with a query to Wikidata, and
we do not have a collection of textual information to provide the model with additional context.
To overcome this, the validator is given access to collect information relevant to the triple via a
web-searching agent. The triple is formatted as a string query. An agent then searches the web
using the DuckDuckGo API11. The top results for the given query are parsed and stored as a
collection of documents. The validation then follows the same pattern as Section 3.1.2, whereby
relevant chunks of text are retrieved as context for triple validation. This method is illustrated
in appendix Figure 8.</p>
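<p>A sketch of the query-formatting step; the commented call shows assumed usage of the duckduckgo_search package and requires network access:</p>

```python
def triple_to_query(triple: tuple[str, str, str]) -> str:
    """Format a (head, relation, tail) triple as a flat web-search query."""
    head, relation, tail = triple
    return f"{head} {relation} {tail}"

# The query is then handed to a web-searching agent, e.g. (assumed usage,
# network access required):
#   from duckduckgo_search import DDGS
#   docs = [r["body"] for r in DDGS().text(triple_to_query(t), max_results=5)]
# The retrieved documents are chunked and ranked as in Section 3.1.2.
```
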
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiments</title>
      <p>
        We conduct a series of triple classification experiments to validate the effectiveness of an
LLM-backed validator for KG completion. Our experiments make use of a number of popular
benchmark KG datasets: UMLS [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], WN18RR [
        <xref ref-type="bibr" rid="ref49">49</xref>
        ], FB15K-237N, Wiki27K [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and
CoDeX-S [
        <xref ref-type="bibr" rid="ref50">50</xref>
        ]. FB15K-237N is derived from Freebase, and was obtained by removing the relations
containing mediator nodes in FB15K-237. Wiki27K was created from Wikidata and manually
annotated with real negative triples. UMLS is a medical ontology describing relations between
medical concepts. WN18RR is a dataset about English morphology derived from WordNet. We
investigate the performance of gpt-3.5-turbo-0125 and gpt-4-0125-preview and present
our results in Tables 1, 2 and 3. Setup details and results for open-source LLM experiments can
be found in Section A.3 and Table 4 in the appendix.
10https://wikiedu.org/blog/2022/03/30/property-exploration-how-do-i-learn-more-about-properties-on-wikidata/
11https://github.com/deedy5/duckduckgo_search
      </p>
      <sec id="sec-5-1">
        <title>4.1. Experiment Settings</title>
        <p>Prompt as a Hyperparameter: We emphasize that a prompt is a model
hyperparameter: manually tuning it to fit a subset of the data is a form of over-fitting or evaluation-set
leakage. In this work, we thus formulate a generic model prompt, and apply this prompt to
all benchmark datasets without further changes. We include the prompt in the appendix (see
Figure 6).</p>
        <p>Through the following experiments we attempt to answer the question: Given context, can
our model judge whether an unseen triple (ℎ, 𝑟, 𝑡) is correct?</p>
        <p>We are primarily interested in observing the change in evaluation performance of an LLM
when it has access to context under the following settings:
• LLM Inherent Knowledge: Evaluates the model’s native understanding without external
data sources.
• Wikidata: Uses structured data from Wikidata as the reference KG context.
• Web: Incorporates information retrieved directly from the internet.
• WikidataWeb: Combines data from both Wikidata and web sources.
• WikipediaWikidata: Utilizes a mix of Wikipedia and Wikidata to provide a
comprehensive context.</p>
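<p>For configuration purposes, the five settings can be captured as a simple enumeration; the names below are illustrative, not from our implementation:</p>

```python
from enum import Enum

class ContextSetting(str, Enum):
    """Illustrative enumeration of the five validator context settings."""
    LLM_INHERENT = "llm_inherent_knowledge"
    WIKIDATA = "wikidata"
    WEB = "web"
    WIKIDATA_WEB = "wikidata_web"
    WIKIPEDIA_WIKIDATA = "wikipedia_wikidata"
```
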
        <p>API Cost and Rate-Limiting Constraints: Due to OpenAI API constraints, we run
experiments using a subset of 150 examples from each dataset. This is indicated by the -150 suffix to
each dataset name.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <sec id="sec-6-1">
        <title>5.1. Analysis</title>
        <p>Our analysis reveals notable variations in performance across datasets, as evidenced by the
results obtained using different validators powered by GPT-3.5 and GPT-4 language models.
Specifically, the GPT-3.5 World Knowledge validator shows limited effectiveness on the
FB15K-237N-150, Wiki27K-150, and CoDeX-S-150 datasets (as detailed in Tables 1 and 3). However,
the introduction of contextual information from Wikidata and web searches gives a strong
performance boost, with the performance on the CoDeX-S-150 dataset in particular improving
accuracy from 0.54 to 0.91 when using the WikidataWeb validator.</p>
        <p>GPT-4 configurations exhibit strong performance across the board, particularly excelling on
the FB15K-237N-150 and Wiki27K-150 datasets, where GPT-4 achieves the highest accuracies
of 0.83 and 0.89 respectively. However, both GPT-3.5 and GPT-4 models demonstrate less
satisfactory results on the UMLS-150 dataset, as indicated in Table 2.</p>
        <p>It is noteworthy that the incorporation of context from external knowledge sources, especially
web searches and Wikidata, proves beneficial for both models. Despite this, the open-source
Llama2 model performs poorly on this task, as shown in Table 4 and inference examples 11 and
12. We hypothesize that future open-source LLMs may perform much better than those
currently available.</p>
        <p>GPT-4 validators display efectiveness on the WN18RR-150 dataset, both with and without
supplemental context. This robust performance is hypothesized to stem from the model’s
superior grasp of English morphology and nuanced language comprehension, aligning with the
linguistic focus of the WN18RR dataset.</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Key Findings</title>
        <p>Inherent Knowledge Insufficiency: In the case of GPT-3.5 and Llama2 70B, reliance solely
on the inherent knowledge of LLM validators often leads to unquestioned acceptance of predicted
triples. This indicates a limitation in the models’ ability to challenge the veracity of the
information, underscoring the need for external validation mechanisms. This corroborates
findings from prior studies which find that LLMs struggle to memorize knowledge in the long
tail [51].</p>
        <p>Challenge in Verifying Ambiguous Triples: Our evaluation of each dataset reveals that
additional information is necessary to verify many triples. For example, a positive triple in the
UMLS dataset reads ("age_group", "performs", "social_behavior"). Ambiguous triples
in the UMLS and WN18RR datasets require understanding of specific ontologies, rendering web
or Wikidata searches ineffective for retrieving relevant context. This complexity is contrasted
by datasets like FB15K-237N and Wiki27k, which involve concrete entities or facts (e.g., people,
locations) more amenable to validation through widely available external sources. For example,
a positive example in FB15K-237N reads ("Tim Robbins", "The gender of [X] is [Y] .",
"male").</p>
        <p>The Importance of Relevant Context: Performance is weaker on datasets requiring
domain-specific knowledge, such as UMLS, where no model tested achieved satisfying
results. This is attributed to the challenge of sourcing pertinent context for validation, as
exemplified by the clinical domain triple from UMLS: ("research_device", "causes",
"anatomical_abnormality"). This highlights the critical role of context in enabling
accurate validation, emphasizing the need for targeted search strategies to augment the model’s
knowledge base.</p>
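        <p>One such targeted strategy is simply querying external sources with the subject of the unvalidated triple, as in the validator illustration in Appendix A.2. A minimal sketch; build_search_query is a hypothetical helper, not the paper’s implementation:</p>

```python
# Hypothetical helper (not from the paper): turn a triple's subject label
# into a plain-text query for a web or Wikidata search, assuming dataset-style
# labels such as "research_device" use underscore-separated tokens.
def build_search_query(triple):
    subject, relation, obj = triple
    # Normalise the subject label into natural-language search terms.
    return subject.replace("_", " ")

query = build_search_query(("research_device", "causes", "anatomical_abnormality"))
# query == "research device"
```

        <p>For domain-specific ontologies like UMLS, such surface-level queries are exactly where retrieval falls short, since the relevant context lives in specialised resources rather than the open web.</p>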
        <p>
          Limitations in Zero-Shot Triple Classification by Current Open-Source LLMs: Table 4
shows the performance of a version of the LLama-2-70B-chat model [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ] on the triple verification
task. Upon manual inspection, the model nearly always returns a True prediction for all triples,
irrespective of the provided context, resulting in a recall of 1.0 and precision of about 0.5
across all settings. This tendency suggests that while the model can superficially engage with
the context—evident from relevant factoids appearing in the reason field—it often resorts to
fabricating agreeable responses rather than accurately assessing the triple’s validity. Figures 11
and 12 illustrate this behaviour. It is worth noting that our experiments were conducted using
a single open-source model; however, alternative models could potentially deliver superior
performance. We propose this as an avenue for future research.
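        <p>This degenerate always-accept behaviour mechanically produces the observed scores. A minimal sketch, assuming (as the precision of about 0.5 suggests) a balanced split of positive and negative triples in the evaluation sets:</p>

```python
# Why an always-"True" validator scores recall 1.0 and precision ~0.5:
# with equal numbers of positive and negative triples, accepting everything
# recovers all positives, but half of all predictions are wrong.
def precision_recall(gold, pred):
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    return tp / (tp + fp), tp / (tp + fn)

# Assumed balanced split of 150 positives and 150 negatives, as in the *-150 sets.
gold = [True] * 150 + [False] * 150
pred = [True] * 300          # the degenerate always-accept behaviour
p, r = precision_recall(gold, pred)
# p == 0.5, r == 1.0
```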
        </p>
        <p>Adoption of Other Open-Source LLMs: At present, we find that only OpenAI and Llama
models are usable with the Instructor framework. More recent models, such as Mixtral [52] and
Gemma [53], are beginning to receive support under this library, but issues with constraining
model output have delayed implementation. We are particularly interested in observing how
other open-source models perform on this task in the future.</p>
      </sec>
      <sec id="sec-6-3">
        <title>5.3. Ethical and Social Risks</title>
        <p>Building on the framework by Weidinger et al. [54], we highlight key ethical and social risks
associated with using LLMs for KG validation. LLMs, trained on large-scale internet datasets,
may perpetuate biases [55], discriminating against marginalized groups and potentially
reinforcing stereotypes within KGs. Additionally, the alignment of LLM outputs with human
preferences can introduce biases favoring certain languages and perspectives [56]. Privacy
concerns also arise from LLMs potentially leaking sensitive information [57]. Furthermore, the risk
of spreading misinformation through inaccurate validation poses serious challenges, especially
in sensitive domains like medicine or law. Lastly, the environmental impact of training and
deploying LLMs, including significant carbon emissions and water usage, underscores the need
for sustainable practices in LLM-driven KG validation [58, 59].</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>We have introduced a flexible framework for utilizing large language models for the validation of
triples within knowledge graphs, capitalizing on both the inherent knowledge embedded within
these models, and upon supplementary context drawn from external sources. As demonstrated in
our experiments (Section 4), the approach significantly enhances the accuracy of zero-shot triple
classification across several benchmark KG completion datasets, provided that the appropriate
context can be retrieved from external sources.</p>
      <p>Use Cases: From experimentation, LLMs have demonstrated the potential to be effective
validators for KG completion methods. They also open up the possibility of updating existing KG
datasets with new knowledge from external sources, ensuring their relevance as gold-standard
benchmarks. A practical application of this is the development of automated systems, such as
bots, designed to enrich platforms like Wikidata with real-world data. These bot contributions
could be systematically verified by SoTA LLMs to ensure accuracy and relevance.</p>
      <p>As of January 2024, Wikidata encompassed nearly 110 million pages, a figure increasing at
an accelerating rate. The decade between 2014 and 2023 saw an annual average of 9.57 million
new pages and 191.5 million edits, and cumulative annual growth rates of 12.83% and 12.16%
respectively [60]. The volume and pace of such expansion highlight the challenge of relying
on manual verification methods. Leveraging LLMs to flag incorrect or unsupported edits made
by users or bots could be an excellent aid to the Semantic Web community.</p>
      <p>Future Research: As the quality of general-purpose LLMs improves, this framework should
become increasingly effective in validating KG completion models. Instructor has already
begun work to support other open-source LLMs, which would enable even greater flexibility in
validator configuration.</p>
      <p>Enriching models with domain-specific context and graph structural features could boost
their performance across diverse datasets. Moreover, fine-tuning strategies tailored to LLMs may
unlock even better performance when a model is fine-tuned specifically for the KG validation
task.</p>
      <p>As discussed in Sections 1 and 2, a growing body of work studies knowledge graph creation
and augmentation using generative models. Knowledge graph creation is outside the scope
of this paper, but we plan to explore this in future work. Given an information extraction
model which produces KG triples from raw text, our verification pipeline could be connected
to the entity and property stores of an existing KG, and automatically update the KG with
high-accuracy information extracted from textual data feeds such as news. We note this is likely
to be easier for some domains than others, and current SoTA LLMs will probably not be good
verifiers for domain-specific KGs.</p>
      <p>[51] A. Mallen, A. Asai, V. Zhong, R. Das, D. Khashabi, H. Hajishirzi, When not to trust language models: Investigating effectiveness of parametric and non-parametric memories, 2023. arXiv:2212.10511.
[52] A. Q. Jiang, A. Sablayrolles, A. R. et al., Mixtral of experts, 2024. arXiv:2401.04088.
[53] G. Team, T. Mesnard, C. H. et al., Gemma: Open models based on Gemini research and technology, 2024. arXiv:2403.08295.
[54] L. Weidinger, J. Mellor, M. Rauh, et al., Ethical and social risks of harm from Language Models, 2021. URL: http://arxiv.org/abs/2112.04359. doi:10.48550/arXiv.2112.04359, arXiv:2112.04359 [cs].
[55] E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 610–623. URL: https://dl.acm.org/doi/10.1145/3442188.3445922. doi:10.1145/3442188.3445922.
[56] M. J. Ryan, W. Held, D. Yang, Unintended Impacts of LLM Alignment on Global Representation, 2024. URL: http://arxiv.org/abs/2402.15018. doi:10.48550/arXiv.2402.15018, arXiv:2402.15018 [cs].
[57] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, A. Oprea, C. Raffel, Extracting Training Data from Large Language Models, 2021. URL: http://arxiv.org/abs/2012.07805. doi:10.48550/arXiv.2012.07805, arXiv:2012.07805 [cs].
[58] D. Mytton, Data centre water consumption, npj Clean Water 4 (2021) 1–6. URL: https://www.nature.com/articles/s41545-021-00101-w. doi:10.1038/s41545-021-00101-w, publisher: Nature Publishing Group.
[59] D. Patterson, J. Gonzalez, Q. Le, C. Liang, L.-M. Munguia, D. Rothchild, D. So, M. Texier, J. Dean, Carbon Emissions and Large Neural Network Training, 2021. URL: http://arxiv.org/abs/2104.10350. doi:10.48550/arXiv.2104.10350, arXiv:2104.10350 [cs].
[60] Wikimedia Foundation, Wikimedia statistics – Wikidata, https://stats.wikimedia.org/, ???? Accessed: 2024-03-07.
[61] Tom Jobbins, TheBloke/Llama-2-70B-Chat-GGUF, https://huggingface.co/TheBloke/Llama-2-70B-Chat-GGUF/, ???? Accessed: 2024-03-13.
[62] G. Gerganov, gguf.md, https://github.com/ggerganov/ggml/blob/master/docs/gguf.md, 2023. Accessed: 2024-03-15.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Appendix</title>
      <p>A.1. Prompt Templates</p>
      <p>@staticmethod
def validate_statement_with_no_context(entity_label, predicted_property_name,
    predicted_property_value):
    '''Validate a statement about an entity with no context.
    A statement is a triple: entity_label --&gt; predicted_property_name --&gt;
    predicted_property_value,
    e.g. Donald Trump --&gt; wife --&gt; Ivanka Trump
    '''
    ...
        ],
        max_retries=3,
        temperature=0,
        model=MODEL,
    )
    return resp</p>
      <p>Figure 6: The prompt used across all experiments. The LLM response is captured as a Pydantic model.</p>
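      <p>For illustration, a minimal sketch of capturing a validator response as a Pydantic model. The field names follow the inference examples in Appendix A.5; the exact schema used in the paper may differ:</p>

```python
# Sketch of a validator response model (assumed schema, mirroring the fields
# that appear in the Appendix A.5 inference examples).
from pydantic import BaseModel


class TripleValidation(BaseModel):
    predicted_subject_name: str
    predicted_relation: str
    predicted_object_name: str
    triple_is_valid: bool
    reason: str


# Parsing a hypothetical LLM response into the structured model.
resp = TripleValidation(
    predicted_subject_name="Heinrich Rudolf Hertz",
    predicted_relation="occupation",
    predicted_object_name="theologian",
    triple_is_valid=False,
    reason="Hertz was a physicist, not a theologian.",
)
# resp.triple_is_valid is False
```

      <p>Instructor patches the LLM client so that a model such as this can be passed as the response schema, with structural validation and retries handled automatically.</p>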
      <sec id="sec-8-1">
        <title>A.2. Validators with Context Illustrations</title>
        <p>[Figure: Web Search API and Wikidata validator illustrations. For the unvalidated triple ("anaheim_ducks", "teamplaysport", "football"), the subject of the triple (e.g. anaheim_ducks) is used as the search query. Relevant properties are kept, e.g. (anaheim_ducks, league, National Hockey League), (anaheim_ducks, home venue, Honda Center), (anaheim_ducks, instance of, ice hockey team), while irrelevant properties such as (anaheim_ducks, Instagram username, AnaheimDucks) and (anaheim_ducks, X username, anaheimducks) are discarded. The filtered triples are then passed as context to the LLM.]</p>
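        <p>The filtering step illustrated above can be sketched as follows; DISCARD_PROPERTIES is an assumed blocklist for illustration, not the paper’s actual configuration:</p>

```python
# Hypothetical sketch of the context-filtering step: social-media identifiers
# are discarded before the retrieved Wikidata triples are passed to the LLM.
DISCARD_PROPERTIES = {"Instagram username", "X username"}

def filter_context(triples):
    # Keep only (subject, property, value) triples whose property is informative.
    return [t for t in triples if t[1] not in DISCARD_PROPERTIES]

context = filter_context([
    ("anaheim_ducks", "league", "National Hockey League"),
    ("anaheim_ducks", "instance of", "ice hockey team"),
    ("anaheim_ducks", "Instagram username", "AnaheimDucks"),
])
# context keeps the first two triples only
```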
      </sec>
      <sec id="sec-8-2">
        <title>A.3. Open-source Experimental Setup</title>
        <p>
          To evaluate the capabilities of open-source LLMs in our framework, we employed a version
of the LLama-2-70B-chat model [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ]. We selected a Q5_K_M-quantized version of Llama-2-70B-chat, provided by [61], chosen
for its low reported impact on output quality. To implement this, we take advantage of Instructor’s integration
with llama-cpp-python12, which supports quantized models in GGUF [62] format available on
Hugging Face Hub13. Our experimental setup for open-source LLMs uses two NVIDIA A100
GPUs.
        </p>
      </sec>
      <sec id="sec-8-3">
        <title>A.4. Open-source LLM Experimental Results</title>
        <p>12 https://github.com/abetlen/llama-cpp-python
13 https://huggingface.co/docs/hub</p>
      </sec>
      <sec id="sec-8-4">
        <title>A.5. Validator Inference Examples</title>
        <p>&gt;&gt;&gt; inp = [{"predicted_subject_name": "Heinrich Rudolf Hertz",
    "predicted_relation": ["occupation"],
    "predicted_object_name": "theologian"}]
&gt;&gt;&gt; WorldKnowledgeKGValidator(**{'triples': inp})
{
  "triple_is_valid": false,
  "reason": "Heinrich Rudolf Hertz was a German physicist who made significant contributions to the field of electromagnetism and is best known for his discovery of electromagnetic waves. There is no evidence or indication that he was a theologian. The predicted relation 'occupation' with the object 'theologian' is not valid based on historical and factual information about Hertz's life and work."
}</p>
        <p>{
  "predicted_subject_name": "Edward Norton",
  "predicted_relation": "The profession of Edward Norton is record producer",
  "predicted_object_name": "record producer",
  "triple_is_valid": true,
  "reason": "Edward Norton is indeed a record producer, as he has produced several films and documentaries through his production company, Class 5 Films.",
  "sources": [
    {
      "relevant_text": "['Edward Norton - IMDb Edward Norton - IMDb ![]()', 'Edward Norton\u2019s impact on the film industry goes beyond his on-screen performances. His dedication to storytelling, whether through acting, directing, or producing, reflects a commitment to meaningful and thought-provoking narratives. Norton\u2019s ability to tackle complex characters and engage with diverse genres has left an enduring mark on cinema, earning him a place among the most respected and accomplished figures in the entertainment world. As he continues to navigate the evolving landscape']"
    }
  ]
}</p>
        <p>{
  "predicted_subject_name": "Ricky Jay",
  "predicted_relation": "The gender of Ricky Jay is female",
  "predicted_object_name": "female",
  "triple_is_valid": true,
  "reason": "Ricky Jay was born as Richard Jay Potash, but he legally changed his name to Ricky Jay in 1982. Although he has been known to keep his personal life private, it is generally accepted that he identifies as male."
}</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . URL: https://doi.org/10.1145/2629489. doi:10.1145/2629489.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Koné</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Babri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <article-title>Snomed ct: A clinical terminology but also a formal ontology</article-title>
          ,
          <source>Journal of Biosciences and Medicines</source>
          (
          <year>2023</year>
          ). URL: https://api.semanticscholar.org/CorpusID:265433665.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Wordnet: a lexical database for english</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>38</volume>
          (
          <year>1995</year>
          )
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          . URL: https://doi.org/10.1145/219717.219748. doi:10.1145/219717.219748.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Lowe</surname>
          </string-name>
          ,
          <article-title>The berkeley framenet project</article-title>
          ,
          <source>in: COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          , E. Gabrilovich, G. Heitz,
          <string-name>
            <given-names>W.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Knowledge vault: a web-scale approach to probabilistic knowledge fusion</article-title>
          ,
          <source>in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , KDD '14,
          Association for Computing Machinery, New York, NY, USA,
          <year>2014</year>
          , p.
          <fpage>601</fpage>
          -
          <lpage>610</lpage>
          . URL: https://doi.org/10.1145/2623330.2623623. doi:10.1145/2623330.2623623.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , J. Cheng,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <article-title>Knowledge graph completion: A review, IEEE Access 8 (</article-title>
          <year>2020</year>
          )
          <fpage>192435</fpage>
          -
          <lpage>192456</lpage>
          . URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9220143.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Reiter</surname>
          </string-name>
          ,
          <source>On Closed World Data Bases, Technical Report, CAN</source>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Do pre-trained models benefit knowledge graph completion? a reliable evaluation and a reasonable approach</article-title>
          , in: S. Muresan,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          A. Villavicencio (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL</source>
          <year>2022</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>3570</fpage>
          -
          <lpage>3581</lpage>
          . URL: https://aclanthology.org/2022.findings-acl.282. doi:10.18653/v1/2022.findings-acl.282.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vashishth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Talukdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A re-evaluation of knowledge graph completion methods</article-title>
          , in: D.
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Chai</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Schluter</surname>
          </string-name>
          , J. Tetreault (Eds.),
          <article-title>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</article-title>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>5516</fpage>
          -
          <lpage>5522</lpage>
          . URL: https://aclanthology.org/2020.acl-main.489. doi:10.18653/v1/2020.acl-main.489.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. Zhang,</surname>
          </string-name>
          <article-title>Rethinking knowledge graph evaluation under the open-world assumption</article-title>
          ,
          <year>2022</year>
          . arXiv:2209.08858.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. E.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sisman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Efficient knowledge graph accuracy evaluation</article-title>
          ,
          <year>2019</year>
          . arXiv:1907.09657.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <article-title>The property graph database model</article-title>
          ,
          <source>in: Alberto Mendelzon Workshop on Foundations of Data Management</source>
          ,
          <year>2018</year>
          . URL: https://api.semanticscholar.org/CorpusID:43977243.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Grishman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. T.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Griffitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ellis</surname>
          </string-name>
          ,
          <article-title>Overview of the tac 2010 knowledge base population track</article-title>
          ,
          <source>in: Third text analysis conference (TAC</source>
          <year>2010</year>
          ), volume
          <volume>3</volume>
          ,
          <year>2010</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>3</lpage>
          . URL: https://blender.cs.illinois.edu/paper/kbp2011.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Grishman</surname>
          </string-name>
          ,
          <article-title>Knowledge base population: Successful approaches and challenges</article-title>
          , in: D.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Matsumoto</surname>
          </string-name>
          , R. Mihalcea (Eds.),
          <article-title>Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</article-title>
          , Portland, Oregon, USA,
          <year>2011</year>
          , pp.
          <fpage>1148</fpage>
          -
          <lpage>1158</lpage>
          . URL: https://aclanthology.org/P11-1115.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Cheng,
          <article-title>A comprehensive overview of knowledge graph completion, Knowledge-Based Systems (</article-title>
          <year>2022</year>
          )
          <fpage>109597</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          , Wordnet, in: Theory and applications of ontology: computer applications, Springer,
          <year>2010</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sturge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , Freebase:
          <article-title>a collaboratively created graph database for structuring human knowledge</article-title>
          ,
          <source>in: Proceedings of the 2008 ACM SIGMOD international conference on Management of data</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>O.</given-names>
            <surname>Bodenreider</surname>
          </string-name>
          ,
          <article-title>The unified medical language system (umls): integrating biomedical terminology</article-title>
          ,
          <source>Nucleic acids research</source>
          <volume>32</volume>
          (
          <year>2004</year>
          )
          <fpage>D267</fpage>
          -
          <lpage>D270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hruschka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Talukdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Betteridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dalvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gardner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kisiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mazaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nakashole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Platanios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ritter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Samadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Settles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wijaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saparov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Greaves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>Never-ending learning</article-title>
          ,
          <source>in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Razeghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Logan IV</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>AutoPrompt: Eliciting knowledge from language models with automatically generated prompts</article-title>
          ,
          <year>2020</year>
          . arXiv:2010.15980.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Language models as knowledge bases?</article-title>
          ,
          <year>2019</year>
          . arXiv:1909.01066.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>KG-BERT: BERT for knowledge graph completion</article-title>
          ,
          <year>2019</year>
          . arXiv:1909.03193.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Pretrain-KGE: Learning knowledge representation from pretrained language models</article-title>
          , in:
          <string-name>
            <given-names>T.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2020</year>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>266</lpage>
          . URL: https://aclanthology.org/2020.findings-emnlp.25. doi:10.18653/v1/2020.findings-emnlp.25.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>A survey of large language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2303.18223.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Large language models for generative information extraction: A survey</article-title>
          ,
          <year>2023</year>
          . arXiv:2312.17617.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Making large language models perform better in knowledge graph completion</article-title>
          ,
          <year>2023</year>
          . arXiv:2310.06671.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>StructGPT: A general framework for large language model to reason over structured data</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.09645.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>Exploring large language models for knowledge graph completion</article-title>
          ,
          <year>2024</year>
          . arXiv:2308.13916.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>H.</given-names>
            <surname>Khorashadizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Groppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Groppe</surname>
          </string-name>
          ,
          <article-title>Exploring in-context learning capabilities of foundation models for generating knowledge graphs from text</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.08804.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. P.</given-names>
            <surname>Timmapathini</surname>
          </string-name>
          ,
          <article-title>LLM2KB: Constructing knowledge bases using instruction tuned context aware large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2308.13207</source>
          (
          <year>2023</year>
          ). URL: https://arxiv.org/pdf/2308.13207.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>LLMs for knowledge graph construction and reasoning: Recent capabilities and future opportunities</article-title>
          ,
          <year>2024</year>
          . arXiv:2305.13168.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32] OpenAI:
          <string-name>
            <given-names>J.</given-names>
            <surname>Achiam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Adler</surname>
          </string-name>
          , et al.,
          <article-title>GPT-4 technical report</article-title>
          ,
          <year>2024</year>
          . arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Enguix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lata</surname>
          </string-name>
          ,
          <article-title>Text2KGBench: A benchmark for ontology-driven knowledge graph generation from text</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2023</year>
          , pp.
          <fpage>247</fpage>
          -
          <lpage>265</lpage>
          . URL: https://arxiv.org/pdf/2308.02357.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Unifying large language models and knowledge graphs: A roadmap</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          . URL: http://dx.doi.org/10.1109/TKDE.2024.3352100. doi:10.1109/tkde.2024.3352100.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-H.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-S.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Natural language to SQL: where are we today?</article-title>
          ,
          <source>Proc. VLDB Endow.</source>
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>1737</fpage>
          -
          <lpage>1750</lpage>
          . URL: https://doi.org/10.14778/3401960.3401970. doi:10.14778/3401960.3401970.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>T.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Content enhanced BERT-based text-to-SQL generation</article-title>
          ,
          <source>ArXiv abs/1910.07179</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frieske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey of hallucination in natural language generation</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          . URL: http://dx.doi.org/10.1145/3571730. doi:10.1145/3571730.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for large language models: A survey</article-title>
          ,
          <year>2024</year>
          . arXiv:2312.10997.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Semnani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. Z.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <article-title>WikiChat: Stopping the hallucination of large language model chatbots by few-shot grounding on Wikipedia</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.14292.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive NLP tasks</article-title>
          ,
          <year>2021</year>
          . arXiv:2005.11401.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Kanell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Khattab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <article-title>Assisting in writing Wikipedia-like articles from scratch with large language models</article-title>
          ,
          <year>2024</year>
          . arXiv:2402.14207.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>E.</given-names>
            <surname>Huaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kärle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fensel</surname>
          </string-name>
          ,
          <article-title>Knowledge graph validation</article-title>
          ,
          <year>2020</year>
          . arXiv:2005.01389.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mayfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. W.</given-names>
            <surname>Finin</surname>
          </string-name>
          ,
          <article-title>Evaluating the quality of a knowledge base populated from text</article-title>
          , in:
          <source>AKBC-WEKEX@NAACL-HLT</source>
          ,
          <year>2012</year>
          . URL: https://api.semanticscholar.org/CorpusID:1851959.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Are missing links predictable? An inferential benchmark for knowledge graph completion</article-title>
          ,
          <year>2021</year>
          . arXiv:2108.01387.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <article-title>Improving language understanding by generative pre-training</article-title>
          ,
          <year>2018</year>
          . URL: https://api.semanticscholar.org/CorpusID:49313245.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. S.</surname>
          </string-name>
          et al.,
          <article-title>Llama 2: Open foundation and fine-tuned chat models</article-title>
          ,
          <year>2023</year>
          . arXiv:2307.09288.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. P.</surname>
          </string-name>
          et al.,
          <article-title>Text and code embeddings by contrastive pre-training</article-title>
          ,
          <year>2022</year>
          . arXiv:2201.10005.
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Minervini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stenetorp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>Convolutional 2D knowledge graph embeddings</article-title>
          ,
          <year>2018</year>
          . arXiv:1707.01476.
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>T.</given-names>
            <surname>Safavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koutra</surname>
          </string-name>
          ,
          <article-title>CoDEx: A Comprehensive Knowledge Graph Completion Benchmark</article-title>
          , in:
          <string-name>
            <given-names>B.</given-names>
            <surname>Webber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>8328</fpage>
          -
          <lpage>8350</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.669.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>