<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiaohan Feng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xixin Wu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Helen Meng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base. An ontology is authored by generating Competency Questions (CQs) on the knowledge base to discover the knowledge scope, extracting relations from the CQs, and attempting to replace equivalent relations with their counterparts in Wikidata. To ensure consistency and interpretability in the resulting KG, we ground the generation of the KG with the authored ontology based on the extracted relations. Evaluation on benchmark datasets demonstrates competitive performance in the knowledge graph construction task. Our work presents a promising direction for a scalable KG construction pipeline with minimal human intervention that yields high-quality, human-interpretable KGs, which are interoperable with Wikidata semantics for potential knowledge base expansion.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Relation Extraction</kwd>
        <kwd>Large Language Model</kwd>
        <kwd>Wikidata</kwd>
        <kwd>Interpretable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge Graphs (KGs) are structured representations of information that capture entities
and their relationships in a graph format. By organizing knowledge in a machine-readable
way, KGs enable a wide range of intelligent applications, such as semantic search, question
answering, recommendation systems, and decision support [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The ability to construct
high-quality, comprehensive KGs is thus critical for harnessing the power of these technologies
across various domains.
      </p>
      <p>
        Traditionally, the process of constructing KGs has relied heavily on manual effort by domain
experts to define the relevant entities and relationships, populate the graph with valid facts,
and ensure logical consistency [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, this manual curation approach is time-consuming,
expensive, and difficult to scale to large, evolving domains. There is a strong need for
(semi)automatic methods that can aid the KG construction process by extracting structured knowledge
from unstructured data sources such as text.
      </p>
      <p>
        Recent years have seen growing interest in leveraging Large Language Models (LLMs) for
various knowledge capture and reasoning tasks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Pre-trained on vast amounts of text data,
LLMs can generate fluent natural language and have been shown to memorize and recall
factual knowledge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [5]. However, directly applying LLMs to KG construction still faces
several challenges. First, LLMs may generate inconsistent or redundant facts due to the lack
of an explicit, unified schema [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ]. Second, the generated KGs may be incomplete or biased
towards the knowledge present in the LLM’s training data, which may not fully cover the
target domain, especially for proprietary documents not included in the pre-training set. Finally,
it can be challenging to integrate LLM-generated KGs with existing knowledge bases due to
misalignment with standard ontologies.
      </p>
      <p>In this work, we propose a novel approach
that harnesses the reasoning power of LLMs
and the structured schema of Wikidata to
construct high-quality KGs for proprietary
knowledge domains. Our approach begins by
discovering the scope of knowledge through
the generation of Competency Questions (CQ)
and answers from unstructured documents.</p>
      <p>We then summarize the relations and properties from these QA pairs into an ontology,
matching candidate properties against those defined in Wikidata and extending the schema as
needed. Finally, we use the resulting ontology to ground the transformation of CQ-answer pairs
into a structured KG. By incorporating the Wikidata schema into our pipeline and grounding the
generation of the KG on the same ontology, we aim to reduce redundancy, leverage the implicit
knowledge captured during LLM pretraining while improving interpretability, and ensure
interoperability with public knowledge bases. The generated KGs can be parsed with RDF parsers
and used in downstream applications, or audited for correctness.</p>
      <p>(Figure 1: Flowchart of proposed approach.)</p>
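      <p>To make the consumption step concrete, the sketch below parses a small RDF sample with a minimal, stdlib-only N-Triples reader. This reader and its example URIs are illustrative stand-ins written for this sketch; an actual pipeline would use a full RDF parser.</p>

```python
import re

# Minimal N-Triples reader: returns (subject, predicate, object) string triples.
# Illustrative stand-in only; a production pipeline would use a full RDF parser.
TRIPLE_RE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"[^"]*"(?:\^\^<[^>]*>|@[A-Za-z-]+)?)\s*\.\s*$'
)

def parse_ntriples(text):
    triples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blanks and comments
        m = TRIPLE_RE.match(line)
        if m:
            triples.append(m.groups())
    return triples

# Hypothetical example URIs for illustration only.
kg = '''
<http://example.org/DouglasAdams> <http://example.org/occupation> <http://example.org/Author> .
<http://example.org/DouglasAdams> <http://example.org/dateOfBirth> "1952-03-11"^^<http://www.w3.org/2001/XMLSchema#date> .
'''
triples = parse_ntriples(kg)
```

      <p>The resulting triple list can then be handed to downstream applications or to an auditing step.</p>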
      <p>The main contributions of this work are as follows:
1. We propose a novel ontology-grounded approach to LLM-based KG construction that
leverages an ontology based on the Wikidata schema to guide the extraction and integration of
knowledge from unstructured text.
2. We introduce a pipeline that combines competency question generation, ontology
alignment, and KG grounding to systematically construct high-quality KGs that are consistent,
complete, and interoperable with existing knowledge bases.
3. We demonstrate the effectiveness of our approach through experiments on benchmark
datasets, showing improvements in KG quality compared to traditional methods, along with the
interpretability and utility of the generated KGs.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>
        Knowledge graph construction has been an active area of research in recent years, with a wide
range of approaches proposed for extracting structured knowledge from unstructured data
sources [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Early methods relied heavily on rule-based systems and hand-crafted features to
identify entities and relations in text [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ]. With the advent of deep learning, neural
network-based approaches have become increasingly popular, enabling more flexible and scalable KG
construction [
        <xref ref-type="bibr" rid="ref7">8</xref>
        ].
      </p>
      <p>
        One prominent line of work focuses on using distant supervision to automatically generate
training data for relation extraction [
        <xref ref-type="bibr" rid="ref8">9</xref>
        ]. These methods assume that if two entities are mentioned
together in a sentence and also appear in a knowledge base as subject and object of a relation,
then that sentence is likely to express the relation. While distant supervision has been shown
to be effective at scale, it often suffers from noise and incomplete coverage.
      </p>
      <p>
        Another important direction is the development of unsupervised and semi-supervised
methods for KG construction [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ]. These approaches aim to reduce the reliance on large amounts
of labeled data by leveraging techniques such as bootstrapping, graph-based inference, and
representation learning. However, they often struggle with consistency and quality control
issues. More recently, there has been growing interest in using large language models for
KG construction [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">13</xref>
        ]. These methods take advantage of the vast knowledge
captured in pretrained Language Models (LM) to generate KG triples through prompt engineering
and fine-tuning. While promising, these LM-based approaches only produce triplets without
canonicalization, which makes portability and interoperability difficult. Additionally, some
methods rely on vector-based similarity measures to deduce relationships between entities in KG,
which yields good performance but falls short in interpretability [
        <xref ref-type="bibr" rid="ref13">14</xref>
        ].
      </p>
      <p>As mentioned in the Introduction, despite the significant progress in KG construction and LLM
applications, performance, interpretability, coverage of proprietary documents, and interaction
with other knowledge bases remain open issues. Our pipeline addresses these by grounding KG
generation in an ontology based on the Wikidata schema, which ensures that the output KG is
human-readable and makes integration with Wikidata or other KGs easier. In the experiments below
we show that these benefits can also be achieved on private documents with decent performance.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method: Ontology-grounded KG Construction</title>
      <p>Our proposed approach for ontology-grounded KG construction using LLMs consists of four
main stages: 1) Competency Question Generation, 2) Relation Extraction and Ontology Matching,
3) Ontology Formatting, and 4) KG Construction. Figure 1 provides an overview of the pipeline.</p>
      <sec id="sec-3-1">
        <title>3.1. Competency Question (CQ)-Answer Generation</title>
        <p>The first step in our pipeline is to generate a set of competency questions (CQs) and answers
that capture the key information needs of the target domain. We employ an LLM to generate
CQs based on the input documents. The LLM is provided with a set of instructions and examples
to guide the generation process, encouraging the creation of well-formed, relevant questions
that can be answered using the given documents. This step helps to scope the KG construction
task within the knowledge domain, and ensure that the resulting KG aligns with the intended
use cases. This also allows further ontology expansion: user-submitted domain-defining questions
can be incorporated when interacting with the knowledge base, providing a user-friendly
interface for refining the ontology, where users submit new CQs and our proposed pipeline
attaches the incremental knowledge scope to the existing ontology.</p>
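        <p>A minimal sketch of how such a CQ prompt can be assembled. The function name, argument names, and instruction wording here are illustrative simplifications of the full prompt in Appendix C, and the actual LLM call is omitted:</p>

```python
# Sketch of assembling the CQ-generation prompt: instructions, few-shot
# examples, then the target document. Names here are illustrative, not the
# exact prompt used by the pipeline.
CQ_INSTRUCTIONS = (
    "Write competency questions based on the abstract level concepts in the document. "
    "Write questions that can be answered using the document only. "
    "Write up to {max_cqs} questions per document."
)

def build_cq_prompt(document, examples, max_cqs=3):
    parts = [CQ_INSTRUCTIONS.format(max_cqs=max_cqs)]
    for example_doc, example_cqs in examples:
        parts.append("####\nDocument: " + example_doc)
        parts.append("####\nQuestions:\n" + "\n".join(example_cqs))
    parts.append("####\nDocument:\n" + document)
    return "\n".join(parts)

prompt = build_cq_prompt(
    "Ada Lovelace (1815-1852) was an English mathematician.",
    examples=[("Douglas Adams was an English author.",
               ["CQ1. What is the occupation of Douglas Adams?"])],
)
```

        <p>The assembled prompt is then sent to the LLM, and the returned CQs are answered against the same document in the next step.</p>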
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Relation Extraction and Ontology Matching</title>
        <p>
          During our preliminary experiments of prompting LLMs to directly generate an ontology from
documents, we noted that the LLM spontaneously recalled Wikidata knowledge in response,
consistent with previous works [
          <xref ref-type="bibr" rid="ref14">15</xref>
          ]. In these experiments, this behaviour also
transferred to smaller 7B/14B models.
        </p>
        <p>Following this direction, in the second step we extract relations from CQs and match them
against Wikidata properties to better elicit model memories on Wikidata when constructing and
using the ontology. We first prompt LLMs to extract properties from the CQs and write a brief
description of the usage of each extracted property, including its domain and range, following the editing guidelines
of Wikidata. To match these properties against existing entries in Wikidata ontology, we
prepopulate a candidate property list with all Wikidata properties after filtering out properties
related to external database/knowledge base IDs. These extracted properties are then matched
against the candidate list by a vector similarity search over property descriptions: each
property is represented by a sentence embedding of its description, and the top-1 closest
candidate is retrieved for each extracted property. Each pair of extracted property and matched
top-1 candidate is then vetted by the LLM to verify that the two are genuinely semantically
similar, as a final deduplication step. If a match is validated,
the candidate property is added to the final property list; otherwise, the newly minted property
is kept in the final list if we allow expansion from the candidate property list derived from
Wikidata, and discarded when the final property list is required to be a subset of candidate
property list. The first scenario is suitable for cases when no prior schema is known for the
domain and some new properties outside of common ontology are expected, whereas the latter
is for a known target list of possible properties.</p>
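        <p>The retrieval step above can be sketched as follows. A toy bag-of-words vector stands in for the sentence-embedding model (the pipeline uses bge-small-en), but the interface is the same: each property description maps to a vector, and the top-1 candidate by cosine similarity is retrieved:</p>

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector as a stand-in for a sentence-embedding model;
    # the interface is the same: text -> vector.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def match_property(extracted_description, candidates):
    # Retrieve the top-1 Wikidata candidate (name, description) by description
    # similarity; the LLM then vets the pair before it enters the final list.
    query = embed(extracted_description)
    return max(candidates, key=lambda cand: cosine(query, embed(cand[1])))

# Illustrative candidate list; real descriptions come from Wikidata.
candidates = [
    ("place of birth", "location where the person was born"),
    ("occupation", "occupation of a person"),
]
best = match_property("the city in which the subject was born", candidates)
```

        <p>Only the top-1 candidate is surfaced per extracted property; the subsequent LLM vetting decides whether the match is accepted or the newly minted property is kept.</p>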
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Ontology Formatting</title>
        <p>In the third stage, we use the LLM to generate an OWL ontology based on the matched and
newly created properties. We copy the description, domain, and range fields of all matched properties
from Wikidata semantics. For new properties, the LLM is prompted to infer and summarize classes
for the domain and range of the relations and to output a complete OWL ontology, following the
format of copied Wikidata properties. This step ensures that the resulting KG is grounded in a
formal, machine-readable ontology that captures relationships between entities, and stays close to
the semantics of Wikidata for interoperability.</p>
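        <p>A minimal sketch of rendering one property in the resulting OWL Turtle, assuming (for illustration) that the property is an object property with a single domain and range class; prefix declarations and datatype properties are omitted, and the helper name is hypothetical:</p>

```python
def property_to_owl(name, description, domain, range_):
    # Render one property as OWL Turtle with the Wikidata-style fields
    # (description, domain, range). PascalCase naming follows Appendix B;
    # prefix declarations (wd:, wdt:, owl:, rdfs:) are assumed elsewhere.
    pascal = "".join(word.capitalize() for word in name.split())
    return (
        f"wdt:{pascal} a owl:ObjectProperty ;\n"
        f'    rdfs:comment "{description}"@en ;\n'
        f"    rdfs:domain wd:{domain} ;\n"
        f"    rdfs:range wd:{range_} .\n"
    )

snippet = property_to_owl("place of birth",
                          "location where the person was born",
                          "Human", "Location")
```

        <p>In the actual pipeline this rendering is produced by the LLM, with copied Wikidata entries serving as in-context formatting examples for the newly created properties.</p>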
      </sec>
      <sec id="sec-3-4">
        <title>3.4. KG Construction</title>
        <p>In the final stage, we use the LLM to construct a KG based on the CQs and related answers,
grounded by the ontology generated in the previous stage. For each (CQ, answer) pair, the LLM
extracts relevant entities and maps them to the ontology using the defined properties. The
output is a set of RDF triples that constitutes the final KG.</p>
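        <p>The grounding constraint can be sketched deterministically: only triples whose predicate belongs to the authored ontology survive. In the pipeline this constraint is enforced through the prompt rather than a post-filter, so the function below is an illustrative stand-in:</p>

```python
def ground_triples(raw_triples, ontology_properties):
    # Keep only triples whose predicate appears in the grounding ontology;
    # a deterministic stand-in for the constraint the LLM prompt enforces.
    allowed = set(ontology_properties)
    return [t for t in raw_triples if t[1] in allowed]

raw = [
    ("Mohammad_Firouzi", "PlaceOfBirth", "Tehran"),
    ("Mohammad_Firouzi", "FavoriteColor", "Blue"),  # not in the ontology
]
grounded = ground_triples(raw, ["PlaceOfBirth", "Occupation"])
```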
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Discussion</title>
      <sec id="sec-4-1">
        <title>4.1. Experiment settings</title>
        <p>
          We evaluate our ontology-grounded approach to KG construction (KGC) on three
KGC datasets: Wiki-NRE [
          <xref ref-type="bibr" rid="ref15">16</xref>
          ], SciERC [
          <xref ref-type="bibr" rid="ref16">17</xref>
          ], and WebNLG [
          <xref ref-type="bibr" rid="ref17">18</xref>
          ]. As Wiki-NRE and WebNLG
are partially based on Wikidata and DBpedia (derived from Wikipedia contents), and in our
proposed pipeline the Wikidata schema is utilized, we include SciERC for a more robust evaluation,
since SciERC contains relation types that have no natural equivalents among Wikidata properties.
        </p>
        <p>
          We used a subset of Wiki-NRE’s test set containing 1,000 samples with 45 relation types
following the split in [
          <xref ref-type="bibr" rid="ref18">19</xref>
          ], due to cost constraints. SciERC’s test set contains 974 samples under
a schema with 7 relation types. For WebNLG, we used the test set of the Semantic Parsing (SP) task,
with 1,165 samples and 159 relation types. For evaluation, we adopt partial F1 on KG triplets
based on the standards in [
          <xref ref-type="bibr" rid="ref17">18</xref>
          ]. All experiments are conducted in one pass.
        </p>
        <p>
          We note that annotations in previous KGC reports may be incomplete in terms
of both possible relation types and KG triplets [
          <xref ref-type="bibr" rid="ref18">19</xref>
          ], [
          <xref ref-type="bibr" rid="ref19">20</xref>
          ].
        </p>
        <p>As our pipeline is designed to autonomously uncover knowledge structure with no prior
assumption on the knowledge schema, we report our results in two ways, corresponding to the two
configurations of the final de-duplication step in Section 3.2:
1. Target schema constrained: In this setting, we match all relation types in the test sets to their
closest equivalents in Wikidata and constrain the ontology to the relation universe of the test set.
2. No schema constraint: In this setting, we do not filter the matched ontology, even if entries are
not in the schema of the test dataset. This setting is closer to real-life applications that process
documents with an unknown schema.</p>
        <p>For the SciERC properties conjunction, evaluate for, compare, and feature of, we select the closest
properties proposed by the LLM based on our subjective judgment.</p>
        <p>To highlight our system’s competency, rather than directly prompting for triplets, we parse
the output KG with an RDF parser, extract all valid RDF triples from the KG of each document in
the test set, and present the triplets to the evaluation script for assessment. This ensures that
our evaluation is performed on a generated KG that is ready to be consumed in downstream
applications.</p>
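        <p>For intuition, a simplified partial-match F1 over triples can be sketched as below, counting a predicted triple as matched when its subject and object agree with a gold triple. This is an illustration of the metric's shape only, not the exact scorer used for the reported numbers:</p>

```python
def partial_f1(predicted, gold):
    # Simplified partial-match F1 over (subject, relation, object) triples:
    # a triple counts as matched when subject and object agree, so relation
    # wording may differ across schemas. Illustrative only.
    gold_pairs = {(s, o) for s, _, o in gold}
    pred_pairs = {(s, o) for s, _, o in predicted}
    if not predicted or not gold:
        return 0.0
    precision = sum(1 for s, _, o in predicted if (s, o) in gold_pairs) / len(predicted)
    recall = sum(1 for s, _, o in gold if (s, o) in pred_pairs) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = [("Adams", "OccupationOf", "Author"), ("Adams", "Wrote", "HHGTTG")]
gold = [("Adams", "occupation", "Author")]
score = partial_f1(pred, gold)  # precision 0.5, recall 1.0 -> F1 = 2/3
```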
        <p>
          We test our pipeline on both Mistral-7B-instruct [
          <xref ref-type="bibr" rid="ref20">21</xref>
          ] and GPT-4o (https://openai.com/index/hello-gpt-4o/). Due to cost constraints,
we have only tested GPT-4o on the target schema constrained setting. For embedding property usage
comments, we select bge-small-en [
          <xref ref-type="bibr" rid="ref21">22</xref>
          ]. We use GenIE [
          <xref ref-type="bibr" rid="ref22">23</xref>
          ], PL-Marker [
          <xref ref-type="bibr" rid="ref23">24</xref>
          ], and ReGen [
          <xref ref-type="bibr" rid="ref24">25</xref>
          ]
as fine-tuned baselines for the Wiki-NRE, SciERC, and WebNLG datasets, respectively (collectively
named Non-LLM Baseline). For LLM-based systems, we use the results reported in [
          <xref ref-type="bibr" rid="ref18">19</xref>
          ] for
Wiki-NRE and WebNLG on the same Mistral model, and the GPT-4 results in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for SciERC (collectively
named LLM Baseline). We note that it is highly unlikely that Mistral-7B holds an advantage
over an earlier version of GPT-4 when interpreting the SciERC results.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Result</title>
        <p>Table 1 shows the performance of our method compared to state-of-the-art
baselines. Our proposed approach exceeds all baselines under the target schema constrained
setting on the Wiki-NRE and SciERC datasets, while displaying a small performance regression
without the schema constraint. On the WebNLG dataset, our pipeline maintained competitiveness
against the fine-tuned SOTA when constrained to the target schema. These results validate the
quality of the KGs generated by our pipeline, especially on SciERC, whose semantics contain
properties that are not native to Wikidata. We also note a performance improvement when using
GPT-4o.</p>
        <p>Table 1: Partial F1 scores on test datasets. Best result is bolded. Results of the
proposed pipeline under two settings are presented as target schema constrained / no schema
constraint.
Method | Wiki-NRE | SciERC
Non-LLM Baseline | 0.484 | 0.532
LLM Baseline | 0.647 | 0.07
Proposed (Mistral) | 0.66/0.60 | 0.73/0.58
Proposed (GPT-4o) | 0.71/N/A | 0.77/N/A</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Discussion</title>
        <sec id="sec-4-3-1">
          <title>4.3.1. Performance discrepancy on different grounding ontologies</title>
          <p>It is worth noting that the relatively lower performance in the no schema constraint setting across
all datasets is due to the fact that the LLM discovers a richer ontology than the predefined
target schema. While this expanded schema may capture additional relevant information, it can
hinder extraction performance when evaluated solely against the limited target schema. This
showcases the trade-off between schema completeness and strict adherence to a predefined
ontology; our pipeline performs best on a large set of documents with a limited scope of
knowledge, requiring a concise schema.</p>
          <p>Furthermore, the flipside of the performance deficit in the absence of schema constraints, i.e.
the additional ontology entries outside of dataset-defined properties, cannot be evaluated against
the dataset directly, as the ontology is not entirely covered by test set annotations. Hence, the
virtue of the no schema constraint setting is to demonstrate that our pipeline can indeed provide
coverage of the properties in the test set, though somewhat limited compared to baselines, while
also capturing ontology outside the test set schema, which is potentially more useful when
discovering an ontology on a novel document set with no expert knowledge of its schema
composition. This ability may be validated by manual evaluation on the full set of captured
ontology in future work.</p>
          <p>
            Nevertheless, the marginal performance deficit leaves room for improvement. Recent reports
have explored how long input contexts may pose a challenge to LLMs even if such context lengths are
technically supported [
            <xref ref-type="bibr" rid="ref25">26</xref>
            ]. We conjecture that aside from trimming the grounding ontology, which
hinders the knowledge coverage of our pipeline, few-shot fine-tuning on the new ontology or
general pretraining on the KG construction task may be helpful. We leave these as possible future
directions.
          </p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. Utility of generated KG</title>
          <p>
            It should be emphasized that, while the selected evaluation tasks assess the correctness
of extracted triplets, the extracted knowledge graph can do more than that. With ongoing
discussion on grounding LLM knowledge in trusted knowledge sources to reduce
hallucination [
            <xref ref-type="bibr" rid="ref5">6</xref>
            ], explicitly generating a KG provides a path to audit knowledge elicited when
interacting with an LLM; and with evidence demonstrating that LLMs have the potential to reason
on graphs and generate an explicit path to retrieve required knowledge [
            <xref ref-type="bibr" rid="ref26">27</xref>
            ], our pipeline may
serve as a foundation for an interpretable QA system, where an LLM autonomously extracts an
ontology and deduces the correct retrieval query based on that ontology when handling a set of
unstructured documents. The interpretability arises from the fact that the KG and query can be
understood and verified by users. Moreover, our usage of the Wikidata schema offers potential
interoperability with the whole Wikidata knowledge base, which safely expands the knowledge
scope of the QA system. We propose to continue research in this significant direction.
          </p>
        </sec>
        <sec id="sec-4-3-3">
          <title>4.3.3. Computational resources</title>
          <p>We note the growing concern about sustainability in LLM applications due to their intensive
computational requirements. This pipeline consumes three separate LLM calls per document,
plus one call per extracted relation. It is not straightforward to compare the carbon footprint
of our approach against the Non-LLM baselines, as our work at this stage does not require
model fine-tuning, whereas all of the Non-LLM baselines employed various tuning techniques
when producing their results. On the other hand, our smallest adopted model, Mistral-7B, is more
than 10x larger in parameter count than the T5 models used in the Non-LLM baselines.
Larger models naturally require more powerful GPU clusters in terms of both GPU quantity
and capability, but our zero-shot approach may provide an advantage in resource
cost compared to Non-LLM baselines when processing a small number of documents with no
training requirement.</p>
          <p>
            When comparing with LLM baselines, we note that the approaches of [
            <xref ref-type="bibr" rid="ref13">14</xref>
            ] and [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] consume 1 and
2 LLM calls per document, respectively. However, these baselines treat knowledge
triplets as the evaluation target, while we generate a formatted ontology at the end, which is
more useful. Nevertheless, we recognize the performance burden and propose to explore
fine-tuning and guided decoding techniques to achieve better performance with smaller models
and better reproducibility.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We have demonstrated the effectiveness of our ontology-grounded approach to KG construction
using LLMs. By leveraging the structured knowledge of Wikidata captured during LLM pretraining,
and grounding KG construction with a generated ontology, our pipeline is able to construct
high-quality KGs across various domains while maintaining competitive performance with
state-of-the-art baselines. Generated KGs that conform to the Wikidata schema leave wide open
the possibility of building an interpretable QA system that has robust access to both common
knowledge and proprietary knowledge bases.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the Centre for Perceptual and Interactive Intelligence (CPII) Ltd, a
CUHK-led InnoCentre under the InnoHK scheme of the Innovation and Technology Commission.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Sample generated KG</title>
      <p>This KG was generated under the no schema constraint setting for this document: "Mohammad
Firouzi (born 1958, Tehran) is a prolific Iranian musician, whose primary instrument is the
barbat."
&lt;Prefixes and definition of dependencies omitted&gt;
wd:Mohammad_Firouzi a wd:human ;
    rdfs:label "Mohammad Firouzi"@en ;
    wdt:occupation wd:Musician ;
    wdt:CountryOfCitizenship wd:Iran ;
    wdt:PlaceOfBirth wd:Tehran ;
    wdt:DateOfBirth "1958"^^xsd:date .</p>
      <p>Note that in the official annotation, only triplets related to place of birth and nationality exist;
hence the evaluation will be penalized with low precision.</p>
    </sec>
    <sec id="sec-8">
      <title>B. Preprocessing of Wikidata schema</title>
      <p>To save space in the LLM input context and mitigate the performance drop on a selected target
schema when the ontology is large, we only include commonly used properties by restricting the
data type to item, quantity, string, monolingual text, and point in time
(https://www.wikidata.org/wiki/Help:Data_type). To align with common pretraining objectives of
LLMs, we substitute entity identifiers (e.g. P19) with their literal label (rdfs:label) in
PascalCase (e.g. PlaceOfBirth).</p>
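      <p>This preprocessing can be sketched as below; the property records are a simplified stand-in for entries from a Wikidata property dump, and the function name is illustrative:</p>

```python
ALLOWED_DATATYPES = {"item", "quantity", "string", "monolingual text", "point in time"}

def preprocess_properties(properties):
    # Filter Wikidata properties to the common datatypes listed above and map
    # identifiers (e.g. P19) to PascalCase labels, as described in Appendix B.
    kept = {}
    for prop in properties:
        if prop["datatype"] in ALLOWED_DATATYPES:
            kept[prop["id"]] = "".join(w.capitalize() for w in prop["label"].split())
    return kept

props = [
    {"id": "P19", "label": "place of birth", "datatype": "item"},
    {"id": "P345", "label": "IMDb ID", "datatype": "external-id"},  # filtered out
]
mapping = preprocess_properties(props)
```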
    </sec>
    <sec id="sec-9">
      <title>C. Prompts</title>
      <sec id="sec-9-1">
        <p>All prompts are reused across all datasets.</p>
        <sec id="sec-9-1-1">
          <title>C.1. CQ generation</title>
          <p>We prompt the LLM to generate up to 3 CQs per document for efficiency, considering the nature
of the test datasets, but note that this may be adjusted.</p>
          <p>Write competency questions based on the abstract level concepts
in the document. Write questions that can be answered using
the document only.</p>
          <p>Write up to 3 questions per document.</p>
          <p>Below are the examples and follow the same format when
generating competency questions:
####
Document: Douglas Noel Adams (11 March 1952 − 11 May 2001) was
an English author, humourist, and screenwriter, best known
for The Hitchhiker’s Guide to the Galaxy (HHGTTG).</p>
          <p>Originally a 1978 BBC radio comedy, The Hitchhiker’s Guide
to the Galaxy developed into a "trilogy" of five books that
sold more than 15 million copies in his lifetime. It was
further developed into a television series, several stage
plays, comics, a video game, and a 2005 feature film. Adams’s
contribution to UK radio is commemorated in The Radio
Academy’s Hall of Fame.
####
Questions:
CQ1. What is the date of birth of Douglas Noel Adams?
CQ2. What is the date of death of Douglas Noel Adams?
CQ3. What is the occupation of Douglas Noel Adams?
CQ4. What is the country of citizenship of Douglas Noel Adams?
CQ5. What is the most notable work of Douglas Noel Adams?
CQ6. What is the original medium of The Hitchhiker’s Guide to
the Galaxy?
CQ7. In what year was The Hitchhiker’s Guide to the Galaxy
originally broadcast?
CQ8. How many books are in The Hitchhiker’s Guide to the Galaxy
"trilogy"?
CQ9. What other media adaptations were created based on The</p>
          <p>Hitchhiker’s Guide to the Galaxy?
####
Document:
{document to be processed}</p>
        </sec>
        <sec id="sec-9-1-2">
          <title>C.2. CQ answering</title>
          <p>Use the provided document to answer user query. If you don’t
know the answer, just say that you don’t know, don’t try to
make up an answer.</p>
          <p>Passage: {doc}
Query: {query}</p>
        </sec>
        <sec id="sec-9-1-3">
          <title>C.3. Relation extraction</title>
          <p>You a r e an a s s i s t a n t i n b u i l d i n g a knowledge graph . Analyze t h e
f o l l o w i n g competency q u e s t i o n s and i d e n t i f y a l l
r e l a t i o n s h i p s and c o n c e p t s c o n c e p t s mentioned i n t h e
q u e s t i o n .</p>
          <p>E x t r a c t r e l a t i o n f i r s t , then d e s c r i b e t h e usage o f each
r e l a t i o n b a s e d on your u n d e r s t a n d i n g g i v e n t h e c o n t e x t o f
competency q u e s t i o n s .</p>
          <p>A ft e rw a rd s , e x t r a c t a l l r e l a t i o n − r e l a t e d c o n c e p t s .</p>
          <p>You s h o u l d only e x t r a c t p r o p e r t i e s between e n t i t i e s and
l i t e r a l s , not e n t i t i e s t h e m s e l v e s , or c l a s s e s o f e n t i t i e s .</p>
          <p>T h e r e f o r e , not a l l CQs c o n t a i n v a l i d p r o p e r t i e s .</p>
          <p>I f you don ’ t know t h e answer , j u s t say t h a t you don ’ t know , don
’ t t r y t o make up an answer .</p>
          <p>Merge a l l r e l a t i o n s i n t o one l i s t and a l l c o n c e p t s i n t o one
l i s t .</p>
          <p>Do not r e p l y u s i n g a c o m p l e t e s e n t e n c e , and only g i v e t h e
answer i n t h e f o l l o w i n g f o r m a t .</p>
          <p>Below are examples; follow the same format to extract
the relations:
####
Document: Douglas Noel Adams (11 March 1952 - 11 May 2001) was
an English author, humourist, and screenwriter, best known
for The Hitchhiker's Guide to the Galaxy (HHGTTG).
Originally a 1978 BBC radio comedy, The Hitchhiker's Guide
to the Galaxy developed into a "trilogy" of five books that
sold more than 15 million copies in his lifetime. It was
further developed into a television series, several stage
plays, comics, a video game, and a 2005 feature film. Adams's
contribution to UK radio is commemorated in The Radio
Academy's Hall of Fame.
####
Questions:
CQ1. What is the date of birth of Douglas Noel Adams?
CQ2. What is the date of death of Douglas Noel Adams?
CQ3. What is the occupation of Douglas Noel Adams?
CQ4. What is the country of citizenship of Douglas Noel Adams?
CQ5. What is the most notable work of Douglas Noel Adams?
CQ6. What is the original medium of The Hitchhiker's Guide to
the Galaxy?
CQ7. In what year was The Hitchhiker's Guide to the Galaxy
originally broadcast?
CQ8. How many books are in The Hitchhiker's Guide to the Galaxy
"trilogy"?
CQ9. What other media adaptations were created based on The
Hitchhiker's Guide to the Galaxy?
####
Relations:
(date of birth, The date on which the subject was born.)
(date of death, The date on which the subject died.)
(occupation, The occupation of a person.)
(country of citizenship, The country of which the subject is a
citizen.)
(notable work, The most notable work of a person.)
(genre, The genre or type of work.)
(publication date, The date or period when a work was first
published or released.)
(has part, Indicates that the subject has a certain part,
component, or element.)
(series, Indicates that the subject is part of a series, such
as a book series, film series, or television series.)
####
Document:
{document to be processed}
####
Questions:
####
Relations:</p>
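          <p>The "(relation, description)" reply format above lends itself to mechanical parsing. The following Python sketch is our own illustration, not part of the paper's pipeline; the function name and regular expression are assumptions:</p>

```python
import re

def parse_relations(block: str) -> list[tuple[str, str]]:
    """Parse lines like '(date of birth, The date on which the subject
    was born.)' from the 'Relations:' section of a model reply into
    (label, description) pairs."""
    pairs = []
    # label: no commas or parens; description: runs to the closing paren
    for match in re.finditer(r"\(([^,()]+),\s*([^()]+)\)", block):
        pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs

reply = """Relations:
(date of birth, The date on which the subject was born.)
(occupation, The occupation of a person.)"""

print(parse_relations(reply))
```

          <p>Descriptions containing nested parentheses would need a more careful grammar; the few-shot format above deliberately avoids them.</p>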
        </sec>
        <sec id="sec-9-1-4">
          <title>C.4. Ontology matching</title>
          <p>Decide if the two properties are semantically similar in an
ontology.</p>
          <p>You should say yes if you decide that these properties are
similar, or if they are inverse properties.</p>
          <p>Answer in "yes" or "no" only.</p>
          <p>Property 1: {p1}
Property 2: {p2}</p>
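          <p>Operationally, this matching step amounts to filling the template with a candidate pair and mapping the reply onto a boolean. A minimal sketch, assuming a generic text-in/text-out LLM client (the helper names are ours):</p>

```python
def build_matching_prompt(p1: str, p2: str) -> str:
    """Fill the ontology-matching template with a candidate property pair."""
    return (
        "Decide if the two properties are semantically similar in an ontology.\n"
        "You should say yes if you decide that these properties are similar, "
        "or if they are inverse properties.\n"
        'Answer in "yes" or "no" only.\n'
        f"Property 1: {p1}\nProperty 2: {p2}"
    )

def parse_decision(reply: str) -> bool:
    """Map the model's reply onto a boolean, tolerating case,
    quotes, and trailing punctuation."""
    return reply.strip().strip('".').lower().startswith("yes")

print(build_matching_prompt("date of birth", "birth date"))
print(parse_decision("Yes"))
```

          <p>Constraining the reply to "yes"/"no" keeps the parse trivial, at the cost of discarding any rationale the model might offer.</p>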
        </sec>
        <sec id="sec-9-1-5">
          <title>C.5. Ontology formatting</title>
          <p>For properties under the Wikidata schema, we retrieve schema:description, rdfs:domain, and rdfs:range
for each property and include them in the resulting ontology. Otherwise, the LLM is prompted to author
the ontology as follows:
####
Relations:
(results, results: results of a competition such as sports or
elections)
####
Ontology:
wdt:Results a wikibase:Property ;
    schema:description "results of a competition such as sports or elections" ;
    rdfs:label "results" ;
    rdfs:domain wd:referendum, wd:competition, wd:party conference, wd:sporting event ;
    rdfs:range wd:electoral result, wd:voting result, wd:sport result, wd:race result .
####
Relations:
{relation}</p>
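          <p>For relations with no Wikidata counterpart, the property stub can also be rendered deterministically once the LLM has supplied domains and ranges. The sketch below mirrors the example ontology above; the CamelCase local-name convention and function signature are our own assumptions, not the paper's specification:</p>

```python
def relation_to_turtle(label: str, description: str,
                       domains: list[str], ranges: list[str]) -> str:
    """Render one extracted relation as a wikibase:Property stub in
    Turtle, following the example ontology format (illustrative only)."""
    # e.g. "date of birth" -> "DateOfBirth"
    local = "".join(w.capitalize() for w in label.split())
    lines = [
        f"wdt:{local} a wikibase:Property ;",
        f'    schema:description "{description}" ;',
        f'    rdfs:label "{label}" ;',
        "    rdfs:domain " + ", ".join(domains) + " ;",
        "    rdfs:range " + ", ".join(ranges) + " .",
    ]
    return "\n".join(lines)

print(relation_to_turtle(
    "results", "results of a competition such as sports or elections",
    ["wd:competition"], ["wd:sport_result"]))
```

          <p>Note that valid Turtle local names cannot contain spaces, so multi-word class names would in practice need underscores or full IRIs.</p>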
        </sec>
        <sec id="sec-9-1-6">
          <title>C.6. KG generation</title>
          <p>Your task is to construct a knowledge graph based on the
provided ontology.</p>
          <p>Focus on understanding relationships from the question-answer
pairs and the document, extract the related entities, and then map them
to the ontology using the properties defined in the ontology.</p>
          <p>Do not include new properties other than those in the ontology.</p>
          <p>Only use the properties in the ontology.</p>
          <p>Output in Turtle format following the ontology provided.
You should only include knowledge in the question-answer pairs and
the document.</p>
          <p>Do not make up answers.</p>
          <p>Use this ontology based on Wikidata as the starting point:
{ont}
####
Document:
Douglas Noel Adams (11 March 1952 - 11 May 2001) was an English
author, humourist, and screenwriter, best known for The
Hitchhiker's Guide to the Galaxy (HHGTTG). Originally a 1978
BBC radio comedy, The Hitchhiker's Guide to the Galaxy
developed into a "trilogy" of five books that sold more than
15 million copies in his lifetime. It was further developed
into a television series, several stage plays, comics, a
video game, and a 2005 feature film. Adams's contribution to
UK radio is commemorated in The Radio Academy's Hall of
Fame.
####
Question answer pairs:
Q: What is Douglas Adams an instance of?
A: Douglas Adams is an instance of human.</p>
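          <p>The instruction "do not include new properties other than those in the ontology" can also be enforced after generation by filtering the emitted triples. A minimal post-processing sketch (triple representation and names are illustrative assumptions, not the paper's implementation):</p>

```python
def filter_to_ontology(triples, allowed_properties):
    """Drop any generated triple whose predicate is not declared in the
    ontology, enforcing the 'only use properties in the ontology'
    constraint. Triples are (subject, predicate, object) strings."""
    allowed = set(allowed_properties)
    return [t for t in triples if t[1] in allowed]

generated = [
    ("wd:Douglas_Adams", "wdt:DateOfBirth", '"1952-03-11"'),
    ("wd:Douglas_Adams", "wdt:FavoriteColor", '"blue"'),  # hallucinated property
]
print(filter_to_ontology(generated, {"wdt:DateOfBirth", "wdt:PlaceOfBirth"}))
```

          <p>Such a filter catches hallucinated predicates but not hallucinated objects; the prompt's "do not make up answers" instruction remains the only guard for the latter.</p>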
          <p>Q: What is Douglas Adams's sex or gender?
A: Douglas Adams's sex or gender is male.
Q: Where was Douglas Adams born?
A: Douglas Adams was born in Cambridge.
Q: Where did Douglas Adams die?
A: Douglas Adams died in Santa Barbara, California.
Q: When was Douglas Adams born?
A: Douglas Adams was born on 1952-03-11.
Q: On what date did Douglas Adams die?
A: Douglas Adams died on 2001-05-11.
Q: What occupation did Douglas Adams have?
A: Douglas Adams was a writer, comedian, and dramatist.
Q: What languages did Douglas Adams speak, write, or sign?
A: Douglas Adams spoke, wrote, or signed English.
Q: Where was Douglas Adams educated?
A: Douglas Adams was educated at St John's College, Cambridge
and Brentwood School, Essex.
Q: What institution is Douglas Adams an alumni of?
A: Douglas Adams is an alumni of St John's College.
Q: What are some notable works by Douglas Adams?
A: Some notable works by Douglas Adams include The Hitchhiker's
Guide to the Galaxy and Dirk Gently's Holistic Detective
Agency.
Q: Was Douglas Adams a member of any notable organizations?
A: Yes, Douglas Adams was a member of Monty Python and The
Independent on Sunday.
Q: What award did Douglas Adams receive?
A: Douglas Adams received the Locus Award for Best Science
Fiction Novel.
Q: What is the Commons Category for Douglas Adams?
A: The Commons Category for Douglas Adams is "Douglas Adams".</p>
          <p>wdt:MemberOf wd:Monty_Python ;
wdt:MemberOfOrganization wd:The_Independent_on_Sunday ;
wdt:Award wd:Locus_Award_for_Best_Science_Fiction_Novel ;
wdt:CommonsCategory "Douglas Adams"@en .</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Hogan, E. Blomqvist, M. Cochez, C. D'amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L. Gayo, R. Navigli, S. Neumaier, A.-C. N. Ngomo, A. Polleres, S. M. Rashid, A. Rula, L. Schmelzeisen, J. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM Comput. Surv. 54 (2021). URL: https://doi.org/10.1145/3447772. doi:10.1145/3447772.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] S. Ji, S. Pan, E. Cambria, P. Marttinen, P. S. Yu, A survey on knowledge graphs: Representation, acquisition, and applications, IEEE Transactions on Neural Networks and Learning Systems 33 (2020) 494-514. URL: https://api.semanticscholar.org/CorpusID:211010433.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Y. Zhu, X. Wang, J. Chen, S. Qiao, Y. Ou, Y. Yao, S. Deng, H. Chen, N. Zhang, LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities, 2024. doi:10.48550/arXiv.2305.13168. arXiv:2305.13168.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] F. Petroni, T. Rocktäschel, P. Lewis, A. Bakhtin, Y. Wu, A. H. Miller, S. Riedel, Language models as knowledge bases?, in: Conference on Empirical Methods in Natural Language Processing, 2019. URL: https://api.semanticscholar.org/CorpusID:202539551.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[6] G. Agrawal, T. Kumarage, Z. Alghamdi, H. Liu, Can Knowledge Graphs Reduce Hallucinations in LLMs?: A Survey, 2024. doi:10.48550/arXiv.2311.07914. arXiv:2311.07914.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[7] E. Agichtein, L. Gravano, Snowball: extracting relations from large plain-text collections, in: Proceedings of the Fifth ACM Conference on Digital Libraries, DL '00, Association for Computing Machinery, New York, NY, USA, 2000, pp. 85-94. URL: https://doi.org/10.1145/336597.336644. doi:10.1145/336597.336644.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[8] Y. Zhang, V. Zhong, D. Chen, G. Angeli, C. D. Manning, Position-aware attention and supervised data improve slot filling, in: M. Palmer, R. Hwa, S. Riedel (Eds.), Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Copenhagen, Denmark, 2017, pp. 35-45. URL: https://aclanthology.org/D17-1004. doi:10.18653/v1/D17-1004.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[9] M. Mintz, S. Bills, R. Snow, D. Jurafsky, Distant supervision for relation extraction without labeled data, in: K.-Y. Su, J. Su, J. Wiebe, H. Li (Eds.), Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Association for Computational Linguistics, Suntec, Singapore, 2009, pp. 1003-1011. URL: https://aclanthology.org/P09-1113.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[10] X. L. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, W. Zhang, Knowledge vault: A web-scale approach to probabilistic knowledge fusion, in: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA, August 24-27, 2014, 2014, pp. 601-610. URL: http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[11] A. Bosselut, H. Rashkin, M. Sap, C. Malaviya, A. Celikyilmaz, Y. Choi, COMET: Commonsense transformers for automatic knowledge graph construction, in: A. Korhonen, D. Traum, L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 4762-4779. URL: https://aclanthology.org/P19-1470. doi:10.18653/v1/P19-1470.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[12] Y. Chen, Y. Liu, L. Dong, S. Wang, C. Zhu, M. Zeng, Y. Zhang, AdaPrompt: Adaptive model training for prompt-based NLP, in: Y. Goldberg, Z. Kozareva, Y. Zhang (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 6057-6068. URL: https://aclanthology.org/2022.findings-emnlp.448. doi:10.18653/v1/2022.findings-emnlp.448.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[13] S. Yu, T. He, J. Glass, AutoKG: Constructing Virtual Knowledge Graphs from Unstructured Documents for Question Answering, 2021. doi:10.48550/arXiv.2008.08995. arXiv:2008.08995.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[14] B. Chen, A. L. Bertozzi, AutoKG: Efficient Automated Knowledge Graph Generation for Language Models, 2023. doi:10.48550/arXiv.2311.14740. arXiv:2311.14740.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[15] S. J. Semnani, V. Z. Yao, H. C. Zhang, M. S. Lam, WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia, 2023. arXiv:2305.14292.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[16] B. D. Trisedya, G. Weikum, J. Qi, R. Zhang, Neural relation extraction for knowledge base enrichment, in: A. Korhonen, D. Traum, L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019, pp. 229-240. URL: https://aclanthology.org/P19-1023. doi:10.18653/v1/P19-1023.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[17] Y. Luan, L. He, M. Ostendorf, H. Hajishirzi, Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction, in: Proc. Conf. Empirical Methods Natural Language Process. (EMNLP), 2018.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T. Castro</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gardent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ilinykh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. van der</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moussallem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shimorina</surname>
          </string-name>
          ,
          <article-title>The 2020 bilingual, bi-directional WebNLG+ shared task: Overview and evaluation results (WebNLG+</article-title>
          <year>2020</year>
          ), in: T. Castro Ferreira,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gardent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ilinykh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. van der</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moussallem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shimorina</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)</source>
          ,
          <source>Association for Computational Linguistics</source>
          , Dublin, Ireland (Virtual),
          <year>2020</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>76</lpage>
          . URL: https://aclanthology.org/2020.webnlg-1.7.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Soh</surname>
          </string-name>
          ,
          <article-title>Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction</article-title>
          ,
          <year>2024</year>
          . arXiv:2404.03868.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <article-title>Is information extraction solved by ChatGPT? An analysis of performance, evaluation criteria, robustness and errors</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.14450.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A. Q.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mensch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bamford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Chaplot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>de las Casas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bressand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lengyel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Saulnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Lavaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. E.</given-names>
            <surname>Sayed</surname>
          </string-name>
          ,
          <article-title>Mistral 7B</article-title>
          ,
          <year>2023</year>
          . arXiv:2310.06825.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation</article-title>
          ,
          <year>2023</year>
          . arXiv:2309.07597.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Josifoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>De Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peyrard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>West</surname>
          </string-name>
          ,
          <article-title>GenIE: Generative information extraction</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Carpuat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. V.</given-names>
            <surname>Meza Ruiz</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <source>Association for Computational Linguistics</source>
          , Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>4626</fpage>
          -
          <lpage>4643</lpage>
          . URL: https://aclanthology.org/2022.naacl-main.342. doi:10.18653/v1/2022.naacl-main.342.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Packed levitated marker for entity and relation extraction</article-title>
          , in:
          <string-name>
            <given-names>S.</given-names>
            <surname>Muresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Villavicencio</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <source>Association for Computational Linguistics</source>
          , Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>4904</fpage>
          -
          <lpage>4917</lpage>
          . URL: https://aclanthology.org/2022.acl-long.337. doi:10.18653/v1/2022.acl-long.337.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dognin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Padhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Melnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>ReGen: Reinforcement learning for text and knowledge base generation using pretrained language models</article-title>
          , in:
          <string-name>
            <given-names>M.-F.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Specia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <source>Association for Computational Linguistics</source>
          , Online and Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>1084</fpage>
          -
          <lpage>1099</lpage>
          . URL: https://aclanthology.org/2021.emnlp-main.83. doi:10.18653/v1/2021.emnlp-main.83.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. D.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Long-context LLMs struggle with long in-context learning</article-title>
          ,
          <year>2024</year>
          . arXiv:2404.02060.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>F.</given-names>
            <surname>Brei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <article-title>Leveraging small language models for Text2SPARQL tasks to improve the resilience of AI assistance</article-title>
          ,
          <year>2024</year>
          . arXiv:2405.17076.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>