<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Jun</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring In-Context Learning Capabilities of Foundation Models for Generating Knowledge Graphs from Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hanieh Khorashadizadeh</string-name>
          <email>khorashadizadeh@ifis.uni-luebeck.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nandana Mihindukulasooriya</string-name>
          <email>nandana@ibm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanju Tiwari</string-name>
          <email>tiwarisanju18@ieee.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jinghua Groppe</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Groppe</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Research</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad Autonoma de Tamaulipas</institution>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Lübeck</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>co-located with Extended Semantic Web Conference (ESWC)</institution>
          ,
          <addr-line>Hersonissos</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Knowledge graphs can represent information about the real world using entities and their relations in a structured and semantically rich manner, and they enable a variety of downstream applications such as question answering, recommendation systems, semantic search, and advanced analytics. However, building a knowledge graph currently involves a lot of manual effort, which hinders its application in some situations; automating this process would especially benefit small organizations. Automatically generating structured knowledge graphs from a large volume of natural language is still a challenging task, and research on sub-tasks such as named entity extraction, relation extraction, entity and relation linking, and knowledge graph construction aims to improve the state of the art of automatic construction and completion of knowledge graphs from text.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph Construction</kwd>
        <kwd>Knowledge Graph Completion</kwd>
        <kwd>Ontology</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge Graphs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] enable us to represent knowledge about a given domain in a semantically
rich manner by representing real-world entities as nodes and relations between them as edges.
Such knowledge graphs can be built using standards such as RDF(S) and OWL with well-defined
semantics allowing systems to perform reasoning to infer more information or query them
using structured query languages such as SPARQL.
      </p>
      <p>
        There are several approaches for constructing Knowledge Graphs based on the source of
the knowledge. They can be constructed from structured data, for example, by converting a
relational database into a knowledge graph using mappings such as RDB2RDF [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], or using
semi-structured data, for example, DBpedia from Wikipedia Infoboxes, or manually, for
example, Wikidata using crowdsourcing. Another approach is to construct knowledge graphs
from unstructured sources such as a corpus of text using Natural Language Processing (NLP)
techniques such as Named Entity Recognition (NER), Relation Extraction, Open Information
Extraction, Entity Linking, and Relation Linking. There is a growing interest in the Semantic
Web community to explore such approaches as seen from the workshops such as Knowledge
Graph Generation From Text (Text2KG) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In NLP research, the transformer [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] neural network architecture has led to significant
improvements in many tasks. Language models such as GPT-2/3 [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
TransformerXL [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], ELMo [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], RoBERTa [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], ALBERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and XLNet[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] are becoming popular for improving
results and reducing human intervention in a wide range of tasks such as search,
question answering, and sentence classification.
      </p>
      <p>
        More recently, there has been a focus on building foundation models [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The term foundation
model is used to describe a model that is trained on a very large corpus of unlabelled data
following the self-supervision paradigm and which can be used or adapted to a wide range of
downstream tasks. Foundation models generally show very high transfer learning capabilities.
Transfer learning [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] allows models to acquire knowledge from one task (generally with
unlabelled data in a self-learning manner) and apply it to another separate task. For example, a
model is pre-trained to predict the next word of a sentence but then fine-tuned to perform text
summarization or question answering.
      </p>
      <p>
        Reinforcement Learning from Human Feedback (RLHF) is also used to further improve these
foundation models at scale [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. The idea is to fine-tune language models in a way that they
can follow a broad range of written instructions. The human-in-the-loop approach is used to
provide feedback based on human preferences as a reward to the reinforcement learning setup.
Approaches such as InstructGPT [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and ChatGPT have used reinforcement learning from
human feedback to improve their models significantly.
      </p>
      <p>
        There are several approaches for adapting a foundation model, for example, for a task such
as the generation of knowledge graphs from text. One is to perform fine-tuning [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] (also
known as model tuning) for the task of knowledge-graph generation or its sub-tasks such as
relation extraction. This essentially updates all model parameters with task-specific
training; for models at the scale of foundation models, this requires weeks of GPU time. Prompt
tuning [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] or prefix tuning [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], approaches that take relatively less computational power,
keep the model parameters frozen and only add some tunable tokens per downstream
task, prefixed to the input text. Finally, in prompt design, the model is used as it is, but
the prompt or the input to the model is designed to provide a few examples of the task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In-context learning [
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ] is about teaching a model to perform a new task only by providing
a few demonstrations of input-output pairs at inference time. The model is expected to
infer the task and the type of output required from the context and instructions provided
in the input. This requires the least computational power because no training or tuning is
involved.
      </p>
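      <p>As a minimal illustration of such a prompt (the instruction wording and the demonstration pairs below are our own illustrative assumptions, not taken from this paper), an in-context learning prompt for triple extraction can be assembled from a task instruction, a few input-output demonstrations, and the new input:</p>

```python
# Sketch: build a few-shot prompt for triple extraction via in-context learning.
# The instruction text and demonstration pairs are illustrative assumptions.

def build_few_shot_prompt(demonstrations, new_text):
    """Assemble an in-context prompt: instruction, k demos, then the new input."""
    lines = ["Extract (subject, relation, object) triples from the text."]
    for text, triples in demonstrations:
        lines.append(f"Text: {text}")
        lines.append("Triples: " + "; ".join(f"({s}, {r}, {o})" for s, r, o in triples))
    lines.append(f"Text: {new_text}")
    lines.append("Triples:")  # the model is expected to continue from here
    return "\n".join(lines)

demos = [
    ("COVID-19 broke out in 2019.", [("covid19", "breakoutInYear", "2019")]),
    ("COVID-19 is a pandemic.", [("covid19", "is", "pandemic")]),
]
prompt = build_few_shot_prompt(demos, "Fever is a symptom of COVID-19.")
print(prompt)
```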
      <p>With these recent advancements, large prompting-based language models such as OpenAI’s
ChatGPT (175B params), Meta’s Galactica (120B params), and Google’s Bard (137B params)
have been released. As these large foundation models are expensive to fine-tune and train, our
goal is to check their capabilities for the task of generating knowledge graphs with in-context
learning in a prompt.</p>
      <p>
        Nevertheless, we need to perform further studies to understand not only the capabilities but
also the limitations of foundation models. For instance, these models hallucinate, generating
non-factual and nonsensical text in a fluent and confident manner [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Furthermore, there are
arguments that they are simply memorizing information without properly understanding the
meaning and reasoning in a logical manner. Similarly, there is skepticism about the potential
biases these models might have based on their training data and their impact on AI ethics and
fairness. In extreme cases, these models might even demonstrate harmful behaviors.
      </p>
      <p>These limitations of foundation models motivate using knowledge graphs rather than
foundation models alone, because knowledge graphs generally contain manually checked
facts and knowledge. In addition, knowledge graphs represent facts in a symbolic manner,
so that they can be inspected and validated by humans. Such knowledge will also enable explainable
AI, as it can be used to provide plausible explanations of AI model behaviors. Indeed, future
work might investigate the possibilities of combining the technologies of foundation models
and knowledge graphs to complement each other and overcome the limitations of both.</p>
      <p>
        Foundation models inherently contain knowledge acquired from large corpora of text that
seems to be only partly available in structured data sources. For instance, there is a vast
amount of information available related to the COVID-19 pandemic [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] in unstructured data
sources such as research papers, news articles, Wikipedia, etc. but only a small portion of that
information is available in knowledge graphs such as Wikidata.
      </p>
      <p>It is therefore interesting to explore the possibilities of knowledge graph creation and
completion based on foundation models, in order to save effort, and hence costs, compared with the
traditional approach of applying natural language processing to large-scale text collections for information
not available in structured data.</p>
      <p>Given this background, we will perform an initial exploration of the following research
questions:
• R1: Can we use the knowledge acquired during pre-training of LLMs for Knowledge
Graph completion, filling in missing information according to a defined ontology with a few
examples?
• R2: Can we use LLMs to extract facts to generate knowledge graphs from unseen text
that is provided at inference time?
• R3: Given an ontology, can we automatically generate prompts for extracting the relevant
triples for the purpose of Knowledge Graph construction?
• R4: Given a knowledge graph, can we identify the missing information and create prompts
to perform Knowledge Graph completion using foundation models?
• R5: Given a Knowledge Graph with some false facts, can we use LLMs to check the given
Knowledge Graph and determine which facts are not true, for the purpose of fact-checking?
• R6: What are the capabilities and limitations of models such as ChatGPT for the above
scenarios?
• R7: What are the disadvantages and risks associated with such an approach?</p>
      <p>The rest of the paper is organized as follows: Section 2 provides an overview of the background
of the main concepts discussed in this paper. Section 3 illustrates an architecture to position
this work in a high-level big picture and to motivate where this work would fit in. Section 4 provides
a qualitative analysis of the information extraction capabilities of foundation models in the
context of knowledge graph generation from text. Section 5 discusses some of the advantages
of the proposed approach and its challenges, and Section 6 provides some conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Foundation Models</title>
        <p>
          Foundation Models are a general class of models for building artificial intelligence (AI) systems,
trained on enormous amounts of data using self-supervision. These models include GPT-3 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
BERT [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] and CLIP [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Foundation models are not new; they are treated as a general paradigm of AI
based on self-supervised learning and deep neural networks [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Transfer learning [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
and scale are the key elements of foundation models, and pre-training is an effective approach to
transfer learning. Transfer learning is the basis for constructing a foundation model, and scale
helps to strengthen these models. There are different stages involved in building foundation models:
data creation, data curation, training, adaptation, and deployment. Data creation is generally
a human-centric process: the data is created by humans, and most of the created data is about
people. After data creation, the data must be curated into datasets on which the foundation
model is trained. Adaptation is the fourth stage, in which a new model is created based on
the foundation model to perform a task such as document summarization. Finally, the
foundation model needs to be deployed so that it can be used as an AI system.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Knowledge Graphs</title>
        <p>
          Web technologies have revolutionized the way information is delivered and accessed. They
have gone through the eras of 1.0 and 2.0 and are stepping into the era of 3.0. Web 1.0 provided
techniques for quick information publishing and access, and it is a huge collection of content
provided by website owners. Web 2.0 brought interactive capability into Web 1.0 and enabled
users to be contributors as well as consumers of web content. In this era, the web is a collection
of content that contains the collective knowledge of the public. However, Web 2.0 lacks
the capability to extract knowledge from its content, and the desire to use the collective
knowledge hidden in web content gave birth to Semantic Web technology, which is considered
an important capability of Web 3.0. Another enabler of Web 3.0 is the vision of the Metaverse [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], but
this aspect will not be discussed here since it is out of the scope of this work.
        </p>
        <p>
          In most cases, the Semantic Web is the underlying technology of knowledge graphs. Although
the knowledge graph [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] is defined differently by different works, most definitions follow
the RDF1 data model of the Semantic Web. RDF describes knowledge as a collection of triples
of subject, predicate and object, where the predicate indicates the relationship between the
subject and the object. For example, the piece of information “COVID-19 is a pandemic that
broke out in 2019.” can be described as two RDF triples: &lt;covid19, is, pandemic&gt; and &lt;covid19,
breakoutInYear, 2019&gt;. A collection of triples can be visualized as a directed graph, where the
subject and object are nodes of the graph and the predicate is an edge directed from the subject
node to the object node. In summary, RDF graphs can be seen as knowledge graphs, and any
knowledge graph can be transformed into a collection of triples. Rather than “RDF triples”,
“knowledge graph” has become the widely used term to describe relationships between entities. On
the one hand, the introduction of Google’s knowledge graph in 2012 contributed to the popularity
of the term [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]; on the other hand, “knowledge graph” highlights the nature of the data and
is thus a more appropriate and also a more impressive term.
        </p>
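        <p>The two example triples above can also be written down programmatically. The following sketch (the http://example.org/ IRIs are placeholders invented for illustration, not a real vocabulary) serializes them in the line-based N-Triples syntax:</p>

```python
# Sketch: represent the two example statements as RDF triples and emit N-Triples.
# The http://example.org/ IRIs are placeholders; in real RDF the year "2019"
# would typically be a typed literal rather than an IRI.

EX = "http://example.org/"

triples = [
    (EX + "covid19", EX + "is", EX + "pandemic"),
    (EX + "covid19", EX + "breakoutInYear", EX + "2019"),
]

def to_ntriples(triples):
    # Each N-Triples line has the form: <subject> <predicate> <object> .
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

print(to_ntriples(triples))
```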
        <p>Apart from RDF, the semantic web also provides other standards, including SPARQL2, a
standard query language for RDF graphs, and ontology languages such as RDFS3, OWL4 and
SHACL5 for defining the structure and vocabulary of RDF data.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Knowledge Graph Construction from Natural Languages</title>
        <p>The Semantic Web is ready to define, represent and query knowledge graphs. What is missing is
a way to build knowledge graphs from web content. The construction approaches of knowledge
graphs have evolved from semantic publishing to machine learning. While the change from RDF triples
to knowledge graphs is basically just a change of terminology, the evolution of construction
approaches is a giant leap, because it means a change from a completely handcrafted approach
to an automatic one.</p>
        <p>Natural Language Processing (NLP): The goal of NLP is to enable machines to understand
the meaning of texts like humans. With this capability, machines can do analysis and processing
tasks of texts for human beings. NLP has been widely used for machine translation, text
classification, and sentiment analysis. An application on the rise is knowledge extraction, which
enables the automatic construction of knowledge graphs from a huge collection of web content.
NLP has evolved from rule-based to one powered by the learning ability of AI.</p>
        <p>Rule-based: Early NLP relied on complex sets of hand-written rules, which are formal
representations of sophisticated linguistic knowledge and common-sense reasoning. Rule-based
NLP creates highly precise solutions, but manually defining complicated rules is a difficult and
time-consuming task. A large number of rules is needed to perform an NLP task, and such
rule-based NLP also suffers from the issue of scalability.
1 https://www.w3.org/RDF/
2 https://www.w3.org/2001/sw/wiki/SPARQL
3 https://www.w3.org/2001/sw/wiki/RDFS
4 https://www.w3.org/OWL/
5 https://www.w3.org/2001/sw/wiki/SHACL</p>
        <p>Learning-based: Today’s NLP integrates machine learning technology and has the capability
to automatically learn complex rules. Learning-based NLP is reaching a new milestone
in the automatic understanding and analysis of texts. It uses machine learning algorithms
to train language models on large amounts of data in order to obtain a solution to the given
problem. Machine learning enables a model to steadily optimize itself, so the
solution becomes increasingly accurate. A number of language models have been trained
and can be used for downstream NLP tasks, such as BERT, GPT-2, ChatGPT, RoBERTa, ALBERT,
ELECTRA, DeBERTa, XLNet, and T5. While the academic AI community is usually of the opinion
that state-of-the-art results obtained by just using more data and more computational resources
are not research novelties [29], huge language models like ChatGPT6 (with its 175 billion
parameters, 300 billion words, 570GB of web content, 10K GPUs and a cost of $4.6 million for a
single training session) do demonstrate the potential of AI-powered NLP.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Existing Tools for Language Models</title>
        <p>Language models are AI-based models to create and analyze text. These models are trained for tasks
such as predicting the next word in a text, speech recognition, and spelling correction. Language models
are the basis for natural language processing (NLP) activities such as sentiment analysis and
speech-to-text. They are generally divided into two categories: Statistical Language Models
and Neural Language Models7. There are several models8 introduced by different groups such
as OpenAI, Google, Deepmind, Anthropic, Baidu, Huawei, Meta, AI21 Labs, LG AI Research
and NVIDIA, as discussed in Table 1.</p>
        <p>ChatGPT is a language model powered by GPT-3. These models are particularly designed for
conversational tasks, are pre-trained on various topics, and can assist in different tasks such
as providing information and answering questions. LaMDA is a Transformer-based
model that can assist in free-flowing conversations. BARD is a chatbot model to answer natural
language questions with the help of NLP and Machine Learning. PaLM is a language model
based on a few-shot learning approach to handle different tasks. The mT5 model is a text-to-text
transformer model trained on the mC4 corpus.
6https://openai.com/blog/chatgpt
7https://medium.com/unpackai/language-models-in-ai-70a318f43041
8https://www.marktechpost.com/2023/02/22/top-large-language-models-llms-in-2023-from-openai-google-aideepmind-anthropic-baidu-huawei-meta-ai-ai21-labs-lg-ai-research-and-nvidia/?amp</p>
        <p>Gopher is DeepMind’s language model and is relatively more efficient than existing large
language models in different tasks such as answering questions and logical reasoning. Chinchilla
works similarly to Gopher and uses relatively less compute for fine-tuning tasks. Sparrow
is a chatbot designed by DeepMind to respond to users’ questions accurately and lessen the
risk of unsafe and incorrect answers. Claude is an AI-based conversational model powered
by advanced natural language processing and trained using a Constitutional AI technique.
The Ernie 3.0 model was developed by Baidu and Peng Cheng Laboratory, trained on huge
amounts of unstructured data, and achieved state-of-the-art results on over 60 Natural Language Processing
tasks. Ernie Bot is an AI-powered Chinese language model relatively similar to OpenAI’s
ChatGPT, capable of language understanding, language generation, and text-to-image generation.
PanGu-Alpha was designed by Huawei as a Chinese-language model equivalent to OpenAI’s
GPT-3. It completes several language tasks with high accuracy, such as question answering,
text summarization, and dialogue generation. OPT-IML is released by Meta as a pre-trained
language model fine-tuned to strengthen performance on natural language tasks
such as text summarization, translation, and question answering.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Architecture for Knowledge Graph Construction and Completion from Language Models</title>
      <p>In this section, we present a potential architecture (see Figure 1) with a set of components for
using foundation models for generating knowledge graphs from text corpora. It is important
to note that even though we present this overall architecture, our focus in this paper is only on
one component, namely information extraction using the foundation model. The objective
of presenting this architecture is to position our work in the overall big picture and motivate
the usefulness of that work in a practical setting. We plan to work on the other components in
future work to implement a pipeline for automatically generating knowledge graphs from text.</p>
      <p>In the context of knowledge graph generation, we especially target on the following two
possible scenarios: (a) knowledge graph construction from scratch and (b) knowledge graph
completion where an incomplete knowledge graph is extended with missing facts. This pipeline
aims to address both of these use cases.</p>
      <p>Inputs: There are three possible inputs to the pipeline: an ontology, optionally text corpora,
and an incomplete knowledge graph. The goal of the proposed pipeline is to generate a
knowledge graph from text driven by an existing ontology; thus, one of the inputs to the process
is the ontology. The ontology will guide which concepts and relations have to be used for
information extraction for generating the triples. The facts can come from two sources: the
knowledge that the foundation model acquired during its pre-training with a large volume of
text (mainly open-domain knowledge), or new text provided to it through the prompt (e.g., facts
internal to a company from internal documents). In the latter case, the new text will come
from a text corpus provided as input. In addition, an incomplete knowledge graph can also
be provided as input. This will allow the pipeline to identify the missing facts and
drive the information extraction to fill in that missing information.</p>
      <p>[Figure 1: Overview of the pipeline. Inputs (an ontology, optional text corpora, and incomplete knowledge graphs) feed automatic prompt generation, followed by information extraction using foundation models (e.g., GPT; the step that is the focus of this paper), then post-processing and RDFication and output validation, producing generated or enriched knowledge graphs with new facts.]</p>
      <p>Automatic Prompt Generation: Depending on the availability of computing resources and
training data, there are different ways foundation models can be used for generating knowledge
graphs from text, utilizing techniques such as fine-tuning, prompt tuning, or simple prompt
designs. In the scope of this paper, we will focus only on the prompt design approaches.</p>
      <p>
        As the examples in Section 4 show, in order to perform the information extraction, we need to
generate a prompt that provides instructions to the model on what needs to be extracted and
how to format the output. The goal of this component is to analyze the ontology and optionally
the incomplete knowledge graph and generate prompts with concepts and relations to extract
specific information (see Fig. 2). This will require research related to prompt engineering [
        <xref ref-type="bibr" rid="ref19 ref6">6, 19</xref>
        ]
as well as RDF profiling techniques [30].
      </p>
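      <p>A simple version of this component can be sketched as follows; the mini-ontology representation, the relation names, and the prompt wording are illustrative assumptions, not the actual implementation described in this paper:</p>

```python
# Sketch: generate an extraction prompt from a mini ontology given as
# (domain concept, relation, range concept) declarations. Illustrative only.

ontology = [
    ("Vaccine", "manufacturedBy", "Organization"),
    ("Disease", "isTransmittedBy", "TransmissionMode"),
]

def generate_prompt(ontology, entity):
    """Build a prompt restricting the model to the ontology's relations."""
    relations = ", ".join(rel for _dom, rel, _rng in ontology)
    return (
        f"Using only the relations [{relations}], list facts about '{entity}' "
        "as triples in the form (subject, relation, object), one per line."
    )

print(generate_prompt(ontology, "COVID-19"))
```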
      <p>Information Extraction using Foundation Models: The goal of this component is to
execute the generated prompts against the foundation model and collect the output. Some
foundation models are publicly and freely available to download in repositories such as
HuggingFace, while others are proprietary and made available through subscription-based APIs.
Depending on the use case, requirements, and the target foundation model, this component
will have the necessary access mechanisms to run the inference of the model with a given input
prompt.</p>
      <p>This component can be further improved using new paradigms that teach language
models to use external tools, such as Toolformer [31] or TALM [32]. This will allow the model to
use other tools, for example, web search or other API calls necessary to complete the information
request.</p>
      <p>Post-processing and RDFication: As we can see in the examples in Section 4 (see Fig. 2 or Fig.
3), we instruct the model to generate the facts in a simpler triplet format rather than a verbose
RDF syntax. As the prompt and the response of the model are limited in the number of tokens9,
this allows the response to be less verbose. The output of the model can then be post-processed
and converted into RDF. As the pipeline has an ontology as input and the automatic prompt
design component is aware of the concepts and relations used to generate the prompt, this
process can be done in a deterministic manner.</p>
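      <p>As a sketch of such deterministic post-processing (the "(subject, relation, object)" line format and the http://example.org/ namespace are our own illustrative assumptions), model output in a simple triplet format can be parsed and rewritten as N-Triples:</p>

```python
# Sketch: parse simple "(subject, relation, object)" lines produced by the
# model and convert them to N-Triples. Format and namespace are assumptions.
import re

EX = "http://example.org/"

def parse_triplet_lines(text):
    """Extract (s, r, o) tuples from lines like '(covid19, is, pandemic)'."""
    pattern = re.compile(r"\(\s*([^,]+?)\s*,\s*([^,]+?)\s*,\s*([^)]+?)\s*\)")
    return pattern.findall(text)

def rdfy(triples):
    # Map each local name to a placeholder IRI; real code would use the
    # ontology's namespace and escape special characters properly.
    def iri(name):
        return f"<{EX}{name.replace(' ', '_')}>"
    return "\n".join(f"{iri(s)} {iri(r)} {iri(o)} ." for s, r, o in triples)

output = "(covid19, is, pandemic)\n(covid19, breakoutInYear, 2019)"
print(rdfy(parse_triplet_lines(output)))
```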
      <p>Output Validation: As we will discuss in Section 5, one of the challenges in this process is
the fact that the language models could generate inaccurate or outdated facts due to reasons
such as hallucinations, incomplete or outdated training data, or even biases in the training
data. Thus, it is important that the facts generated by the model are validated before they are
used in the knowledge graph. This is an open research question that requires further research.
Certain validations can be done by performing reasoning with the generated triples to ensure
there are no inconsistencies at both the TBox and ABox levels. Another solution is to follow a
Human-in-the-loop approach [33, 34] or to use crowd-sourcing where applicable [35].</p>
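      <p>One lightweight validation of the first kind can be sketched as checking every generated triple against the relations declared in the input ontology, a minimal TBox-level check; the relation names below are illustrative assumptions:</p>

```python
# Sketch: reject generated triples whose relation is not declared in the
# ontology (a minimal TBox-level check). Relation names are illustrative.

ALLOWED_RELATIONS = {"manufacturedBy", "isTransmittedBy", "hasSymptom"}

def validate(triples, allowed=ALLOWED_RELATIONS):
    """Split triples into (accepted, rejected) by their relation."""
    accepted = [t for t in triples if t[1] in allowed]
    rejected = [t for t in triples if t[1] not in allowed]
    return accepted, rejected

triples = [
    ("ZF2001", "manufacturedBy", "Anhui Zhifei Longcom"),
    ("COVID-19", "curedBy", "unknown"),  # relation not in the ontology
]
good, bad = validate(triples)
print(good, bad)
```

A full validator would also check domain and range constraints and, as the text notes, route rejected or doubtful facts to a human reviewer.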
    </sec>
    <sec id="sec-5">
      <title>4. Qualitative analysis of information extraction capabilities of foundation models</title>
      <p>This section provides a qualitative analysis of the information extraction capabilities of
foundation models such as GPT using a set of examples that are based on the research questions.</p>
      <p>R1: Can we use the knowledge acquired during pre-training of LLMs for Knowledge
Graph completion with a defined ontology with a few examples?</p>
      <p>Since most Covid-19 KGs have dealt with biomedical aspects of the disease and there has been
less efort on the societal, economic, and climate change impacts of the disease[ 36], We can ask
Chatgpt if it is able to fill in the missing information in the knowledge graphs. Figure 2 illustrates
the use case that is studied in this question. It asks the model to generate a set of triples about
COVID-19 vaccines and their manufacturers based on the knowledge that it has acquired during
the pre-training. The output shows that the model understood the task and generated the
triples following the output format based on the instructions in the prompt. Out of the 20
triples produced by the model, only one is factually inaccurate: The tenth triple is incorrect
because ZF2001, the vaccine based on RDB-Dimer, is manufactured by Anhui Zhifei Longcom,
in collaboration with the Institute of Microbiology at the Chinese Academy of Sciences and
not by Novavax. The model might have made this mistake because the Wikipedia page states,
‘ZF2001 employs technology similar to other protein-based vaccines in Phase III trials from
Novavax, Vector Institute, and Medicago’10. Figure 3 illustrated another example with the entity
“COVID-19” and the relation “is transmitted by”. Also for this example, the model was able to
ifnd information that is not currently available in Wikidata.</p>
      <p>Similarly, Table 2 shows results for similar exercises on 10 other examples. For each example,
we selected an entity and a relation from Wikidata and generated a prompt similar to the one
in Fig. 2. Then each of the generated triples was manually checked to see whether it is factually
correct. Except for some facts in three distinct requests, all the other facts generated by the
model were factually correct. It must be mentioned, however, that for the COVID-19 pandemic impact
on tourism (Q9084098) entity, there were several redundancies formulated by ChatGPT, such as
‘Reduction in tourism spending’ and ‘Decrease in tourism spending’, or ‘Loss in revenue for
airlines’ and ‘Decrease in airline revenue’, which refer to the same thing but were stated
as two different entities by ChatGPT. It is worth noting that in the third example, Prevention of
SARS-CoV-2/COVID-19 (Q102056722), ChatGPT provided some items that act as therapies but
can also be regarded as preventative measures.</p>
      <p>R2: Can we use LLMs to extract facts to generate knowledge graphs from unseen
text that is provided during inference time?</p>
      <p>In Figure 4, an example prompt has been submitted to see if chatgpt is able to extract entities
and relations from raw text provided during inference. The prompt is about covid-19 symptoms
and chatgpt extracted well all relations and entities.</p>
      <p>Similar to the previous research question, Table 3 shows an analysis of a set of examples
following this second use case. Some entities and relations were taken from wikidata. Mostly
there are lots of missing relations in wikidata for covid-19 related items. We provided some
related text for chatgpt and asked that to make triples. As the table illustrates chatgpt acted well.
In the second example, only 5 out of 8 triples were detected and chatgpt could not extract the
treatments from the sentence ’Other corticosteroids, such as prednisone, methylprednisolone
or hydrocortisone, may be used if dexamethasone isn’t available.’ And in the third example,
the long-term efects of COVID-19, the following sentence was not detected by chatgpt ’Other
symptoms were reported, which were not included in the publications, including brain fog and
neuropathy’. It might be the case that chatgpt was not able to extract the triples as it is stated in
the phrase ”which were not included in the publications”. On example eight, chatgpt was only
able to extract 7 out of 8 facts. There has been a sentence in the text ’In some cases, governmental
decision making created shortages, such as when CDC prohibited the use of any diagnostic
test other than the one it created.’. Chatgpt extracted the following wrong triple: &lt;diagnostic
test other than the one it created, instance of, COVID-19 related shortage&gt;. the term ’when’ is
vital to take into consideration, although the last sentence has the phrase ’created shortages,
such as’. In the ninth example, COVID-19 disease in pregnancy, chatgpt has detected the wrong
triple on the sentence, ’A review in 2022 suggests that pregnant women are at increased risk
of severe COVID-19 disease, with an increased rate of being hospitalized to the intensive care
unit and requiring ventilation, but was not associated with a statistically significant increase
in mortality.’. The wrong triple is ’COVID-19 disease in pregnancy-efect-mortality’. Chatgpt
neglected the phrase ’was not associated with’. One other stuf to take into account is the fact
that prompt design is really crucial in extracting triples by chatgpt. It is essential to provide an
example triple from the same text that is given to chatgpt.</p>
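<p>The lesson about prompt design can be made concrete with a small sketch. Assuming the same &lt;subject, relation, object&gt; output format, the function names and prompt wording below are illustrative, not the exact prompts behind Table 3.</p>

```python
def build_extraction_prompt(text: str, example_triple: tuple) -> str:
    # Embed the source text plus one demonstration triple taken from
    # that same text, which we found essential for reliable extraction.
    s, p, o = example_triple
    return (
        "Extract all facts from the text below as triples, one per line,\n"
        "in the format <subject, relation, object>.\n\n"
        f"Text: {text}\n\n"
        f"Example: <{s}, {p}, {o}>"
    )

def recall(extracted: list, gold: list) -> float:
    # Fraction of gold triples recovered, as counted per example in
    # Table 3 (e.g. 5 out of 8 in the second example).
    return len(set(extracted) & set(gold)) / len(gold)
```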
      <p>R3: Given an ontology, can we automatically generate prompts for extracting the
relevant triples for the purpose of Knowledge Graph construction?</p>
      <p>Figure 5 shows an example of this use case. We have provided a prompt with a toy ontology
about diseases which contains 7 concepts such as disease, symptom, organ, drug, and 6 relations
with their domain and range concepts. As the example’s output shows, the model seemed
to understand the task and provided an output with a set of triples that followed the given
ontology. In some triples, it deviated from the given domain constraints of the relation, for
example, in triples 10 and 17, it provided “anatomical location” for the disease instead of the
symptom. Nevertheless, the extracted facts are correct in those cases. Overall, the generated
triples seem to follow the schema and are factually correct.</p>
      <p>Because there are token limits for both the prompt input as well as the output, in order to
follow a similar setup for a larger real-world ontology, we would have to perform an iterative
process. As the initial results for the toy example have shown promising results, in future work,
we will follow up with a larger ontology to extract a list of triples to construct a knowledge
graph given an ontology.</p>
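<p>The iterative setup for a larger ontology can be sketched as follows; the toy relation names and the batching granularity are our assumptions, and in practice the batch size would be tuned to the model’s token limit.</p>

```python
def ontology_prompt(relations: dict, text: str) -> str:
    # Render the TBox (relation -> (domain, range)) as schema lines and
    # ask for ABox triples that respect the domain/range constraints.
    schema = "\n".join(
        f"- {rel}: {dom} -> {rng}" for rel, (dom, rng) in relations.items()
    )
    return (
        "Using only the relations below (respect domain and range),\n"
        "extract triples <subject, relation, object> from the text.\n\n"
        f"Relations:\n{schema}\n\nText: {text}"
    )

def batch_relations(relations: dict, size: int):
    # Split a large ontology into prompt-sized batches to stay within
    # token limits; one prompt is issued per batch.
    items = list(relations.items())
    for i in range(0, len(items), size):
        yield dict(items[i:i + size])
```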
      <p>R4: Given a knowledge graph, can we identify the missing information and create
prompts to perform Knowledge Graph completion using foundation models?</p>
      <p>A knowledge graph might have missing entities or links. This missing information can be
predicted by entity and link prediction methods; we can then design prompts and feed them
to LLMs. For instance, we can pick an entity that lacks links and ask the foundation model to provide
related relations for the entity, as depicted in Figure 6.</p>
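<p>A minimal sketch of this missing-link detection step: given the KG’s triples and the relations an entity is expected to have (e.g., from its class in the ontology), the pairs to prompt for are simply those not yet present. The function name and data layout are illustrative.</p>

```python
def missing_relations(kg: set, entity: str, expected: list) -> list:
    # kg is a set of (subject, relation, object) triples; return the
    # expected relations for which the entity has no triple yet.
    present = {rel for subj, rel, _ in kg if subj == entity}
    return [rel for rel in expected if rel not in present]
```

<p>Each returned (entity, relation) pair can then be turned into a prompt such as the one depicted in Figure 6 and sent to the foundation model.</p>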
      <p>R5: Given a Knowledge Graph with some false facts, can we use LLMs to check the
given Knowledge Graph and determine which facts are not true for the purpose of
fact-checking?
This task is somewhat tricky with ChatGPT, as it sometimes hallucinates and provides wrong data. But
overall it is possible to take the relations and objects that ChatGPT provides and compare them
with what is already in the knowledge graph. As Figure 7 shows, ChatGPT only found ’Are very
hungry’ and neglected the wrong fact ’urinating a lot’. However, if the training data is biased,
ChatGPT might provide some wrong output. Or, if it does not have access to the data, for example
when the knowledge graph contains data or statistics that belong to a certain company, it
would provide imprecise results. So although LLMs can be considered a valuable asset in
fact-checking, they must be supplemented with other methods for better accuracy and
precision.</p>
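<p>The comparison step can be sketched as a naive set difference; note that this ignores paraphrases (e.g., ’Reduction in tourism spending’ vs. ’Decrease in tourism spending’), which in practice would require additional entity matching. The function name and placeholder facts are illustrative.</p>

```python
def facts_to_review(kg_objects: set, model_objects: set) -> set:
    # KG facts the model did not reproduce for a given (entity, relation).
    # These are flagged for human review rather than deleted outright,
    # since the model itself may hallucinate or lack access to the data.
    return kg_objects - model_objects
```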
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <p>In this section, we will discuss the advantages of having an automatic Knowledge Graph
construction/completion pipeline and also reflect on the current challenges. These challenges
open the door to future research to mitigate them and make such automatic knowledge graph
construction pipelines more robust.</p>
      <sec id="sec-6-1">
        <title>5.1. Advantages</title>
        <sec id="sec-6-1-1">
          <title>5.1.1. Cost-effective Knowledge Graph Creation and Completion</title>
          <p>As shown in Section 4, foundation models inherently contain a large amount of information that
is not yet available as structured data. Filling in that information with the help of domain
experts or crowd-sourcing requires a vast amount of human effort. Generating or completing
knowledge graphs based on large-scale text collections thus incurs high effort and high costs.
Using pre-trained language models for knowledge graph creation and completion appears to be
less cost-intensive and faster.</p>
        </sec>
        <sec id="sec-6-1-2">
          <title>5.1.2. Scalability</title>
          <p>If we can employ an automatic pipeline for knowledge graph creation and completion with
minimal human effort, we can scale the creation and completion of knowledge graphs with
computational resources alone. If humans have to manually construct and curate the knowledge
graphs, the process does not scale, especially for custom knowledge graphs, for example, for a given
organization.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>5.1.3. Continuous updates for evolving knowledge graphs</title>
          <p>If we can automatically convert unstructured natural language text to knowledge graphs, we
will be able to convert the most up-to-date information coming from sources such as news
articles or social media posts into facts in knowledge graphs.</p>
        </sec>
        <sec id="sec-6-1-4">
          <title>5.1.4. Size of Language Models versus Raw Text for Knowledge Graph Creation and Completion</title>
          <p>Language models are relatively compact compared to the original raw
texts used for their pre-training. Hence it is easier to set up a system for
knowledge graph construction and completion using pre-trained language models than to apply
natural language processing to large-scale text collections.</p>
          <p>If the large-scale text collections are not available for download in one (or a set of) compressed
files, then a crawler is needed to retrieve the text collection, increasing the technical complexity
and the processing time. In contrast, even if the language model cannot be installed
locally, it is relatively easy to communicate with the corresponding chatbot, typically via a
well-defined web API, by generating the prompts and processing the answers for knowledge
graph creation and completion.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Challenges</title>
        <sec id="sec-6-2-1">
          <title>5.2.1. Hallucinations</title>
          <p>
            Foundation models and large language models are known to hallucinate when generating
natural language: they invent non-existing facts (also referred to as being
unfaithful to the source content) or produce nonsensical text in a fluent and confident manner [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ]. This
phenomenon has been observed in many language generation tasks such as conversational
dialogue, text summarization, generative question answering, data-to-text generation, and
machine translation. For a human or a downstream system, it can be hard to identify which
outputs of the model are non-factual. If hallucinations cannot be detected and filtered out,
they can lead to undesired outcomes and failures in downstream tasks, resulting in
bad user experiences in real-world applications.
          </p>
        </sec>
        <sec id="sec-6-2-2">
          <title>5.2.2. Bias and Fairness</title>
          <p>Though large language models have recently shown impressive results on many academic
benchmarks for various NLP tasks, it is still poorly understood what different types of biases
exist in these models [37]. As the training data could reflect societal biases that discriminate
against certain groups of individuals in an unfair manner, the outputs of these models could be
susceptible to such biases. NLP research communities are working towards defining methods to
uncover such biases in large language models and mitigating them, ensuring both individual and
group fairness of the results. If not addressed properly, knowledge graphs constructed using
these language models could inherit some of the biases of these models.</p>
        </sec>
        <sec id="sec-6-2-3">
          <title>5.2.3. High Computational Resources</title>
          <p>Foundation models are inherently large. For example, OpenAI’s ChatGPT has
175B parameters, Meta’s Galactica has 120B parameters, Google’s Bard has 137B parameters,
BigScience’s BLOOM has 176B parameters, and so on. Thus, these models require high
computational resources to run, including GPUs and large amounts of memory.</p>
        </sec>
        <sec id="sec-6-2-4">
          <title>5.2.4. Automatic Prompt Design</title>
          <p>One of the requirements of these instruction-based foundation models is to come up with
prompts for all the information that needs to be extracted. For example, if we have an ontology
(TBox) and want to populate a knowledge graph with facts (ABox), this requires generating
correct prompts and, optionally, examples or demonstrations. Further research and exploration
are needed to understand whether that can be easily done through templates.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions and Future Work</title>
      <p>In the era of rapid advancements in large language models or foundation models and their
applications, it is important to explore how these large language models can be used to
improve knowledge graphs and inversely how Knowledge Graphs can be used to improve large
language models. In this work, we have evaluated a large foundation model, i.e., GPT-3.5 using
ChatGPT, for the purpose of understanding its capabilities for tasks related to knowledge graph
construction and completion. We made a qualitative analysis of ChatGPT on knowledge graph
construction and completion based on the research questions discussed earlier in the paper. As the
results show, ChatGPT can be considered a valuable resource for knowledge graph construction
and completion, but we should keep in mind that there are challenges, including bias,
hallucinations, and high computational costs. Another point that needs to be considered is
that prompt design is extremely important in this area, as improper prompt
design might lead to inaccurate results.</p>
      <p>In future work, we plan to implement an automatic pipeline that uses foundation models to
perform information extraction and generate knowledge graphs from text. There are several
open research challenges that need to be addressed to accomplish this including automatic
prompt generation using ontologies and validation of the generated output. We believe
complementary use of foundation models and knowledge graphs opens up several new research
directions and we plan to further explore each of these areas in the context of knowledge graph
generation from text.</p>
      <p>[29] A. Rogers, How the transformers broke NLP leaderboards, posted on the Hacking Semantics blog: https://hackingsemantics.xyz/2019/leaderboards (2019).
[30] N. Mihindukulasooriya, M. R. A. Rashid, G. Rizzo, R. García-Castro, O. Corcho, M. Torchiano, RDF shape induction using knowledge base profiling, in: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 2018, pp. 1952–1959.
[31] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, T. Scialom, Toolformer: Language models can teach themselves to use tools, arXiv preprint arXiv:2302.04761 (2023).
[32] A. Parisi, Y. Zhao, N. Fiedel, TALM: Tool augmented language models, arXiv preprint arXiv:2205.12255 (2022).
[33] R. M. Monarch, Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI, Simon and Schuster, 2021.
[34] X. Wu, L. Xiao, Y. Sun, J. Zhang, T. Ma, L. He, A survey of human-in-the-loop for machine learning, Future Generation Computer Systems (2022).
[35] M. Acosta, A. Zaveri, E. Simperl, D. Kontokostas, S. Auer, J. Lehmann, Crowdsourcing linked data quality assessment, in: The Semantic Web–ISWC 2013: 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part II 12, Springer, 2013, pp. 260–276.
[36] S. Groppe, S. Tiwari, H. Khorashadizadeh, J. Groppe, T. Groth, F. Benamara, S. Sahri, Short analysis of the impact of COVID-19 ontologies, in: International Semantic Intelligence Conference (ISIC 2022), online, 2022.
[37] B. C. Kwon, N. Mihindukulasooriya, An Empirical Study on Pseudo-log-likelihood Bias Measures for Masked Language Models Using Paraphrased Sentences, in: Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022), Association for Computational Linguistics, Seattle, U.S.A., 2022, pp. 74–79. URL: https://aclanthology.org/2022.trustnlp-1.7. doi:10.18653/v1/2022.trustnlp-1.7.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. d'Amato, G. d. Melo,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 54</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Halb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Idehen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Thibodeau</given-names>
            <surname>Jr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ezzat</surname>
          </string-name>
          ,
          <article-title>A survey of current approaches for mapping of relational databases to rdf</article-title>
          ,
          <source>W3C RDB2RDF Incubator Group Report</source>
          <volume>1</volume>
          (
          <year>2009</year>
          )
          <fpage>113</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. D'Souza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Kejriwal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Bozzato</surname>
            ,
            <given-names>V. A.</given-names>
          </string-name>
          <string-name>
            <surname>Carriero</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Hahmann</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Zimmermann (Eds.),
          <source>Proceedings of the 1st International Workshop on Knowledge Graph Generation From Text and the 1st International Workshop on Modular Knowledge co-located with 19th Extended Semantic Conference (ESWC</source>
          <year>2022</year>
          ), Hersonissos, Greece, May
          <year>30th</year>
          ,
          <year>2022</year>
          , volume
          <volume>3184</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2022</year>
          . URL: http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3184</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          <article-title>9</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>J. D. M.-W. C. Kenton</surname>
            ,
            <given-names>L. K.</given-names>
          </string-name>
          <string-name>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of naacL-HLT</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , J. Carbonell,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <article-title>Transformer-xl: Attentive language models beyond a fixed-length context</article-title>
          , arXiv preprint arXiv:
          <year>1901</year>
          .
          <volume>02860</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          , R. L.
          <string-name>
            <surname>Logan</surname>
            <given-names>IV</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith,</surname>
          </string-name>
          <article-title>Knowledge enhanced contextual word representations</article-title>
          , arXiv preprint arXiv:
          <year>1909</year>
          .
          <volume>04164</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          , arXiv preprint arXiv:
          <year>1907</year>
          .
          <volume>11692</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , R. Soricut,
          <article-title>Albert: A lite bert for self-supervised learning of language representations</article-title>
          , arXiv preprint arXiv:
          <year>1909</year>
          .
          <volume>11942</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>XLNet: Generalized autoregressive pretraining for language understanding</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Hudson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Adeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>von Arx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bohg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosselut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brunskill</surname>
          </string-name>
          , et al.,
          <article-title>On the opportunities and risks of foundation models</article-title>
          ,
          <source>arXiv preprint arXiv:2108.07258</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pratt</surname>
          </string-name>
          ,
          <source>Learning to Learn</source>
          , Springer Science &amp; Business Media,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Legg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Deep reinforcement learning from human preferences</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Stiennon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lowe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Christiano</surname>
          </string-name>
          ,
          <article-title>Learning to summarize from human feedback</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>3008</fpage>
          -
          <lpage>3021</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Wainwright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Slama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ray</surname>
          </string-name>
          , et al.,
          <article-title>Training language models to follow instructions with human feedback</article-title>
          ,
          <source>arXiv preprint arXiv:2203.02155</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <article-title>Universal language model fine-tuning for text classification</article-title>
          ,
          <source>in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>328</fpage>
          -
          <lpage>339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <article-title>The power of scale for parameter-efficient prompt tuning</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Online and Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>3045</fpage>
          -
          <lpage>3059</lpage>
          . URL: https://aclanthology.org/2021.emnlp-main.243. doi:10.18653/v1/2021.emnlp-main.243.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>X. L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Prefix-tuning: Optimizing continuous prompts for generation</article-title>
          ,
          <source>in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</source>
          , abs/2101.00190 (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Artetxe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>Rethinking the role of demonstrations: What makes in-context learning work?</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>11048</fpage>
          -
          <lpage>11064</lpage>
          . URL: https://aclanthology.org/2022.emnlp-main.759.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raghunathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>An explanation of in-context learning as implicit bayesian inference</article-title>
          ,
          <source>arXiv preprint arXiv:2111.02080</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frieske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey of Hallucination in Natural Language Generation</article-title>
          ,
          <source>ACM Computing Surveys</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gruenwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Groppe</surname>
          </string-name>
          (Eds.),
          <source>Leveraging Artificial Intelligence in Global Epidemics</source>
          , Elsevier,
          <year>2021</year>
          . URL: https://www.elsevier.com/books/leveraging-artificial-intelligence-in-global-epidemics/gruenwald/978-0-323-89777-8.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>D.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>What is the 'metaverse'? facebook says it's the future of the internet</article-title>
          ,
          <source>Washington Post</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. N.</given-names>
            <surname>Al-Aswadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gaurav</surname>
          </string-name>
          ,
          <article-title>Recent trends in knowledge graphs: theory and practice</article-title>
          ,
          <source>Soft Computing</source>
          <volume>25</volume>
          (
          <year>2021</year>
          )
          <fpage>8337</fpage>
          -
          <lpage>8355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ehrlinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wöß</surname>
          </string-name>
          ,
          <article-title>Towards a definition of knowledge graphs</article-title>
          ,
          <source>SEMANTiCS (Posters, Demos, SuCCESS)</source>
          <volume>48</volume>
          (
          <year>2016</year>
          )
          <fpage>2</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>