<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Creative Storytelling with Language Models and Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xinran Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilaria Tiddi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>De Boelelaan 1105, 1081 HV Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <fpage>5</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>Automated story generation is a popular and well-recognized task in the field of natural language processing. The emergence of pre-trained language models based on large Transformer architectures shows the great capability of text generation. However, language models are limited when the generation requires explicit clues within the context. In this research, we study how to combine knowledge graphs with language models, and build a creative story generation system named DICE. DICE uses external knowledge graphs to provide context clues and implicit knowledge to generate coherent and creative stories. The evaluation shows that our approach can efectively inject the knowledge from knowledge graphs into the stories automatically generated by the language model.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;knowledge graph</kwd>
        <kwd>language model</kwd>
        <kwd>story generation</kwd>
        <kwd>natural language generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Story generation is a challenging task that requires
reasonable and relevant content in the generated
sentences as well as dealing with logic and implicit
in
        <xref ref-type="bibr" rid="ref13">formation (Guan et al. 2019</xref>
        ). After large-scale
pretrained language modes like OpenAI GPT-2 (Ra
        <xref ref-type="bibr" rid="ref4">dford
et al. 2019</xref>
        ) and BERT (Devlin et al. 2018) have been
released in recent years, machines have shown the
ability to generate a paragraph of understandable text
according to a given topic. These language models are
able to generate mostly-grammatical sentences with
nearly perfect syntax and punctuation (Koncel-Ke
        <xref ref-type="bibr" rid="ref4">dziorski
et al. 2019</xref>
        ). However, the text generated by these
language models often lacks commonsense knowledge
Figure 1: An example of the story generation. The orange
(Logan et al. 2019) and it is hard to control the content words are the keywords provided by the user, and the blue
of the automatically generated text. To solve the prob- words are the extended entities and relations from the DICE
lem, one solution is to take advantage of structured knowledge graph. These words are connected as knowledge
inputs, such as tabular inputs and knowledge graphs graphs (SVO triples). “#i” indicates the sentence is the i-th
(Koncel-Ke
        <xref ref-type="bibr" rid="ref4">dziorski et al. 2019</xref>
        ). Meanwhile, one of the sentence of the story.
most popular methods to combine language models
and knowledge graphs, is using knowledge graph
embeddings. However, creating embeddings for
knowledge graphs is a complex and time-consuming process; embedding approaches.
moreover, knowledge graphs tend to be often updated, We aim to answer the following research questions:
and new embeddings have to be created (Wu et al. Q1. How to combine the language mo
        <xref ref-type="bibr" rid="ref4">del with
knowl2019</xref>
        ). This research introduces a new method to com- edge graphs for the story generation without
knowlbine knowledge graphs with language models without edge graph embeddings? Q2. What are the advantages
and disadvantages of using knowledge graphs to
automatically generate a story?
      </p>
      <p>We propose a two-layer system called DICE, which
contains a knowledge enrichment layer and a text
generation layer, applying the knowledge graph and the
language model respectively, to generate coherent and
Proceedings of the CIKM 2020 Workshops, October 19-20, 2020,
Galway, Ireland
email: x6.yang@student.vu.nl (X. Yang); i.tiddi@vu.nl (I. Tiddi)
url: https://kmitd.github.io/ilaria/ (I. Tiddi)
orcid: 0000-0001-7116-9338 (I. Tiddi)</p>
      <p>© 2020 Copyright for this paper by its authors. Use permitted under Creative
CPWrEooUrckReshdoinpgs IhStpN:/c1e6u1r3-w-0s.o7r3g CCoEmUmoRns WLiceonrsekAsthtriobuptioPnr4o.0cIneteerdnaitniognasl ((CCC EBYU4R.0)-.WS.org)
creative stories1. Figure 1 presents an example of the Existing natural language generation systems are
ofstory generation process. In the example, the system ten limited when the tasks require higher levels of
cretakes 4 keywords as an input, then enriches the key- ativity and originality (Jain et al. 2017). Pre-trained
words with the knowledge graph and constructs subject- language models based on large Transformer
architecverb-object (SVO) triples, the latter will be used as a tures (Vaswani et al. 2017), such as GPT-2 and BERT,
prompt for the language model to generate stories. can be a potential solution for this problem. Recently,</p>
      <p>
        The current work explores the possibility of con- the OpenAI team has announced the upgraded GPT-3
necting the knowledge graph and the language model (Brown et al. 2020) with 175 billion parameters which
with an interface. The advantage of using an inter- is 100 times larger than the previous version, GPT-2.
face is that the language model can rapidly adapt to These language models show impressive text
generthe changes from an updated knowledge graph. For ation capabilities that can achieve state-of-the-art
rethe knowledge enrichment layer, we implemented two sults without extra training (Keskar et al. 2019).
Howversions of DICE knowledge graph. For version1, we ever, these language models perform poorly when
capretrieve the knowledge from ConceptNet (Speer et al. turing the long tail of rare entities such as numbers and
2017) and WordNet (Miller 1995) and construct an inte-
        <xref ref-type="bibr" rid="ref4">dates (Logan et al. 2019</xref>
        ). Moreover, these models are
grated knowledge graph of commonsense knowledge unable to build context clues and use implicit
knowlfor the story generation. For version2, we enriched the edge to generate a reasonable story ending (Guan et
knowledge graph of version1 by using
        <xref ref-type="bibr" rid="ref4">DBpedia facts. al. 2019</xref>
        ).
      </p>
      <p>For the text generation layer, we choose ROCStories2
(Mostafazadeh et al. 2016) as our story corpus to fine- 2.2. Text Generation with Knowledge
tune the language model, GPT-2. More details are dis- Graph Embeddings
cussed in Section 3.</p>
      <p>The contributions are as follows:</p>
      <p>
        The problem mentioned above can be improved by
combining language models with knowledge graphs, where
• We propose a new way of combining knowledge the former can facilitate the knowledge extracted from
graphs and language models for text generation knowledge graphs. For example, Logan et al. (2019)
without using knowledge graph embeddings. The built the knowledge graph language model (KGLM)
results show that we can efectively inject the that could select and copy related facts from a
knowlknowledge from knowledge graphs into the au- e
        <xref ref-type="bibr" rid="ref4">dge graph. Ostendorf et al. (2019</xref>
        ) enriched BERT
tomatically generated stories as a background or with knowledge graph embeddings for document
clasa plot and therefore control the content of these sification and got better results than the standard BERT
stories to some extent. approach. Meanwhile, Koncel-Ke
        <xref ref-type="bibr" rid="ref4">dziorski et al. (2019</xref>
        )
• We introduce a fine-tuned model which accepts introduced a new attention model for graph
encodSVO triples as a prompt instead of sentences used ing and used it for the graph-to-text generation. The
by original GPT-2 models, to generate reason- main shortcoming of these models is their high cost of
able and creative stories with the context pro- computational resources which leads to a long
trainvided by the SVO triples. ing and task execution time (Yao et al. 2019).
KoncelKe
        <xref ref-type="bibr" rid="ref4">dziorski et al. (2019</xref>
        ) also showed that their
proposed model failed to mention 40% of entities in the
knowledge graphs in the generated text.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Text Generation using Language</title>
      </sec>
      <sec id="sec-2-2">
        <title>Models</title>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Knowledge Enrichment with</title>
      </sec>
      <sec id="sec-2-4">
        <title>Knowledge Graphs</title>
        <p>
          Story generation is a knowledge-intensive process (Li Hsu et al. (2019) proposed the distill-enrich-generate
et al. 2013). In particular, open story generation re- framework that using knowledge graphs to enrich the
quires artificial intelligence systems to create narra- words distilled from the input images and then
gentives about any topic without a pre-defined domain erating stories. Liu et al. (2019) used external
knowlmodel (Li et al. 2013). Meanwhile, a creative story edge graphs to enrich the input sentence as a sentence
should be both novel and appropriate
          <xref ref-type="bibr" rid="ref10">(Sternberg 1999)</xref>
          . tree for solving NLP tasks such as classification and
sequence labeling. Guo et al. (2019) built a poetry
knowl1Data and code available at github.com/ranyxr/dice_story edge graph for keyword mapping, extension, and
se2https://cs.rochester.edu/nlp/rocstories/ lection to generate Chinese classical poems with high
guage model fine-tuning is the pre-processing step for
story generation, which includes two stages: SVO triple
extraction and fine-tuning. Story generation also has
two stages, i.e., knowledge enrichment and text
generation. In the next section, we will discuss each stage
in detail.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>3.2. Language Model Fine-tuning</title>
        <sec id="sec-2-5-1">
          <title>The OpenAI team has released GPT-3, the GPT-2’s suc</title>
          <p>cessor, but it was not available when we conducted the
research. As a result, we choose GPT-2 as the natural
language generator. OpenAI has released 4 versions of
GPT-23: the small version with 124M parameters, the
medium version with 355M parameters, the large
version with 774M parameters, and the XL version with
1.5B parameters. Considering the large amount of
training data (the encoded story corpus is 19M), we choose
the medium version of GPT-2 to strike the balance of
speed, size, and creativity. An open-source Python
package, gpt-2-simple4, is used to support the
finetuning and text generation process. Meanwhile, we
choose the ROCStories as our story corpus, which
contains nearly 10 thousand short stories, each story
includes a title and five-sentence content.
quality and relevance. Similarly, Zhou et al. (2020)
resort to a knowledge graph that consists of a collection
of head-relation-tail triples to retrieve related topics in
their intelligent dialogue system.</p>
          <p>Diferent from some researches above, instead of
delivering a graph-to-text task which emphasizes the
explicit translation from graph to text without creative 3.2.1. SVO Triple Extraction
writing, this study puts more focus on using
information from knowledge graphs to provide a background After acquiring the story corpus, we need to encode
or a plot for the language model as guidance or inspi- the dataset into a format that allows GPT-2 to generate
ration. text according to the specified SVO triples. We extract
SVO triples from each story, then add the triples as a
prefix for each story respectively. This way, the
lan3. Method guage model can learn from a hint that each story is
generated conditionally on the SVO triples.
3.1. Overview We use spaCy5 to extract SVO triples from each story
The task here is to generate 5-sentence stories from a as “entities and relations”. However, sometimes the
set of SVO triples that are extracted and regrouped in process may encounter the coreference problem, i.e.,
a knowledge graph. The expected input of the system a pronoun is used as a subject. For example, the
senis a set of keywords provided by users. Figure 2 shows tence is “My sister has a dog. She loves him.”, the triple
the two-layer architecture of the DICE system. We directly extracted by spaCy is (My sister, has, dog) and
use SVO triples as an interface to connect the knowl- (She, loves, him), which are not the expected result
beedge enrichment layer and the text generation layer. cause we want a more specific reference as a subject,
The SVO triples can be constructed from knowledge i.e., (My sister, has, dog) and (My sister, loves, dog). The
graphs or extracted from story corpus; meanwhile, they resolution is using neuralcoref6 that applies the
neuserve as a prompt for the language model to gener- ral net scoring model to find coreferences in the text
ate stories. The system firstly checks the relationships (Clark &amp; Manning 2016). Meanwhile, to simplify the
between these keywords and adds additional informa- triple, we convert the verb into its lemma and only
extion using the knowledge graph, then generates a set tract the main text of the subject and the object. For
of SVO triples to feed the language model to generate
stories. 3https://openai.com/blog/gpt-2-1-5b-release/</p>
          <p>Two processes are involved to complete this task: 45hhttttppss::////sgpitahcuyb.i.oc/om/minimaxir/gpt-2-simple
language model fine-tuning and story generation. Lan- 6https://spacy.io/universe/project/neuralcoref
the example above, we extract “sister” instead of “my
sister”, “love” instead of “loves”.</p>
          <p>As a result, one example from the encoded dataset is
the following:</p>
        </sec>
        <sec id="sec-2-5-2">
          <title>The system includes a new knowledge graph dataset</title>
          <p>named DICE KG. We implemented two versions of DICE</p>
          <p>KG. Version1 (CW, i.e., ConceptNet and WordNet)
com(Joseph, sign, deal), (Joseph, be, musician), (Joseph, bines two large open-source knowledge graphs:
Conbe, songwriter), (Joseph, hope, write), (Joseph, ceptNet 5.6.0 and WordNet. ConceptNet is a
knowllose, wallet), (woman, contact, Joseph), (Joseph, edge graph that connects words and terms (phrases of
have, idea) The Best Single Joseph has just re- natural language) with assertions (labeled, weighted
cently signed a deal with a new record label. edges) (Speer et al. 2017). Unlike ConceptNet,
WordHe is a musician and a songwriter who hopes Net is a large lexical database of English with cognitive
to write a best new hit. On his way to a local synonyms (synsets), which are connected by means of
cofee shop to brainstorm, he lost his wallet. conceptual-semantic and lexical relations (Miller 1995).
Jhoimsepahndwarestfurrunsetrdatiet.dSuundtidleanwlyo, mheanrecaolnizteadcthede The DICE KG converts these two datasets into an
inhad an idea for his new song about kindness. tegrated model, and as a result, the dataset contains
more than 1.6 million nodes and over 3 million
relaWords in red are the SVO triples; words in orange are tionships with 54 types. The DICE KG is large enough
the story title; words in blue are the story content. for finding relations between the keywords given by
users and constructing a set of SVO triples using the
3.2.2. Fine-tuning entities and relations in the knowledge graph.
Moreover, each relationship between the words has an
anThe last step of this process is to fine-tune the model notation named “weight", which can help the system
based on the encoded dataset, which includes both SVO to find a more reasonable path in the next step, i.e., the
triples and the original ROCStories. However, language SVO triple construction.
models like GPT-2 are built for longform content, gen- We also introduce another version (DBCW, i.e.,
DBerating short text like 5-sentence stories is not the typ- pedia, ConceptNet, and WordNet) of DICE KG that
enical generation scenario. To workaround this issue, we riches version1 with DBpedia’s mappings8. The DBCW
use GPT-2-simple, which allows us to add flags to indi- version includes over 8.5 million nodes with 6 labels
cate where is the start and the end of each short text (5- and over 23 million relationships with 694 types. In
sentence story in this case), then the language model this version, we enrich the common concepts from
Conwill automatically extract the shortform texts during ceptNet and WordNet with factual instances and
propthe fine-tuning process. erties from DBpedia. In Section 4, we compare the
per</p>
          <p>The final fine-tuned model is called the DICE model, formance of the two versions of DICE KG.
which can be found and downloaded on Google Drive7. To construct SVO triples from the given keywords,
there are 3 steps: internal matching, external
enrich3.3. Story Generation ment, and converting paths to triples. The internal
matching concerns finding meaningful relations
beWe use the SVO triples as a prompt for GPT-2. The tween the keywords, so that we can later put the
keytriples are constructed based on the keywords by using words at the corresponding position in an SVO triple.
knowledge graphs . Each triple includes a subject and If there is a keyword that has no relation with other
an object as its entities and a verb as its relation that keywords, we use external enrichment to assign other
connects the entities. The SVO triples can not only related words in the knowledge graph to construct an
give the language model topics (entities) to talk about SVO triple for the keyword. The first two steps are
but also define part of the plots (relations) of the story. both semi-automatic, i.e., we use Cypher to query the
For example, (Jane, be, singer) defines the background graph database and get the matching candidates while
of the story, where there is a person whose name is manually filtering the matching results, which are still
Jane who is a singer. The story generation includes needed to ensure the quality of the SVO triples.
two stages: knowledge enrichment and text genera- Figure 3 shows an example of the SVO triple
contion. struction. We assume that the keywords are: {love, cat,
beer, nap}. Firstly, we try to lookup the one-hop
relationship (only specific relations are considered, such as</p>
        </sec>
        <sec id="sec-2-5-3">
          <title>7https://drive.google.com/drive/folders/</title>
          <p>1T68rWkOde5ZwcuodQ9iWuYJAcqAmb0Jo</p>
        </sec>
        <sec id="sec-2-5-4">
          <title>8https://databus.dbpedia.org/dbpedia/mappings/</title>
          <p>mappingbased-objects/2020.07.01
CapableOf and Desires) between the keywords in the
knowledge graph. In this case, we find one direct
relation: (cat, desires, nap). Next, we assign additional
information to the keywords without a direct relation. In
this case, for the verb “love”, we randomly choose the
word “sing” as the verb’s object, which is connected to
“love” through a relation called “CausesDesire”.
Meanwhile, we choose “Tina”, which belongs to the person
class, as the verb’s subject. For “beer” which is a noun,
we assign it a verb “drink”, which is related to “beer”,
and we also choose “Tina” as its subject to keep the
story simple. Finally, we also need to map the directly
one-hop relation into a more common word, for
example, (cat, desires, nap) becomes (cat, want, nap). As
a result, the final SVO triples are (cat, want, nap), (Tina,
love, sing), and (Tina, drink, beer).
3.3.2. Text Generation</p>
        </sec>
        <sec id="sec-2-5-5">
          <title>The final stage is the text generation. After we get the</title>
          <p>SVO triples, we can use these as a prefix to generate
stories from the trained model. In this process, we use
GPT-2 as the story generator. Meanwhile, we use
gpt2-simple which allows for prefixes to force the
generated text to start with the prefix and generate stories
from these triples. Finally, we truncate the prefix and
lfags in the generated stories, to return text only with
titles and contents. Table 1 shows one example
generated by DICE using the triples mentioned above. These
stories are handpicked from 75 automatedly generated
stories. We can see the stories can exactly reflect the
entities and relations from the SVO triples in
generated stories, although the triples may not be presented
in the stories 100% of the time.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <sec id="sec-3-1">
        <title>4.1. Baselines</title>
        <sec id="sec-3-1-1">
          <title>DICE (CW) vs. Human. A given keyword set will</title>
          <p>be provided to both a human and the DICE system
(with CW version of knowledge graph) to create
stories, then we compare the results of human-written
stories and machine-written stories.</p>
          <p>DICE (CW) vs. GPT-2. For the original GPT-2
model, we construct one or two sentences containing
all the entities in the keyword set, and we use these
sentences as input for the GPT-2 model which is
directly fine-tuned on ROCStories to generate a story.
We then use the same keyword set to generate stories
using the DICE model and compare the results.</p>
          <p>DICE (CW) vs. GPT-2-keyword-generation.
GPT2-keyword-generation9 is open-source software that
using GPT-2 to generate text pertaining to the
speciifed keywords. We compare the stories directly
generated from a set of keywords with the stories generated
by the DICE system.</p>
          <p>DICE-CW vs. DICE-DBCW. We also compare the
performance of the DICE system when using
diferent versions of DICE KG to evaluate whether factual
knowledge graphs can contribute to the story
generation.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Evaluation</title>
        <p>4.2.1. Evaluation Metrics
The evaluation focuses on two aspects of the
generated output: story-independent metrics and story-dependent
metrics (Roemmele et al. 2017). Story-independent
metrics, including grammatical correctness, clarity, and
engagement, will be used to analyze the quality of the
generated output without considering its context; whereas
grammatical The correctness of spelling,
correctness grammar and punctuation
clarity uWnhdeetrhstearntdh.e text is easy to
engagement iWsihnettehreesrttinhge awnrditeinfegctisvtey.le
creativity cWrehaettihveerortnhoet. stories are
coherence tSheemoaunttpicuatl.ly coherent of
cKoevyewraogrde
gTwoeonwredhrsaaattereedxptteerexnstt.ednotetdheinktehyeEvaluation
approach
Automatic
Automatic
Automatic
Manual
Manual
Automatic
son (non-native English speaker but with professional
working proficiency) to write two stories with the same
keywords. For human-written stories, each story should
only contain 5 sentences and every keyword in the
keyword set must be mentioned in the story content.
Finally, we invited people to estimate whether the story
is written by a human or a machine and score each
story on its creativity and coherence.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results and Discussion</title>
      <sec id="sec-4-1">
        <title>5.1. Experiment Results</title>
        <sec id="sec-4-1-1">
          <title>We picked 100 random samples for each model to eval</title>
          <p>uate their performance. We gathered the automatic
evaluation results and manual evaluation results and
separated them by story-independent metrics and
storyindependent metrics, which were shown in Table 3 and
Table 4 respectively. The result shows there is no much
diference according to the story-independent metrics
among the stories written by the language models and
story-dependent metrics, including coherence, keyword human-written stories. The overall grammaticality
percoverage, and creativity, will be used to evaluate the formance of each model is satisfactory. The
Gramgenerated stories with reference to the context (Roem- marly overall score of the fine-tuned GPT-2 model is
mele et al. 2017). On the other hand, the evaluation even higher than the score of human-written stories.
combines both automatic evaluation and manual eval- For samples from ROCStories, most of the
grammatuation. Explanation of each metric and the evaluation ical errors are the punctuation misuse. While for the
approaches are shown in Table 2. stories generated by language models, the biggest
writAutomatic Evaluation. For story-independent met- ing issue is the determiner (a/an/the/this, etc.) misuse,
rics, we used the automated analysis tool, Grammarly, followed by punctuation misuse and wordy sentences.
to evaluate the overall grammaticality performance of For the two story-dependent metrics of creativity
the generated text. For keyword coverage, we used a and coherence, all the models perform poorly
comscript to monitor to what extent do the keywords were pared with human writers. In general, the generated
presented in the generated stories. stories are not always logical and making sense, even</p>
          <p>Manual Evaluation. Stories should be reasonable with a properly trained model. The OpenAI team shows
and coherent with the context (Guan et al. 2019), which that it takes a few tries to get a good and reasonable
is hard to access by automatic tools. As a result, a man- result, and meanwhile, the number of tries is highly
ual evaluation was also performed to more accurately dependent on the topics presented in the training data.
evaluate the quality of each story. We invited 3 indi- Particularly, in this case, the given keywords can
influviduals to score the stories from each model, includ- ence the performance of the result significantly. For
ing stories from the original ROCStories. We applied example, if the given keywords are barely related to
5-point Likert scales to rate each story on its creativity each other, then the model can perform poorly. This is
and coherence. Then we calculated the overall average because unrelated keywords make it more dificult to
score for each model. generate related SVO triples, and unrelated SVO triples</p>
          <p>Furthermore, we used a questionnaire10 to investi- lead to unconnected sentences in the generated
stogate whether readers could tell the diference between ries. However, the keyword coverage of the DICE
systhe automatically generated stories and the human- tem (96% for DICE-CW and 97% for DICE-DBCW) is
written ones. We handpicked two stories generated significantly higher than other baselines (73% for
GPTby the DICE system where the stories were generated 2, 88% for GPT-keyword-generation). However, for
based on a given keyword set. Then we invited a per- the DICE-DBCW, the coverage of the enriched words
(80%) from DBpedia is lower compared with the
keyword coverage. This is because some of the enriched</p>
          <p>Correctness Clarity Engage
21 alerts/ Very
4276 words clear</p>
          <p>Engaging 82/100
26 alerts/
3512 words
18 alerts/
3931 words
31 alerts/
5279 words
54 alerts/
4591 words</p>
          <p>Bland
Bland
A
bland
A
bland
bit
bit</p>
          <p>Score
78/100
75/100
80/100
80/100
Keyword
coverage
0.7275
0.88
words are proper nouns, like brand names, which are
hardly shown in the training text.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Injecting Relations into Stories</title>
        <sec id="sec-4-2-1">
          <title>As mentioned in the last section, the keyword cover</title>
          <p>age (96%) and the relation coverage (100%) of the DICE
system are very high during the test. This means the
SVO triples can efectively afect the plots of the
generated stories. During the experiment, we find that
we can use SVO triples to inject entities and the
relations between the entities into the stories as
backgrounds or plots. As a result, the quality of the SVO
triples and the order of these triples can significantly
afect the quality of the automatically generated
stories. Since these triples are generated from the
knowledge graphs, the logic and relationships behind these
knowledge graphs are also important to a better story
generation.</p>
          <p>5.3. Quality of Generated Stories
2.3/5 2.7/5 0.9725 As shown in Table 4, there is little diference in the
creativity score and the coherence score from the
base3.7/5 4.9/5 N/A
lines to the DICE model. Although with the DICE model,
we are able to inject relations into the stories, the
relation can only afect the logic within each sentence
while it cannot influence the logic that runs through
5.1.1. Questionnaire Results the story. This is because the SVO triples extracted
The questionnaire has received 54 responses. Most of during the language fine-tuning process, are extracted
the respondents are native English speakers (4/5 of the from each sentence separately in the stories which are
respondents), and some of them are non-native speak- loosely connected, so they cannot reflect relations like
ers (1/5 of the respondents) but with efective English causation throughout the text. As a result, the
coherproficiency. The result is shown in Table 5. In general, ence of the generated stories from the DICE model is
there is a great chance (37.5% on average) for people not satisfying in general.
to make a mistake when judging whether the story is
written by a human or a machine. In particular, sto- 5.4. Commonsense vs. Factual KG
ries with short sentences and wrong word choices are
more likely to be regarded as a machine-written story.</p>
          <p>On the other hand, for stories that are interesting and
creative but without coherence between the sentences,
people are more likely to make a mistake and think the
stories are written by a human.</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>We introduce two knowledge graphs in this research.</title>
          <p>The knowledge graph used in version1 (CW) is a
semantic knowledge graph where common concepts and
words have many connections with each other, which
is the foundation to relate keywords and construct SVO
triples. While for the fact-based knowledge graphs
like DBpedia, they can hardly provide connections be- commonsense knowledge. In Proceedings of the
tween the common concepts, and as a result, they can AAAI Conference on Artificial Intelligence (Vol. 33,
hardly contribute to the triple construction process. pp. 6473-6480).</p>
          <p>
            However, with a combination of semantic knowledge [7] Guo, Z., Yi, X., Sun, M., Li, W., Yang, C., Liang, J.,
graphs and factual knowledge graphs, i.e.,
            <xref ref-type="bibr" rid="ref4">DICE KG ... &amp; Li, R. (2019</xref>
            , July). Jiuge: A Human-Machine
version2 (DBCW), we can make use of the knowledge Collaborative Chinese Classical Poetry Generation
about the instances of the concepts and the properties System. In Proceedings of the 57th Annual Meeting
of the instances from factual knowledge graphs, and of the Association for Computational Linguistics:
we can use it to enrich the entities in the triples. System Demonstrations (pp. 25-30).
[8] Hsu, C. C., Chen, Z. Y., Hsu, C. Y., Li, C.
          </p>
          <p>C., Lin, T. Y., Huang, T. H. K., &amp; Ku, L. W.
6. Conclusions (2019). Knowledge-Enriched Visual Storytelling.
arXiv preprint arXiv:1912.01496.</p>
          <p>
            In this paper we showed how to use subject-verb-object [9] Jain, P., Agrawal, P., Mishra, A., Sukhwani, M.,
triples as a context clues input to the generative model, Laha, A., &amp; Sankaranarayanan, K. (2017). Story
gento connect language models and knowledge graphs for eration from sequence of independent short
destory generation. Evaluation results showed that we scriptions. arXiv preprint arXiv:1707.05501.
can efectively inject entities and relations from knowl- [10] Keskar, N. S., McCann, B., Varshney, L. R., Xiong,
edge graphs into the generated stories. Future work C., &amp; Socher, R. (2019). Ctrl: A conditional
transwill focus on improving the coherence of the gener- former language model for controllable generation.
ated stories and making them have smooth transitions
            <xref ref-type="bibr" rid="ref3">arXiv preprint arXiv:1909</xref>
            .05858.
between sentences. For example, in order to improve [11] Knublauch, H., &amp; Kontokostas, D. (2017). Shapes
the performance of the internal matching process, we constraint language (SHACL). W3C Candidate
Reccan classify popular words into specific classes and use ommendation, 11(8).
ontology techniques, such as SCHACL
            <xref ref-type="bibr" rid="ref6">(Knublauch &amp; [12] Koncel-Kedziorski, R., Bekal, D., Luan, Y.,
LapKontokostas 2017)</xref>
            and OWL restrictions (McGuinness ata, M., &amp; Hajishirzi, H. (2019). Text Generation
&amp; Van Harmelen 2004), to make sure these classes can from Knowledge Graphs with Graph Transformers.
interact with each other based on specific rules. arXiv preprint arXiv:1904.02342.
[13] Li, B., Lee-Urban, S., Johnston, G., &amp; Riedl, M.
          </p>
          <p>References (2013, June). Story generation with crowdsourced
plot graphs. In Twenty-Seventh AAAI Conference
[1] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Ka- on Artificial Intelligence.</p>
          <p>
            plan, J., Dhariwal, P., ... &amp; Agarwal, S. (2020). Lan- [14] Liu, W., Zhou, P., Zhao, Z., Wang, Z., Ju, Q.,
guage models are few-shot learners. arXiv preprint
            <xref ref-type="bibr" rid="ref4">Deng, H., &amp; Wang, P. (2019</xref>
            ). K-bert: Enabling
lanarXiv:2005.14165. guage representation with knowledge graph. arXiv
[2] Chen, J., Chen, J., &amp; Yu, Z. (2019, July). Incorpo- preprint
            <xref ref-type="bibr" rid="ref3">arXiv:1909</xref>
            .07606.
          </p>
          <p>rating structured commonsense knowledge in story [15] Logan, R., Liu, N. F., Peters, M. E., Gardner, M., &amp;
completion. In Proceedings of the AAAI Confer- Singh, S. (2019, July). Barack’s wife hillary: Using
ence on Artificial Intelligence (Vol. 33, pp. 6244- knowledge graphs for fact-aware language
model6251). ing. In Proceedings of the 57th Annual Meeting of
[3] Chen, Z., Eavani, H., Liu, Y., &amp; Wang, W. Y. (2019). the Association for Computational Linguistics (pp.</p>
          <p>Few-shot NLG with Pre-trained Language Model. 5962-5971).</p>
          <p>arXiv preprint arXiv:1904.09521. [16] McGuinness, D. L., &amp; Van Harmelen, F. (2004).
[4] Clark, K., &amp; Manning, C. D. (2016). Deep rein- OWL web ontology language overview. W3C
recforcement learning for mention-ranking corefer- ommendation, 10(10), 2004.</p>
          <p>ence models. arXiv preprint arXiv:1609.08667. [17] Miller, G. A. (1995). WordNet: a lexical database
[5] Devlin, J., Chang, M. W., Lee, K., &amp; Toutanova, for English. Communications of the ACM, 38(11),
K. (2018). Bert: Pre-training of deep bidirectional 39-41.
transformers for language understanding. arXiv [18] Mostafazadeh, N., Vanderwende, L., Yih, W. T.,
preprint arXiv:1810.04805. Kohli, P., &amp; Allen, J. (2016, August). Story cloze
[6] Guan, J., Wang, Y., &amp; Huang, M. (2019, July). Story evaluator: Vector space representation evaluation
ending generation with incremental encoding and by predicting what happens next. In Proceedings of
the 1st Workshop on Evaluating Vector-Space
Rep</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>resentations for NLP</source>
          (pp.
          <fpage>24</fpage>
          -
          <lpage>29</lpage>
          ). [19]
          <string-name>
            <surname>Ostendorf</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bourgonje</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Moreno-Schneider</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rehm</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gipp</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          arXiv:
          <year>1909</year>
          .
          <volume>08402</volume>
          . [20]
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Amodei,
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            , &amp;
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Language models are un-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>supervised multitask learners</article-title>
          .
          <source>OpenAI Blog</source>
          ,
          <volume>1</volume>
          (
          <issue>8</issue>
          ). [21]
          <string-name>
            <surname>Roemmele</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gordon</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Swanson</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>R.</surname>
          </string-name>
          (
          <year>2017</year>
          ,
          <article-title>August)</article-title>
          .
          <source>Evaluating story generation</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>SIGKDD 2017 Workshop on Machine Learning for</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Creativity</surname>
          </string-name>
          (pp.
          <fpage>13</fpage>
          -
          <lpage>17</lpage>
          ). [22]
          <string-name>
            <surname>Speer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Havasi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2017</year>
          , February).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>Conceptnet 5.5: An open multilingual graph of gen-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>on Artificial Intelligence</source>
          . [23]
          <string-name>
            <surname>Sternberg</surname>
            ,
            <given-names>R. J</given-names>
          </string-name>
          . (Ed.). (
          <year>1999</year>
          ). Handbook of creativ-
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          ity. Cambridge University Press. [24]
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A. N.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>formation processing systems</source>
          (pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          ). [25]
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2019</year>
          ). Ef-
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          arXiv preprint arXiv:
          <year>1910</year>
          .
          <volume>06708</volume>
          . [26]
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>KG-BERT:</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          preprint arXiv:
          <year>1909</year>
          .
          <volume>03193</volume>
          . [27]
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shum</surname>
            ,
            <given-names>H. Y.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>