1. Introduction

Creative Storytelling with Language Models and Knowledge Graphs

Xinran Yang

Ilaria Tiddi

0 0 Vrije Universiteit Amsterdam , De Boelelaan 1105, 1081 HV Amsterdam , The Netherlands

5 13

Automated story generation is a popular and well-recognized task in the field of natural language processing. The emergence of pre-trained language models based on large Transformer architectures shows the great capability of text generation. However, language models are limited when the generation requires explicit clues within the context. In this research, we study how to combine knowledge graphs with language models, and build a creative story generation system named DICE. DICE uses external knowledge graphs to provide context clues and implicit knowledge to generate coherent and creative stories. The evaluation shows that our approach can efectively inject the knowledge from knowledge graphs into the stories automatically generated by the language model.

eol>knowledge graph language model story generation natural language generation

1. Introduction

Story generation is a challenging task that requires reasonable and relevant content in the generated sentences as well as dealing with logic and implicit in formation (Guan et al. 2019 ). After large-scale pretrained language modes like OpenAI GPT-2 (Ra dford et al. 2019 ) and BERT (Devlin et al. 2018) have been released in recent years, machines have shown the ability to generate a paragraph of understandable text according to a given topic. These language models are able to generate mostly-grammatical sentences with nearly perfect syntax and punctuation (Koncel-Ke dziorski et al. 2019 ). However, the text generated by these language models often lacks commonsense knowledge Figure 1: An example of the story generation. The orange (Logan et al. 2019) and it is hard to control the content words are the keywords provided by the user, and the blue of the automatically generated text. To solve the prob- words are the extended entities and relations from the DICE lem, one solution is to take advantage of structured knowledge graph. These words are connected as knowledge inputs, such as tabular inputs and knowledge graphs graphs (SVO triples). “#i” indicates the sentence is the i-th (Koncel-Ke dziorski et al. 2019 ). Meanwhile, one of the sentence of the story. most popular methods to combine language models and knowledge graphs, is using knowledge graph embeddings. However, creating embeddings for knowledge graphs is a complex and time-consuming process; embedding approaches. moreover, knowledge graphs tend to be often updated, We aim to answer the following research questions: and new embeddings have to be created (Wu et al. Q1. How to combine the language mo del with knowl2019 ). This research introduces a new method to com- edge graphs for the story generation without knowlbine knowledge graphs with language models without edge graph embeddings? Q2. What are the advantages and disadvantages of using knowledge graphs to automatically generate a story?

We propose a two-layer system called DICE, which contains a knowledge enrichment layer and a text generation layer, applying the knowledge graph and the language model respectively, to generate coherent and Proceedings of the CIKM 2020 Workshops, October 19-20, 2020, Galway, Ireland email: x6.yang@student.vu.nl (X. Yang); i.tiddi@vu.nl (I. Tiddi) url: https://kmitd.github.io/ilaria/ (I. Tiddi) orcid: 0000-0001-7116-9338 (I. Tiddi)

© 2020 Copyright for this paper by its authors. Use permitted under Creative CPWrEooUrckReshdoinpgs IhStpN:/c1e6u1r3-w-0s.o7r3g CCoEmUmoRns WLiceonrsekAsthtriobuptioPnr4o.0cIneteerdnaitniognasl ((CCC EBYU4R.0)-.WS.org) creative stories1. Figure 1 presents an example of the Existing natural language generation systems are ofstory generation process. In the example, the system ten limited when the tasks require higher levels of cretakes 4 keywords as an input, then enriches the key- ativity and originality (Jain et al. 2017). Pre-trained words with the knowledge graph and constructs subject- language models based on large Transformer architecverb-object (SVO) triples, the latter will be used as a tures (Vaswani et al. 2017), such as GPT-2 and BERT, prompt for the language model to generate stories. can be a potential solution for this problem. Recently,

The current work explores the possibility of con- the OpenAI team has announced the upgraded GPT-3 necting the knowledge graph and the language model (Brown et al. 2020) with 175 billion parameters which with an interface. The advantage of using an inter- is 100 times larger than the previous version, GPT-2. face is that the language model can rapidly adapt to These language models show impressive text generthe changes from an updated knowledge graph. For ation capabilities that can achieve state-of-the-art rethe knowledge enrichment layer, we implemented two sults without extra training (Keskar et al. 2019). Howversions of DICE knowledge graph. For version1, we ever, these language models perform poorly when capretrieve the knowledge from ConceptNet (Speer et al. turing the long tail of rare entities such as numbers and 2017) and WordNet (Miller 1995) and construct an inte- dates (Logan et al. 2019 ). Moreover, these models are grated knowledge graph of commonsense knowledge unable to build context clues and use implicit knowlfor the story generation. For version2, we enriched the edge to generate a reasonable story ending (Guan et knowledge graph of version1 by using DBpedia facts. al. 2019 ).

For the text generation layer, we choose ROCStories2 (Mostafazadeh et al. 2016) as our story corpus to fine- 2.2. Text Generation with Knowledge tune the language model, GPT-2. More details are dis- Graph Embeddings cussed in Section 3.

The contributions are as follows:

The problem mentioned above can be improved by combining language models with knowledge graphs, where • We propose a new way of combining knowledge the former can facilitate the knowledge extracted from graphs and language models for text generation knowledge graphs. For example, Logan et al. (2019) without using knowledge graph embeddings. The built the knowledge graph language model (KGLM) results show that we can efectively inject the that could select and copy related facts from a knowlknowledge from knowledge graphs into the au- e dge graph. Ostendorf et al. (2019 ) enriched BERT tomatically generated stories as a background or with knowledge graph embeddings for document clasa plot and therefore control the content of these sification and got better results than the standard BERT stories to some extent. approach. Meanwhile, Koncel-Ke dziorski et al. (2019 ) • We introduce a fine-tuned model which accepts introduced a new attention model for graph encodSVO triples as a prompt instead of sentences used ing and used it for the graph-to-text generation. The by original GPT-2 models, to generate reason- main shortcoming of these models is their high cost of able and creative stories with the context pro- computational resources which leads to a long trainvided by the SVO triples. ing and task execution time (Yao et al. 2019). KoncelKe dziorski et al. (2019 ) also showed that their proposed model failed to mention 40% of entities in the knowledge graphs in the generated text.

2. Related Work 2.1. Text Generation using Language Models 2.3. Knowledge Enrichment with Knowledge Graphs

Story generation is a knowledge-intensive process (Li Hsu et al. (2019) proposed the distill-enrich-generate et al. 2013). In particular, open story generation re- framework that using knowledge graphs to enrich the quires artificial intelligence systems to create narra- words distilled from the input images and then gentives about any topic without a pre-defined domain erating stories. Liu et al. (2019) used external knowlmodel (Li et al. 2013). Meanwhile, a creative story edge graphs to enrich the input sentence as a sentence should be both novel and appropriate (Sternberg 1999) . tree for solving NLP tasks such as classification and sequence labeling. Guo et al. (2019) built a poetry knowl1Data and code available at github.com/ranyxr/dice_story edge graph for keyword mapping, extension, and se2https://cs.rochester.edu/nlp/rocstories/ lection to generate Chinese classical poems with high guage model fine-tuning is the pre-processing step for story generation, which includes two stages: SVO triple extraction and fine-tuning. Story generation also has two stages, i.e., knowledge enrichment and text generation. In the next section, we will discuss each stage in detail.

3.2. Language Model Fine-tuning The OpenAI team has released GPT-3, the GPT-2’s suc

cessor, but it was not available when we conducted the research. As a result, we choose GPT-2 as the natural language generator. OpenAI has released 4 versions of GPT-23: the small version with 124M parameters, the medium version with 355M parameters, the large version with 774M parameters, and the XL version with 1.5B parameters. Considering the large amount of training data (the encoded story corpus is 19M), we choose the medium version of GPT-2 to strike the balance of speed, size, and creativity. An open-source Python package, gpt-2-simple4, is used to support the finetuning and text generation process. Meanwhile, we choose the ROCStories as our story corpus, which contains nearly 10 thousand short stories, each story includes a title and five-sentence content. quality and relevance. Similarly, Zhou et al. (2020) resort to a knowledge graph that consists of a collection of head-relation-tail triples to retrieve related topics in their intelligent dialogue system.

Diferent from some researches above, instead of delivering a graph-to-text task which emphasizes the explicit translation from graph to text without creative 3.2.1. SVO Triple Extraction writing, this study puts more focus on using information from knowledge graphs to provide a background After acquiring the story corpus, we need to encode or a plot for the language model as guidance or inspi- the dataset into a format that allows GPT-2 to generate ration. text according to the specified SVO triples. We extract SVO triples from each story, then add the triples as a prefix for each story respectively. This way, the lan3. Method guage model can learn from a hint that each story is generated conditionally on the SVO triples. 3.1. Overview We use spaCy5 to extract SVO triples from each story The task here is to generate 5-sentence stories from a as “entities and relations”. However, sometimes the set of SVO triples that are extracted and regrouped in process may encounter the coreference problem, i.e., a knowledge graph. The expected input of the system a pronoun is used as a subject. For example, the senis a set of keywords provided by users. Figure 2 shows tence is “My sister has a dog. She loves him.”, the triple the two-layer architecture of the DICE system. We directly extracted by spaCy is (My sister, has, dog) and use SVO triples as an interface to connect the knowl- (She, loves, him), which are not the expected result beedge enrichment layer and the text generation layer. cause we want a more specific reference as a subject, The SVO triples can be constructed from knowledge i.e., (My sister, has, dog) and (My sister, loves, dog). The graphs or extracted from story corpus; meanwhile, they resolution is using neuralcoref6 that applies the neuserve as a prompt for the language model to gener- ral net scoring model to find coreferences in the text ate stories. The system firstly checks the relationships (Clark & Manning 2016). Meanwhile, to simplify the between these keywords and adds additional informa- triple, we convert the verb into its lemma and only extion using the knowledge graph, then generates a set tract the main text of the subject and the object. For of SVO triples to feed the language model to generate stories. 3https://openai.com/blog/gpt-2-1-5b-release/

Two processes are involved to complete this task: 45hhttttppss::////sgpitahcuyb.i.oc/om/minimaxir/gpt-2-simple language model fine-tuning and story generation. Lan- 6https://spacy.io/universe/project/neuralcoref the example above, we extract “sister” instead of “my sister”, “love” instead of “loves”.

As a result, one example from the encoded dataset is the following:

The system includes a new knowledge graph dataset

named DICE KG. We implemented two versions of DICE

KG. Version1 (CW, i.e., ConceptNet and WordNet) com(Joseph, sign, deal), (Joseph, be, musician), (Joseph, bines two large open-source knowledge graphs: Conbe, songwriter), (Joseph, hope, write), (Joseph, ceptNet 5.6.0 and WordNet. ConceptNet is a knowllose, wallet), (woman, contact, Joseph), (Joseph, edge graph that connects words and terms (phrases of have, idea) The Best Single Joseph has just re- natural language) with assertions (labeled, weighted cently signed a deal with a new record label. edges) (Speer et al. 2017). Unlike ConceptNet, WordHe is a musician and a songwriter who hopes Net is a large lexical database of English with cognitive to write a best new hit. On his way to a local synonyms (synsets), which are connected by means of cofee shop to brainstorm, he lost his wallet. conceptual-semantic and lexical relations (Miller 1995). Jhoimsepahndwarestfurrunsetrdatiet.dSuundtidleanwlyo, mheanrecaolnizteadcthede The DICE KG converts these two datasets into an inhad an idea for his new song about kindness. tegrated model, and as a result, the dataset contains more than 1.6 million nodes and over 3 million relaWords in red are the SVO triples; words in orange are tionships with 54 types. The DICE KG is large enough the story title; words in blue are the story content. for finding relations between the keywords given by users and constructing a set of SVO triples using the 3.2.2. Fine-tuning entities and relations in the knowledge graph. Moreover, each relationship between the words has an anThe last step of this process is to fine-tune the model notation named “weight", which can help the system based on the encoded dataset, which includes both SVO to find a more reasonable path in the next step, i.e., the triples and the original ROCStories. However, language SVO triple construction. models like GPT-2 are built for longform content, gen- We also introduce another version (DBCW, i.e., DBerating short text like 5-sentence stories is not the typ- pedia, ConceptNet, and WordNet) of DICE KG that enical generation scenario. To workaround this issue, we riches version1 with DBpedia’s mappings8. The DBCW use GPT-2-simple, which allows us to add flags to indi- version includes over 8.5 million nodes with 6 labels cate where is the start and the end of each short text (5- and over 23 million relationships with 694 types. In sentence story in this case), then the language model this version, we enrich the common concepts from Conwill automatically extract the shortform texts during ceptNet and WordNet with factual instances and propthe fine-tuning process. erties from DBpedia. In Section 4, we compare the per

The final fine-tuned model is called the DICE model, formance of the two versions of DICE KG. which can be found and downloaded on Google Drive7. To construct SVO triples from the given keywords, there are 3 steps: internal matching, external enrich3.3. Story Generation ment, and converting paths to triples. The internal matching concerns finding meaningful relations beWe use the SVO triples as a prompt for GPT-2. The tween the keywords, so that we can later put the keytriples are constructed based on the keywords by using words at the corresponding position in an SVO triple. knowledge graphs . Each triple includes a subject and If there is a keyword that has no relation with other an object as its entities and a verb as its relation that keywords, we use external enrichment to assign other connects the entities. The SVO triples can not only related words in the knowledge graph to construct an give the language model topics (entities) to talk about SVO triple for the keyword. The first two steps are but also define part of the plots (relations) of the story. both semi-automatic, i.e., we use Cypher to query the For example, (Jane, be, singer) defines the background graph database and get the matching candidates while of the story, where there is a person whose name is manually filtering the matching results, which are still Jane who is a singer. The story generation includes needed to ensure the quality of the SVO triples. two stages: knowledge enrichment and text genera- Figure 3 shows an example of the SVO triple contion. struction. We assume that the keywords are: {love, cat, beer, nap}. Firstly, we try to lookup the one-hop relationship (only specific relations are considered, such as

7https://drive.google.com/drive/folders/

1T68rWkOde5ZwcuodQ9iWuYJAcqAmb0Jo

8https://databus.dbpedia.org/dbpedia/mappings/

mappingbased-objects/2020.07.01 CapableOf and Desires) between the keywords in the knowledge graph. In this case, we find one direct relation: (cat, desires, nap). Next, we assign additional information to the keywords without a direct relation. In this case, for the verb “love”, we randomly choose the word “sing” as the verb’s object, which is connected to “love” through a relation called “CausesDesire”. Meanwhile, we choose “Tina”, which belongs to the person class, as the verb’s subject. For “beer” which is a noun, we assign it a verb “drink”, which is related to “beer”, and we also choose “Tina” as its subject to keep the story simple. Finally, we also need to map the directly one-hop relation into a more common word, for example, (cat, desires, nap) becomes (cat, want, nap). As a result, the final SVO triples are (cat, want, nap), (Tina, love, sing), and (Tina, drink, beer). 3.3.2. Text Generation

The final stage is the text generation. After we get the

SVO triples, we can use these as a prefix to generate stories from the trained model. In this process, we use GPT-2 as the story generator. Meanwhile, we use gpt2-simple which allows for prefixes to force the generated text to start with the prefix and generate stories from these triples. Finally, we truncate the prefix and lfags in the generated stories, to return text only with titles and contents. Table 1 shows one example generated by DICE using the triples mentioned above. These stories are handpicked from 75 automatedly generated stories. We can see the stories can exactly reflect the entities and relations from the SVO triples in generated stories, although the triples may not be presented in the stories 100% of the time.

4. Experiments 4.1. Baselines DICE (CW) vs. Human. A given keyword set will

be provided to both a human and the DICE system (with CW version of knowledge graph) to create stories, then we compare the results of human-written stories and machine-written stories.

DICE (CW) vs. GPT-2. For the original GPT-2 model, we construct one or two sentences containing all the entities in the keyword set, and we use these sentences as input for the GPT-2 model which is directly fine-tuned on ROCStories to generate a story. We then use the same keyword set to generate stories using the DICE model and compare the results.

DICE (CW) vs. GPT-2-keyword-generation. GPT2-keyword-generation9 is open-source software that using GPT-2 to generate text pertaining to the speciifed keywords. We compare the stories directly generated from a set of keywords with the stories generated by the DICE system.

DICE-CW vs. DICE-DBCW. We also compare the performance of the DICE system when using diferent versions of DICE KG to evaluate whether factual knowledge graphs can contribute to the story generation.

4.2. Evaluation

4.2.1. Evaluation Metrics The evaluation focuses on two aspects of the generated output: story-independent metrics and story-dependent metrics (Roemmele et al. 2017). Story-independent metrics, including grammatical correctness, clarity, and engagement, will be used to analyze the quality of the generated output without considering its context; whereas grammatical The correctness of spelling, correctness grammar and punctuation clarity uWnhdeetrhstearntdh.e text is easy to engagement iWsihnettehreesrttinhge awnrditeinfegctisvtey.le creativity cWrehaettihveerortnhoet. stories are coherence tSheemoaunttpicuatl.ly coherent of cKoevyewraogrde gTwoeonwredhrsaaattereedxptteerexnstt.ednotetdheinktehyeEvaluation approach Automatic Automatic Automatic Manual Manual Automatic son (non-native English speaker but with professional working proficiency) to write two stories with the same keywords. For human-written stories, each story should only contain 5 sentences and every keyword in the keyword set must be mentioned in the story content. Finally, we invited people to estimate whether the story is written by a human or a machine and score each story on its creativity and coherence.

5. Results and Discussion 5.1. Experiment Results We picked 100 random samples for each model to eval

uate their performance. We gathered the automatic evaluation results and manual evaluation results and separated them by story-independent metrics and storyindependent metrics, which were shown in Table 3 and Table 4 respectively. The result shows there is no much diference according to the story-independent metrics among the stories written by the language models and story-dependent metrics, including coherence, keyword human-written stories. The overall grammaticality percoverage, and creativity, will be used to evaluate the formance of each model is satisfactory. The Gramgenerated stories with reference to the context (Roem- marly overall score of the fine-tuned GPT-2 model is mele et al. 2017). On the other hand, the evaluation even higher than the score of human-written stories. combines both automatic evaluation and manual eval- For samples from ROCStories, most of the grammatuation. Explanation of each metric and the evaluation ical errors are the punctuation misuse. While for the approaches are shown in Table 2. stories generated by language models, the biggest writAutomatic Evaluation. For story-independent met- ing issue is the determiner (a/an/the/this, etc.) misuse, rics, we used the automated analysis tool, Grammarly, followed by punctuation misuse and wordy sentences. to evaluate the overall grammaticality performance of For the two story-dependent metrics of creativity the generated text. For keyword coverage, we used a and coherence, all the models perform poorly comscript to monitor to what extent do the keywords were pared with human writers. In general, the generated presented in the generated stories. stories are not always logical and making sense, even

Manual Evaluation. Stories should be reasonable with a properly trained model. The OpenAI team shows and coherent with the context (Guan et al. 2019), which that it takes a few tries to get a good and reasonable is hard to access by automatic tools. As a result, a man- result, and meanwhile, the number of tries is highly ual evaluation was also performed to more accurately dependent on the topics presented in the training data. evaluate the quality of each story. We invited 3 indi- Particularly, in this case, the given keywords can influviduals to score the stories from each model, includ- ence the performance of the result significantly. For ing stories from the original ROCStories. We applied example, if the given keywords are barely related to 5-point Likert scales to rate each story on its creativity each other, then the model can perform poorly. This is and coherence. Then we calculated the overall average because unrelated keywords make it more dificult to score for each model. generate related SVO triples, and unrelated SVO triples

Furthermore, we used a questionnaire10 to investi- lead to unconnected sentences in the generated stogate whether readers could tell the diference between ries. However, the keyword coverage of the DICE systhe automatically generated stories and the human- tem (96% for DICE-CW and 97% for DICE-DBCW) is written ones. We handpicked two stories generated significantly higher than other baselines (73% for GPTby the DICE system where the stories were generated 2, 88% for GPT-keyword-generation). However, for based on a given keyword set. Then we invited a per- the DICE-DBCW, the coverage of the enriched words (80%) from DBpedia is lower compared with the keyword coverage. This is because some of the enriched

Correctness Clarity Engage 21 alerts/ Very 4276 words clear

Engaging 82/100 26 alerts/ 3512 words 18 alerts/ 3931 words 31 alerts/ 5279 words 54 alerts/ 4591 words

Bland Bland A bland A bland bit bit

Score 78/100 75/100 80/100 80/100 Keyword coverage 0.7275 0.88 words are proper nouns, like brand names, which are hardly shown in the training text.

5.2. Injecting Relations into Stories As mentioned in the last section, the keyword cover

age (96%) and the relation coverage (100%) of the DICE system are very high during the test. This means the SVO triples can efectively afect the plots of the generated stories. During the experiment, we find that we can use SVO triples to inject entities and the relations between the entities into the stories as backgrounds or plots. As a result, the quality of the SVO triples and the order of these triples can significantly afect the quality of the automatically generated stories. Since these triples are generated from the knowledge graphs, the logic and relationships behind these knowledge graphs are also important to a better story generation.

5.3. Quality of Generated Stories 2.3/5 2.7/5 0.9725 As shown in Table 4, there is little diference in the creativity score and the coherence score from the base3.7/5 4.9/5 N/A lines to the DICE model. Although with the DICE model, we are able to inject relations into the stories, the relation can only afect the logic within each sentence while it cannot influence the logic that runs through 5.1.1. Questionnaire Results the story. This is because the SVO triples extracted The questionnaire has received 54 responses. Most of during the language fine-tuning process, are extracted the respondents are native English speakers (4/5 of the from each sentence separately in the stories which are respondents), and some of them are non-native speak- loosely connected, so they cannot reflect relations like ers (1/5 of the respondents) but with efective English causation throughout the text. As a result, the coherproficiency. The result is shown in Table 5. In general, ence of the generated stories from the DICE model is there is a great chance (37.5% on average) for people not satisfying in general. to make a mistake when judging whether the story is written by a human or a machine. In particular, sto- 5.4. Commonsense vs. Factual KG ries with short sentences and wrong word choices are more likely to be regarded as a machine-written story.

On the other hand, for stories that are interesting and creative but without coherence between the sentences, people are more likely to make a mistake and think the stories are written by a human.

We introduce two knowledge graphs in this research.

The knowledge graph used in version1 (CW) is a semantic knowledge graph where common concepts and words have many connections with each other, which is the foundation to relate keywords and construct SVO triples. While for the fact-based knowledge graphs like DBpedia, they can hardly provide connections be- commonsense knowledge. In Proceedings of the tween the common concepts, and as a result, they can AAAI Conference on Artificial Intelligence (Vol. 33, hardly contribute to the triple construction process. pp. 6473-6480).

However, with a combination of semantic knowledge [7] Guo, Z., Yi, X., Sun, M., Li, W., Yang, C., Liang, J., graphs and factual knowledge graphs, i.e., DICE KG ... & Li, R. (2019 , July). Jiuge: A Human-Machine version2 (DBCW), we can make use of the knowledge Collaborative Chinese Classical Poetry Generation about the instances of the concepts and the properties System. In Proceedings of the 57th Annual Meeting of the instances from factual knowledge graphs, and of the Association for Computational Linguistics: we can use it to enrich the entities in the triples. System Demonstrations (pp. 25-30). [8] Hsu, C. C., Chen, Z. Y., Hsu, C. Y., Li, C.

C., Lin, T. Y., Huang, T. H. K., & Ku, L. W. 6. Conclusions (2019). Knowledge-Enriched Visual Storytelling. arXiv preprint arXiv:1912.01496.

In this paper we showed how to use subject-verb-object [9] Jain, P., Agrawal, P., Mishra, A., Sukhwani, M., triples as a context clues input to the generative model, Laha, A., & Sankaranarayanan, K. (2017). Story gento connect language models and knowledge graphs for eration from sequence of independent short destory generation. Evaluation results showed that we scriptions. arXiv preprint arXiv:1707.05501. can efectively inject entities and relations from knowl- [10] Keskar, N. S., McCann, B., Varshney, L. R., Xiong, edge graphs into the generated stories. Future work C., & Socher, R. (2019). Ctrl: A conditional transwill focus on improving the coherence of the gener- former language model for controllable generation. ated stories and making them have smooth transitions arXiv preprint arXiv:1909 .05858. between sentences. For example, in order to improve [11] Knublauch, H., & Kontokostas, D. (2017). Shapes the performance of the internal matching process, we constraint language (SHACL). W3C Candidate Reccan classify popular words into specific classes and use ommendation, 11(8). ontology techniques, such as SCHACL (Knublauch & [12] Koncel-Kedziorski, R., Bekal, D., Luan, Y., LapKontokostas 2017) and OWL restrictions (McGuinness ata, M., & Hajishirzi, H. (2019). Text Generation & Van Harmelen 2004), to make sure these classes can from Knowledge Graphs with Graph Transformers. interact with each other based on specific rules. arXiv preprint arXiv:1904.02342. [13] Li, B., Lee-Urban, S., Johnston, G., & Riedl, M.

References (2013, June). Story generation with crowdsourced plot graphs. In Twenty-Seventh AAAI Conference [1] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Ka- on Artificial Intelligence.

plan, J., Dhariwal, P., ... & Agarwal, S. (2020). Lan- [14] Liu, W., Zhou, P., Zhao, Z., Wang, Z., Ju, Q., guage models are few-shot learners. arXiv preprint Deng, H., & Wang, P. (2019 ). K-bert: Enabling lanarXiv:2005.14165. guage representation with knowledge graph. arXiv [2] Chen, J., Chen, J., & Yu, Z. (2019, July). Incorpo- preprint arXiv:1909 .07606.

rating structured commonsense knowledge in story [15] Logan, R., Liu, N. F., Peters, M. E., Gardner, M., & completion. In Proceedings of the AAAI Confer- Singh, S. (2019, July). Barack’s wife hillary: Using ence on Artificial Intelligence (Vol. 33, pp. 6244- knowledge graphs for fact-aware language model6251). ing. In Proceedings of the 57th Annual Meeting of [3] Chen, Z., Eavani, H., Liu, Y., & Wang, W. Y. (2019). the Association for Computational Linguistics (pp.

Few-shot NLG with Pre-trained Language Model. 5962-5971).

arXiv preprint arXiv:1904.09521. [16] McGuinness, D. L., & Van Harmelen, F. (2004). [4] Clark, K., & Manning, C. D. (2016). Deep rein- OWL web ontology language overview. W3C recforcement learning for mention-ranking corefer- ommendation, 10(10), 2004.

ence models. arXiv preprint arXiv:1609.08667. [17] Miller, G. A. (1995). WordNet: a lexical database [5] Devlin, J., Chang, M. W., Lee, K., & Toutanova, for English. Communications of the ACM, 38(11), K. (2018). Bert: Pre-training of deep bidirectional 39-41. transformers for language understanding. arXiv [18] Mostafazadeh, N., Vanderwende, L., Yih, W. T., preprint arXiv:1810.04805. Kohli, P., & Allen, J. (2016, August). Story cloze [6] Guan, J., Wang, Y., & Huang, M. (2019, July). Story evaluator: Vector space representation evaluation ending generation with incremental encoding and by predicting what happens next. In Proceedings of the 1st Workshop on Evaluating Vector-Space Rep

resentations for NLP (pp. 24 - 29 ). [19] Ostendorf , M. , Bourgonje , P. , Berger , M. ,

Moreno-Schneider , J. , Rehm , G. , & Gipp , B. ( 2019 ).

arXiv: 1909 . 08402 . [20] Radford , A. , Wu , J. , Child , R. , Luan , D. , Amodei,

D. , & Sutskever , I. ( 2019 ). Language models are un-

supervised multitask learners . OpenAI Blog , 1 ( 8 ). [21] Roemmele , M. , Gordon , A. S. , & Swanson ,

R. ( 2017 , August) . Evaluating story generation

SIGKDD 2017 Workshop on Machine Learning for

Creativity (pp. 13 - 17 ). [22] Speer , R. , Chin , J. , & Havasi , C. ( 2017 , February).

Conceptnet 5.5: An open multilingual graph of gen-

on Artificial Intelligence . [23] Sternberg , R. J . (Ed.). ( 1999 ). Handbook of creativ-

ity. Cambridge University Press. [24] Vaswani , A. , Shazeer , N. , Parmar , N. , Uszkoreit , J. ,

Jones , L. , Gomez , A. N. , ... & Polosukhin , I. ( 2017 ).

formation processing systems (pp. 5998 - 6008 ). [25] Wu , T. , Khan , A. , Gao , H. , & Li , C. ( 2019 ). Ef-

arXiv preprint arXiv: 1910 . 06708 . [26] Yao , L. , Mao , C. , & Luo , Y. ( 2019 ). KG-BERT:

preprint arXiv: 1909 . 03193 . [27] Zhou , L. , Gao , J. , Li , D. , & Shum , H. Y. ( 2020 ).