=Paper=
{{Paper
|id=Vol-2699/paper07
|storemode=property
|title=Creative Storytelling with Language Models and Knowledge Graphs
|pdfUrl=https://ceur-ws.org/Vol-2699/paper07.pdf
|volume=Vol-2699
|authors=Xinran Yang,Ilaria Tiddi
|dblpUrl=https://dblp.org/rec/conf/cikm/YangT20
}}
==Creative Storytelling with Language Models and Knowledge Graphs==
Xinran Yang, Ilaria Tiddi

Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands

Proceedings of the CIKM 2020 Workshops, October 19-20, 2020, Galway, Ireland. Email: x6.yang@student.vu.nl (X. Yang); i.tiddi@vu.nl (I. Tiddi). URL: https://kmitd.github.io/ilaria/ (I. Tiddi). ORCID: 0000-0001-7116-9338 (I. Tiddi). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

'''Abstract.''' Automated story generation is a popular and well-recognized task in the field of natural language processing. The emergence of pre-trained language models based on large Transformer architectures shows the great capability of text generation. However, language models are limited when the generation requires explicit clues within the context. In this research, we study how to combine knowledge graphs with language models, and build a creative story generation system named DICE. DICE uses external knowledge graphs to provide context clues and implicit knowledge to generate coherent and creative stories. The evaluation shows that our approach can effectively inject the knowledge from knowledge graphs into the stories automatically generated by the language model.

'''Keywords.''' knowledge graph, language model, story generation, natural language generation

===1. Introduction===
Story generation is a challenging task that requires reasonable and relevant content in the generated sentences as well as dealing with logic and implicit information (Guan et al. 2019). Since large-scale pre-trained language models like OpenAI GPT-2 (Radford et al. 2019) and BERT (Devlin et al. 2018) were released in recent years, machines have shown the ability to generate a paragraph of understandable text according to a given topic. These language models are able to generate mostly-grammatical sentences with nearly perfect syntax and punctuation (Koncel-Kedziorski et al. 2019). However, the text generated by these language models often lacks commonsense knowledge (Logan et al. 2019) and it is hard to control the content of the automatically generated text.

To solve the problem, one solution is to take advantage of structured inputs, such as tabular inputs and knowledge graphs (Koncel-Kedziorski et al. 2019). Meanwhile, one of the most popular methods to combine language models and knowledge graphs is using knowledge graph embeddings. However, creating embeddings for knowledge graphs is a complex and time-consuming process; moreover, knowledge graphs tend to be updated often, and new embeddings have to be created each time (Wu et al. 2019). This research introduces a new method to combine knowledge graphs with language models without embedding approaches.

Figure 1: An example of the story generation. The orange words are the keywords provided by the user, and the blue words are the extended entities and relations from the DICE knowledge graph. These words are connected as knowledge graphs (SVO triples). “#i” indicates the sentence is the i-th sentence of the story.

We aim to answer the following research questions:
* Q1. How to combine the language model with knowledge graphs for the story generation without knowledge graph embeddings?
* Q2. What are the advantages and disadvantages of using knowledge graphs to automatically generate a story?

We propose a two-layer system called DICE, which contains a knowledge enrichment layer and a text generation layer, applying the knowledge graph and the language model respectively, to generate coherent and creative stories<sup>1</sup>. Figure 1 presents an example of the story generation process. In the example, the system takes 4 keywords as input, then enriches the keywords with the knowledge graph and constructs subject-verb-object (SVO) triples, which are then used as a prompt for the language model to generate stories.

<sup>1</sup> Data and code available at github.com/ranyxr/dice_story

The current work explores the possibility of connecting the knowledge graph and the language model with an interface. The advantage of using an interface is that the language model can rapidly adapt to the changes from an updated knowledge graph. For the knowledge enrichment layer, we implemented two versions of the DICE knowledge graph. For version1, we retrieve the knowledge from ConceptNet (Speer et al. 2017) and WordNet (Miller 1995) and construct an integrated knowledge graph of commonsense knowledge for the story generation. For version2, we enriched the knowledge graph of version1 by using DBpedia facts. For the text generation layer, we choose ROCStories<sup>2</sup> (Mostafazadeh et al. 2016) as our story corpus to fine-tune the language model, GPT-2. More details are discussed in Section 3.

<sup>2</sup> https://cs.rochester.edu/nlp/rocstories/

The contributions are as follows:
* We propose a new way of combining knowledge graphs and language models for text generation without using knowledge graph embeddings. The results show that we can effectively inject the knowledge from knowledge graphs into the automatically generated stories as a background or a plot and therefore control the content of these stories to some extent.
* We introduce a fine-tuned model which accepts SVO triples as a prompt instead of sentences used by original GPT-2 models, to generate reasonable and creative stories with the context provided by the SVO triples.

===2. Related Work===
====2.1. Text Generation using Language Models====
Story generation is a knowledge-intensive process (Li et al. 2013). In particular, open story generation requires artificial intelligence systems to create narratives about any topic without a pre-defined domain model (Li et al. 2013). Meanwhile, a creative story should be both novel and appropriate (Sternberg 1999).

Existing natural language generation systems are often limited when the tasks require higher levels of creativity and originality (Jain et al. 2017). Pre-trained language models based on large Transformer architectures (Vaswani et al. 2017), such as GPT-2 and BERT, can be a potential solution for this problem. Recently, the OpenAI team announced the upgraded GPT-3 (Brown et al. 2020) with 175 billion parameters, more than 100 times as many as the previous version, GPT-2. These language models show impressive text generation capabilities that can achieve state-of-the-art results without extra training (Keskar et al. 2019). However, these language models perform poorly when capturing the long tail of rare entities such as numbers and dates (Logan et al. 2019). Moreover, these models are unable to build context clues and use implicit knowledge to generate a reasonable story ending (Guan et al. 2019).

====2.2. Text Generation with Knowledge Graph Embeddings====
The problem mentioned above can be alleviated by combining language models with knowledge graphs, where the former can make use of the knowledge extracted from knowledge graphs. For example, Logan et al. (2019) built the knowledge graph language model (KGLM) that could select and copy related facts from a knowledge graph. Ostendorff et al. (2019) enriched BERT with knowledge graph embeddings for document classification and got better results than the standard BERT approach. Meanwhile, Koncel-Kedziorski et al. (2019) introduced a new attention model for graph encoding and used it for graph-to-text generation. The main shortcoming of these models is their high cost of computational resources, which leads to long training and task execution times (Yao et al. 2019). Koncel-Kedziorski et al. (2019) also showed that their proposed model failed to mention 40% of the entities in the knowledge graphs in the generated text.

====2.3. Knowledge Enrichment with Knowledge Graphs====
Hsu et al. (2019) proposed the distill-enrich-generate framework that uses knowledge graphs to enrich the words distilled from the input images and then generates stories. Liu et al. (2019) used external knowledge graphs to enrich the input sentence as a sentence tree for solving NLP tasks such as classification and sequence labeling. Guo et al. (2019) built a poetry knowledge graph for keyword mapping, extension, and selection to generate Chinese classical poems with high quality and relevance. Similarly, Zhou et al. (2020) resort to a knowledge graph that consists of a collection of head-relation-tail triples to retrieve related topics in their intelligent dialogue system.

Unlike some of the work above, which delivers a graph-to-text task that emphasizes the explicit translation from graph to text without creative writing, this study puts more focus on using information from knowledge graphs to provide a background or a plot for the language model as guidance or inspiration.

===3. Method===
====3.1. Overview====
The task here is to generate 5-sentence stories from a set of SVO triples that are extracted and regrouped in a knowledge graph. The expected input of the system is a set of keywords provided by users. Figure 2 shows the two-layer architecture of the DICE system. We use SVO triples as an interface to connect the knowledge enrichment layer and the text generation layer. The SVO triples can be constructed from knowledge graphs or extracted from the story corpus; meanwhile, they serve as a prompt for the language model to generate stories. The system first checks the relationships between these keywords and adds additional information using the knowledge graph, then generates a set of SVO triples to feed the language model to generate stories.

Figure 2: Two-layer architecture of DICE system. The green arrows indicate the workflow of the knowledge enrichment process. The purple arrows indicate the workflow of the text generation process. The blue arrows indicate the workflow of the language fine-tuning process.

Two processes are involved to complete this task: language model fine-tuning and story generation. Language model fine-tuning is the pre-processing step for story generation, which includes two stages: SVO triple extraction and fine-tuning. Story generation also has two stages, i.e., knowledge enrichment and text generation. In the following sections, we discuss each stage in detail.

====3.2. Language Model Fine-tuning====
The OpenAI team has released GPT-3, the successor of GPT-2, but it was not available when we conducted the research. As a result, we choose GPT-2 as the natural language generator. OpenAI has released 4 versions of GPT-2<sup>3</sup>: the small version with 124M parameters, the medium version with 355M parameters, the large version with 774M parameters, and the XL version with 1.5B parameters. Considering the large amount of training data (the encoded story corpus is 19M), we choose the medium version of GPT-2 to strike a balance between speed, size, and creativity. An open-source Python package, gpt-2-simple<sup>4</sup>, is used to support the fine-tuning and text generation process. Meanwhile, we choose ROCStories as our story corpus, which contains nearly 10 thousand short stories; each story includes a title and five-sentence content.

<sup>3</sup> https://openai.com/blog/gpt-2-1-5b-release/

<sup>4</sup> https://github.com/minimaxir/gpt-2-simple

=====3.2.1. SVO Triple Extraction=====
After acquiring the story corpus, we need to encode the dataset into a format that allows GPT-2 to generate text according to the specified SVO triples. We extract SVO triples from each story, then add the triples as a prefix to that story. This way, the language model can learn from the hint that each story is generated conditionally on its SVO triples.

We use spaCy<sup>5</sup> to extract SVO triples from each story as “entities and relations”. However, the process sometimes encounters the coreference problem, i.e., a pronoun is used as a subject. For example, for the text “My sister has a dog. She loves him.”, the triples directly extracted by spaCy are (My sister, has, dog) and (She, loves, him), which is not the expected result because we want a more specific reference as a subject, i.e., (My sister, has, dog) and (My sister, loves, dog). The solution is to use neuralcoref<sup>6</sup>, which applies a neural net scoring model to find coreferences in the text (Clark & Manning 2016). Meanwhile, to simplify the triple, we convert the verb into its lemma and only extract the main text of the subject and the object. For the example above, we extract “sister” instead of “my sister” and “love” instead of “loves”.

<sup>5</sup> https://spacy.io/

<sup>6</sup> https://spacy.io/universe/project/neuralcoref

As a result, one example from the encoded dataset is the following:

: (Joseph, sign, deal), (Joseph, be, musician), (Joseph, be, songwriter), (Joseph, hope, write), (Joseph, lose, wallet), (woman, contact, Joseph), (Joseph, have, idea)
: The Best Single
: Joseph has just recently signed a deal with a new record label. He is a musician and a songwriter who hopes to write a best new hit. On his way to a local coffee shop to brainstorm, he lost his wallet. Joseph was frustrated until a woman contacted him and returned it. Suddenly, he realized he had an idea for his new song about kindness.

The first line holds the SVO triples (printed in red in the original), the second line the story title (orange), and the third line the story content (blue).

=====3.2.2. Fine-tuning=====
The last step of this process is to fine-tune the model on the encoded dataset, which includes both the SVO triples and the original ROCStories. However, language models like GPT-2 are built for longform content, and generating short text like 5-sentence stories is not the typical generation scenario. To work around this issue, we use gpt-2-simple, which allows us to add flags that indicate where each short text (a 5-sentence story in this case) starts and ends, so that the language model automatically extracts the shortform texts during the fine-tuning process.

The final fine-tuned model is called the DICE model, which can be found and downloaded on Google Drive<sup>7</sup>.

====3.3. Story Generation====
We use the SVO triples as a prompt for GPT-2. The triples are constructed from the keywords by using knowledge graphs. Each triple includes a subject and an object as its entities and a verb as the relation that connects the entities. The SVO triples not only give the language model topics (entities) to talk about but also define part of the plot (relations) of the story. For example, (Jane, be, singer) defines the background of the story, in which there is a person named Jane who is a singer. The story generation includes two stages: knowledge enrichment and text generation.

=====3.3.1. Knowledge Enrichment=====
The system includes a new knowledge graph dataset named DICE KG. We implemented two versions of DICE KG. Version1 (CW, i.e., ConceptNet and WordNet) combines two large open-source knowledge graphs: ConceptNet 5.6.0 and WordNet. ConceptNet is a knowledge graph that connects words and terms (phrases of natural language) with assertions (labeled, weighted edges) (Speer et al. 2017). Unlike ConceptNet, WordNet is a large lexical database of English with cognitive synonyms (synsets), which are connected by means of conceptual-semantic and lexical relations (Miller 1995). The DICE KG converts these two datasets into an integrated model, and as a result, the dataset contains more than 1.6 million nodes and over 3 million relationships of 54 types. The DICE KG is large enough for finding relations between the keywords given by users and constructing a set of SVO triples using the entities and relations in the knowledge graph. Moreover, each relationship between the words has an annotation named “weight”, which can help the system to find a more reasonable path in the next step, i.e., the SVO triple construction.

We also introduce another version (DBCW, i.e., DBpedia, ConceptNet, and WordNet) of DICE KG that enriches version1 with DBpedia’s mappings<sup>8</sup>. The DBCW version includes over 8.5 million nodes with 6 labels and over 23 million relationships of 694 types. In this version, we enrich the common concepts from ConceptNet and WordNet with factual instances and properties from DBpedia. In Section 4, we compare the performance of the two versions of DICE KG.

Constructing SVO triples from the given keywords takes 3 steps: internal matching, external enrichment, and converting paths to triples. Internal matching concerns finding meaningful relations between the keywords, so that we can later put the keywords at the corresponding position in an SVO triple. If a keyword has no relation with the other keywords, we use external enrichment to assign other related words in the knowledge graph to construct an SVO triple for that keyword. The first two steps are both semi-automatic, i.e., we use Cypher to query the graph database and get the matching candidates, and then manually filter the matching results, which is still needed to ensure the quality of the SVO triples.

Figure 3 shows an example of the SVO triple construction. We assume that the keywords are: {love, cat, beer, nap}.
Firstly, we try to look up the one-hop relationship (only specific relations are considered, such as CapableOf and Desires) between the keywords in the knowledge graph. In this case, we find one direct relation: (cat, desires, nap). Next, we assign additional information to the keywords without a direct relation. In this case, for the verb “love”, we randomly choose the word “sing” as the verb’s object, which is connected to “love” through a relation called “CausesDesire”. Meanwhile, we choose “Tina”, which belongs to the person class, as the verb’s subject. For “beer”, which is a noun, we assign it the verb “drink”, which is related to “beer”, and we also choose “Tina” as its subject to keep the story simple. Finally, we also need to map the direct one-hop relation into a more common word; for example, (cat, desires, nap) becomes (cat, want, nap). As a result, the final SVO triples are (cat, want, nap), (Tina, love, sing), and (Tina, drink, beer).

<sup>7</sup> https://drive.google.com/drive/folders/1T68rWkOde5ZwcuodQ9iWuYJAcqAmb0Jo

<sup>8</sup> https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects/2020.07.01

Figure 3: An example of SVO triple construction. Words in yellow are verbs. Words in green are nouns. Words in red are the enriched words from knowledge graphs.

=====3.3.2. Text Generation=====
The final stage is the text generation. After we get the SVO triples, we can use them as a prefix to generate stories from the trained model. In this process, we use GPT-2 as the story generator. We use gpt-2-simple, which accepts a prefix and forces the generated text to start with it, to generate stories from these triples. Finally, we truncate the prefix and flags in the generated stories, so that only the titles and contents are returned. Table 1 shows one example generated by DICE using the triples mentioned above. The examples shown are handpicked from 75 automatically generated stories. We can see that the stories closely reflect the entities and relations from the SVO triples, although the triples may not be present in the stories 100% of the time.

{| class="wikitable"
|+ Table 1: A story generated by DICE. Words in red are the subjects from the SVO triples; words in orange are the verbs (relations) from the SVO triples; words in blue are the objects from the SVO triples.
! Title !! Content
|-
| Lazy Cat || Tina loved to sing and drink beer with her friends. One day she was drunk and didn’t know what to do. She decided to go to the bar and see what she could do. She drank some beer and then went home. She went to sleep and woke up to her cat’s snoring.
|}

===4. Experiments===
====4.1. Baselines====
'''DICE (CW) vs. Human.''' A given keyword set is provided to both a human and the DICE system (with the CW version of the knowledge graph) to create stories; we then compare the human-written stories with the machine-written stories.

'''DICE (CW) vs. GPT-2.''' For the original GPT-2 model, we construct one or two sentences containing all the entities in the keyword set, and we use these sentences as input for a GPT-2 model that is directly fine-tuned on ROCStories to generate a story. We then use the same keyword set to generate stories using the DICE model and compare the results.

'''DICE (CW) vs. GPT-2-keyword-generation.''' GPT-2-keyword-generation<sup>9</sup> is open-source software that uses GPT-2 to generate text pertaining to the specified keywords. We compare the stories directly generated from a set of keywords with the stories generated by the DICE system.

<sup>9</sup> github.com/minimaxir/gpt-2-keyword-generation

'''DICE-CW vs. DICE-DBCW.''' We also compare the performance of the DICE system when using different versions of DICE KG to evaluate whether factual knowledge graphs can contribute to the story generation.

====4.2. Evaluation====
=====4.2.1. Evaluation Metrics=====
The evaluation focuses on two aspects of the generated output: story-independent metrics and story-dependent metrics (Roemmele et al. 2017). Story-independent metrics, including grammatical correctness, clarity, and engagement, are used to analyze the quality of the generated output without considering its context, whereas story-dependent metrics, including coherence, keyword coverage, and creativity, are used to evaluate the generated stories with reference to the context (Roemmele et al. 2017). In addition, the evaluation combines both automatic evaluation and manual evaluation. The explanation of each metric and the evaluation approaches are shown in Table 2.

{| class="wikitable"
|+ Table 2: Explanations and approaches for each metric. In the original, story-independent metrics are printed in orange and story-dependent metrics in blue.
! Metric !! Type !! Explanation !! Evaluation approach
|-
| Grammatical correctness || Story-independent || The correctness of spelling, grammar and punctuation. || Automatic
|-
| Clarity || Story-independent || Whether the text is easy to understand. || Automatic
|-
| Engagement || Story-independent || Whether the writing style is interesting and effective. || Automatic
|-
| Creativity || Story-dependent || Whether the stories are creative or not. || Manual
|-
| Coherence || Story-dependent || Semantic coherence of the output. || Manual
|-
| Keyword coverage || Story-dependent || To what extent the keywords are present in the generated text. || Automatic
|}

'''Automatic Evaluation.''' For story-independent metrics, we used the automated analysis tool Grammarly to evaluate the overall grammaticality performance of the generated text. For keyword coverage, we used a script to measure to what extent the keywords were present in the generated stories.

'''Manual Evaluation.''' Stories should be reasonable and coherent with the context (Guan et al. 2019), which is hard to assess with automatic tools. As a result, a manual evaluation was also performed to more accurately evaluate the quality of each story. We invited 3 individuals to score the stories from each model, including stories from the original ROCStories. We applied 5-point Likert scales to rate each story on its creativity and coherence. Then we calculated the overall average score for each model.

Furthermore, we used a questionnaire<sup>10</sup> to investigate whether readers could tell the difference between the automatically generated stories and the human-written ones. We handpicked two stories generated by the DICE system from a given keyword set. Then we invited a person (a non-native English speaker with professional working proficiency) to write two stories with the same keywords. Each human-written story had to contain only 5 sentences and mention every keyword in the keyword set. Finally, we invited people to estimate whether each story was written by a human or a machine and to score each story on its creativity and coherence.

<sup>10</sup> https://forms.gle/jEu1LohH5zkADiNt6

===5. Results and Discussion===
====5.1. Experiment Results====
We picked 100 random samples for each model to evaluate their performance. We gathered the automatic evaluation results and manual evaluation results and separated them into story-independent metrics and story-dependent metrics, which are shown in Table 3 and Table 4 respectively.

{| class="wikitable"
|+ Table 3: Results of the story-independent metrics.
! Model !! Correctness !! Clarity !! Engagement !! Score
|-
| GPT-2 || 21 alerts / 4276 words || Very clear || Engaging || 82/100
|-
| GPT-2-keyword-generation || 26 alerts / 3512 words || Mostly clear || Bland || 78/100
|-
| DICE-CW || 18 alerts / 3931 words || Mostly clear || Bland || 75/100
|-
| DICE-DBCW || 31 alerts / 5279 words || Very clear || A bit bland || 80/100
|-
| Human || 54 alerts / 4591 words || Very clear || A bit bland || 80/100
|}

{| class="wikitable"
|+ Table 4: Results of story-dependent metrics.
! Model !! Creativity !! Coherence !! Keyword coverage
|-
| GPT-2 || 2.3/5 || 2.4/5 || 0.7275
|-
| GPT-2-keyword-generation || 2.4/5 || 2.7/5 || 0.88
|-
| DICE-CW || 2.2/5 || 2.5/5 || 0.9625
|-
| DICE-DBCW || 2.3/5 || 2.7/5 || 0.9725
|-
| Human || 3.7/5 || 4.9/5 || N/A
|}

The results show that there is not much difference in the story-independent metrics between the stories written by the language models and the human-written stories. The overall grammaticality performance of each model is satisfactory. The Grammarly overall score of the fine-tuned GPT-2 model is even higher than the score of the human-written stories. For samples from ROCStories, most of the grammatical errors are punctuation misuse. For the stories generated by language models, the biggest writing issue is determiner (a/an/the/this, etc.) misuse, followed by punctuation misuse and wordy sentences.

For the two story-dependent metrics of creativity and coherence, all the models perform poorly compared with human writers. In general, the generated stories are not always logical or sensible, even with a properly trained model. The OpenAI team shows that it takes a few tries to get a good and reasonable result, and the number of tries is highly dependent on the topics present in the training data. In this case in particular, the given keywords can influence the result significantly. For example, if the given keywords are barely related to each other, the model can perform poorly. This is because unrelated keywords make it more difficult to generate related SVO triples, and unrelated SVO triples lead to unconnected sentences in the generated stories. However, the keyword coverage of the DICE system (96% for DICE-CW and 97% for DICE-DBCW) is considerably higher than that of the other baselines (73% for GPT-2, 88% for GPT-2-keyword-generation). For DICE-DBCW, the coverage of the enriched words from DBpedia (80%) is lower than the keyword coverage. This is because some of the enriched words are proper nouns, like brand names, which rarely appear in the training text.

=====5.1.1. Questionnaire Results=====
The questionnaire received 54 responses. Most of the respondents are native English speakers (4/5 of the respondents), and the rest are non-native speakers (1/5 of the respondents) with effective English proficiency. The result is shown in Table 5. In general, there is a great chance (37.5% on average) for people to make a mistake when judging whether a story is written by a human or a machine. In particular, stories with short sentences and wrong word choices are more likely to be regarded as machine-written. On the other hand, for stories that are interesting and creative but lack coherence between the sentences, people are more likely to make a mistake and think the stories are written by a human.

{| class="wikitable"
|+ Table 5: Result of the questionnaire.
! Story No. !! Written by !! Average score rated by humans !! Vote for Machine !! Vote for Human
|-
| Story1 || Human || 2.70 || 70.4% || 29.6%
|-
| Story2 || Machine || 2.80 || 70.4% || 29.6%
|-
| Story3 || Human || 3.37 || 31.5% || 68.5%
|-
| Story4 || Machine || 2.74 || 81.5% || 18.5%
|}

====5.2. Injecting Relations into Stories====
As mentioned in the last section, the keyword coverage (96%) and the relation coverage (100%) of the DICE system are very high in the test. This means the SVO triples can effectively affect the plots of the generated stories. During the experiment, we find that we can use SVO triples to inject entities and the relations between the entities into the stories as backgrounds or plots. As a result, the quality of the SVO triples and the order of these triples can significantly affect the quality of the automatically generated stories. Since these triples are generated from the knowledge graphs, the logic and relationships behind these knowledge graphs are also important for better story generation.

====5.3. Quality of Generated Stories====
As shown in Table 4, there is little difference in the creativity score and the coherence score between the baselines and the DICE model. Although the DICE model lets us inject relations into the stories, a relation can only affect the logic within each sentence and cannot influence the logic that runs through the story. This is because the SVO triples used in the language model fine-tuning process are extracted from each sentence separately, and the sentences in the stories are only loosely connected, so the triples cannot reflect relations like causation throughout the text. As a result, the coherence of the stories generated by the DICE model is not satisfactory in general.

====5.4. Commonsense vs. Factual KG====
We introduce two knowledge graphs in this research. The knowledge graph used in version1 (CW) is a semantic knowledge graph where common concepts and words have many connections with each other, which is the foundation for relating keywords and constructing SVO triples. Fact-based knowledge graphs like DBpedia, in contrast, can hardly provide connections between the common concepts, and as a result, they can hardly contribute to the triple construction process. However, with a combination of semantic knowledge graphs and factual knowledge graphs, i.e., DICE KG version2 (DBCW), we can make use of the knowledge about the instances of the concepts and the properties of the instances from factual knowledge graphs, and we can use it to enrich the entities in the triples.

===6. Conclusions===
In this paper we showed how to use subject-verb-object triples as context-clue input to the generative model, to connect language models and knowledge graphs for story generation. Evaluation results showed that we can effectively inject entities and relations from knowledge graphs into the generated stories. Future work will focus on improving the coherence of the generated stories and giving them smooth transitions between sentences. For example, in order to improve the performance of the internal matching process, we can classify popular words into specific classes and use ontology techniques, such as SHACL (Knublauch & Kontokostas 2017) and OWL restrictions (McGuinness & Van Harmelen 2004), to make sure these classes can interact with each other based on specific rules.

===References===
[1] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Agarwal, S. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

[2] Chen, J., Chen, J., & Yu, Z. (2019, July). Incorporating structured commonsense knowledge in story completion. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 6244-6251).

[3] Chen, Z., Eavani, H., Liu, Y., & Wang, W. Y. (2019). Few-shot NLG with Pre-trained Language Model. arXiv preprint arXiv:1904.09521.

[4] Clark, K., & Manning, C. D. (2016). Deep reinforcement learning for mention-ranking coreference models. arXiv preprint arXiv:1609.08667.

[5] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[6] Guan, J., Wang, Y., & Huang, M. (2019, July). Story ending generation with incremental encoding and commonsense knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 6473-6480).

[7] Guo, Z., Yi, X., Sun, M., Li, W., Yang, C., Liang, J., ... & Li, R. (2019, July). Jiuge: A Human-Machine Collaborative Chinese Classical Poetry Generation System. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 25-30).

[8] Hsu, C. C., Chen, Z. Y., Hsu, C. Y., Li, C. C., Lin, T. Y., Huang, T. H. K., & Ku, L. W. (2019). Knowledge-Enriched Visual Storytelling. arXiv preprint arXiv:1912.01496.

[9] Jain, P., Agrawal, P., Mishra, A., Sukhwani, M., Laha, A., & Sankaranarayanan, K. (2017). Story generation from sequence of independent short descriptions. arXiv preprint arXiv:1707.05501.

[10] Keskar, N. S., McCann, B., Varshney, L. R., Xiong, C., & Socher, R. (2019). Ctrl: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.

[11] Knublauch, H., & Kontokostas, D. (2017). Shapes constraint language (SHACL). W3C Candidate Recommendation, 11(8).

[12] Koncel-Kedziorski, R., Bekal, D., Luan, Y., Lapata, M., & Hajishirzi, H. (2019). Text Generation from Knowledge Graphs with Graph Transformers. arXiv preprint arXiv:1904.02342.

[13] Li, B., Lee-Urban, S., Johnston, G., & Riedl, M. (2013, June). Story generation with crowdsourced plot graphs. In Twenty-Seventh AAAI Conference on Artificial Intelligence.

[14] Liu, W., Zhou, P., Zhao, Z., Wang, Z., Ju, Q., Deng, H., & Wang, P. (2019). K-bert: Enabling language representation with knowledge graph. arXiv preprint arXiv:1909.07606.

[15] Logan, R., Liu, N. F., Peters, M. E., Gardner, M., & Singh, S. (2019, July). Barack’s wife hillary: Using knowledge graphs for fact-aware language modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 5962-5971).

[16] McGuinness, D. L., & Van Harmelen, F. (2004). OWL web ontology language overview. W3C recommendation, 10(10), 2004.

[17] Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39-41.

[18] Mostafazadeh, N., Vanderwende, L., Yih, W. T., Kohli, P., & Allen, J. (2016, August). Story cloze evaluator: Vector space representation evaluation by predicting what happens next. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP (pp. 24-29).

[19] Ostendorff, M., Bourgonje, P., Berger, M., Moreno-Schneider, J., Rehm, G., & Gipp, B. (2019). Enriching BERT with Knowledge Graph Embeddings for Document Classification. arXiv preprint arXiv:1909.08402.

[20] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8).

[21] Roemmele, M., Gordon, A. S., & Swanson, R. (2017, August). Evaluating story generation systems using automated linguistic analyses. In SIGKDD 2017 Workshop on Machine Learning for Creativity (pp. 13-17).

[22] Speer, R., Chin, J., & Havasi, C. (2017, February). Conceptnet 5.5: An open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence.

[23] Sternberg, R. J. (Ed.). (1999). Handbook of creativity. Cambridge University Press.

[24] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).

[25] Wu, T., Khan, A., Gao, H., & Li, C. (2019). Efficiently embedding dynamic knowledge graphs. arXiv preprint arXiv:1910.06708.

[26] Yao, L., Mao, C., & Luo, Y. (2019). KG-BERT: BERT for knowledge graph completion. arXiv preprint arXiv:1909.03193.

[27] Zhou, L., Gao, J., Li, D., & Shum, H. Y. (2020). The design and implementation of xiaoice, an empathetic social chatbot. Computational Linguistics, 46(1), 53-93.