<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dungeons and DQNs: Toward Reinforcement Learning Agents that Play Tabletop Roleplaying Games</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>L. J. Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Sood</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. O. Riedl</string-name>
          <email>riedl@cc.gatech.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Interactive Computing, Georgia Institute of Technology</institution>
          ,
          <addr-line>Atlanta, GA 30332</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>In: H. Wu, M. Si, A. Jhala (eds.): Proceedings of the Joint Workshop on Intelligent Narrative Technologies and Workshop on Intelligent Cinematography and Editing</institution>
          ,
          <addr-line>Edmonton</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <addr-line>11-2018, published at http://ceur-ws.org</addr-line>
        </aff>
      </contrib-group>
      <permissions>
        <copyright-statement>Copyright © by L. J. Martin, S. Sood, M. O. Riedl. Copying permitted for private and academic purposes.</copyright-statement>
      </permissions>
      <abstract>
        <p>Game playing has been an important testbed for artificial intelligence. Board games, first-person shooters, and real-time strategy games have well-defined win conditions and rely on strong feedback from a simulated environment. Text adventures require natural language understanding to progress through the game but still have an underlying simulated environment. In this paper, we propose tabletop roleplaying games as a challenge due to an infinite action space, multiple (collaborative) players and models of the world, and no explicit reward signal. We present an approach for reinforcement learning agents that can play tabletop roleplaying games.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>those actions serve the purpose of solving puzzles to unlock the story. In TRPGs, even though the players can choose actions, none of the players know exactly what will happen in response to those actions, and they must adjust accordingly. Even the Game Master may encounter valid player actions that are unexpected, and they must decide how aspects of the world that are not controlled by the players will respond. Most significantly, no single player or system, including the Game Master, possesses a ground-truth understanding of the complete state of the world.</p>
      <p>In a game like D&amp;D, actions cannot be cleanly mapped to states. Instead, players need to maintain a general model of the world that can be flexibly altered as the story progresses. Since there is no shared simulation engine that maintains a ground-truth state of the world, there is no way for players to receive feedback about the consequences of their actions except for intrinsic motivation. This means that an AI player would need a set of commonsense knowledge and procedures so that it can act in a reasonable manner. The AI should know what can physically and temporally happen in the world (e.g., if I leave the lightsaber here, it will stay here until someone picks it up again); what social and cultural norms it should follow (e.g., greet people when you meet them); and what tropes the genre normally follows (e.g., fairies are found in forests).</p>
      <p>Action selection in TRPGs can be further complicated by the fact that there is no well-defined win condition. TRPGs are usually set up with scenarios called campaigns where there are short-term objectives (such as quests) to complete, but even those might not be clearly defined. In D&amp;D, characters may die, and "hit points" (a numerical indication of health) can be thought of as an indicator of success in combat, but there are no clear signals of success or progress in the non-combat portions (the majority) of the game. This makes it especially hard for an AI player to know whether it is acting appropriately (i.e., there is no explicit reward signal).</p>
      <p>D&amp;D is also largely collaborative, which is unusual for a game with multiple players. Collaboration in a game means that the agent must not only understand what its fellow players are trying to do but also be able to work toward a joint goal that might not be explicit. The agent should not just be fulfilling its own agenda.</p>
      <p>In this paper, we propose an approach to creating a TRPG player. Since this is an expansive challenge for the current state of AI, we will focus on the improvisational nature of action selection in the context of a quest. We have made the following simplifying assumptions in order to initially make the challenge more tractable. (1) We do not consider combat or actions that are constrained by numerical values such as strength or health. (2) We also assume that the agent is always "in character" and thus does not interact with other players in extra-diegetic ways (e.g., out-of-character conversations to plan out actions). (3) If another player is a GM, we only consider descriptions of events that occur, but not refereeing communications. The important aspects that we are still maintaining are collaboration, improvisation, and keeping track of and maintaining a consistent world. The world is represented as a set of rules acting on the current state, informed by a sense of genre.</p>
      <p>In the remainder of the paper, we relate TRPG playing to interactive fiction, interactive storytelling, and story generation. We put forth a proposal for using a form of reinforcement learning, Deep Q Networks (DQNs), to meet the criteria above for the portions of TRPGs we focus on.</p>
    </sec>
    <sec id="sec-2">
      <title>Background and Related Work</title>
      <p>Interactive Fiction (IF) has been around since long before the creation of the personal computer, found in the form of choose-your-own-adventure books. These stories enabled the user not only to experience the narrative but to have input into what events will take place. Computerized IF provided users with flexibility of input; giving the player natural-language commands provided a greater sense of agency. Early computer IF, such as Adventure and Zork, had a command language in the form of ⟨verb⟩ ⟨noun phrase (NP)⟩, which can include a prepositional phrase and/or adjectives. For example, "enter house" or "give book to woman". Patterns like this can be resolved by simple grammars, but the language that these systems permitted was very limited. Early IF-playing systems avoided dealing with language altogether, having the system work only with propositional logic, for example [HA04]. Recent work has focused on neural networks and reinforcement learning, in particular deep Q networks (DQNs), to play IF [NKB15, HZMM18, YCS+18]. IF has become an increasingly common testbed for AI research, especially with the introduction of toolkits [CKY+18]. We believe that DQNs can also be used to play D&amp;D; while superficially the same, major differences in what the agent can know make this a distinct challenge.</p>
      <p>The field of interactive narrative concerns itself with the creation of digital interactive experiences in which users create or influence a dramatic storyline through actions, either by assuming the role of a character in a fictional virtual world, issuing commands to computer-controlled characters, or directly manipulating the fictional world state [RB13]. Interactive narratives sometimes make use of an Experience Manager (also called a Drama Manager), an intelligent, omniscient, and disembodied agent that monitors the virtual world and intervenes to drive the narrative forward according to some model for quality of experience. An experience manager progresses the narrative by intervening in the fictional world, typically by directing computer-controlled characters in how to respond to the user's actions. Riedl and Bulitko [RB13] give a high-level overview of some of the techniques that have been attempted. Reinforcement-learning-based approaches to drama management include [BRN+07] and [HR16].</p>
      <p>Interactive narratives share a lot of similarities with TRPGs. However, players do not describe their actions in natural language but use point-and-click action interfaces to interact with the world. In some instances, the player can engage in dialogue with NPCs through unconstrained natural language [MS03]. Nonetheless, NPCs in interactive narratives are constrained to a fixed and pre-specified repertoire of actions and dialogue. In this paper we focus on the opposite problem of AI agents that play TRPGs, and to make the problem more tractable, there is no external evaluator of actions nor Experience Manager.</p>
      <p>Contrasting IF playing and drama management, Interactive Fiction generation systems use pre-existing resources to develop dynamic IF that adapts to the player's choices. Systems like Scheherazade-IF [GHLR15] and DINE [CGO+17] were strongly influenced by automated story generation, giving the user control again, whereas a traditional IF simply has the user discover the preexisting story. Playing a TRPG shares many of the same challenges as being able to automatically generate a story; both story generation and TRPG playing require an agent to select what a character will do next. Automated story generation has a long history of using planning systems [Mee77, Leb87, CCM02, PC09, RY10, WY11] that work in well-defined domains. Recently, machine learning has been used to build story generation systems that automatically acquire knowledge about domains and how to tell stories from natural language corpora [LLUJR13, SG12, RG15, KBT17, GAG+17, MAW+18]. Our approach draws heavily from neural-network-based approaches.</p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach</title>
      <p>Similar to text adventure games, a TRPG's game state is hidden. However, what makes TRPGs different from text adventure games is the lack of a shared game engine to maintain a ground-truth state of the fictional world and to provide a fixed set of allowable actions. That is, the "game engine" is largely in the heads of the players, and each player may have a different understanding of the world state. This makes playing TRPGs more akin to improvisational theater acting [MMR+09]. While the Game Master may be considered the maintainer of ground-truth state and an arbiter of what can and cannot be done in the fictional world, the GM's belief about the state of the world is just one of many, and refereeing is mostly restricted to combat and other formulaic parts of the game. Still, one may assume that, just as with the real world, the fictional world does have some rules and conventions, some of which may be explicit while others are implied. Marie-Laure Ryan named this implication the principle of minimal departure, which says that, unless stated otherwise, we are to assume that a fictional world matches our actual world as closely as possible [Rya80]. This means that the fictional world that our agent operates in should have as many similarities to our actual world as we can give it.</p>
      <p>This poses a problem, though: how can the agent acquire models of the explicit and implicit rules of the fictional world? A standard technique in machine learning is to train a model on a corpus of relevant data. In our case, the most relevant data from which to learn a model is likely to be stories from the particular genre of fictional world our agent will be inhabiting. While it is possible to learn a model of likely event sequences (i.e., machine-learned story generation models [RG15, MAW+18, KBT17, FLD18]), recurrent neural networks maintain state as hidden neural network layers, which are limited in the length of their memory and do not explicitly capture the underlying reason why certain events are preceded by others. This is essential because the other, human players may make choices that are very different from sequences in a training corpus (what are referred to as "out of distribution") and are capable of remembering events and state information for long periods of time. Because of the principle of minimal departure, story generation models also fail to capture details that we take for granted in our own lives, details that are too mundane to mention in stories, such as the affordances of objects. For example, the system would be unable to understand why a cow can be hurt but a cot cannot, no matter how much weight you put on it.</p>
      <p>Our proposal has two parts. First, we propose a method for acquiring models of the fictional world by blending commonsense, overarching rules about the real world with automated methods that can extract relevant genre information from stories. Second, we propose a reinforcement learning technique based on Deep Q Networks that can learn to use these models to interact with human TRPG players. Our proposed agent works as follows. It first converts any human player's declaration of action, a natural language sentence, into an event, which is an abstract sentence representation that is easier for AI systems to work with. We will describe the event representation in Section 3.1. This event is used to update the agent's belief about the state of the fictional world. Once the state is updated, the agent takes its turn, selecting a new event using the deep reinforcement learner. The state is updated again, and the agent's event is converted back into natural language so that the human player can read what the agent did. This pipeline can be seen in Figure 1(a).</p>
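      <p>The turn-taking pipeline above can be sketched end-to-end. This is a minimal illustration only: every helper below is a hypothetical stub standing in for a component described in this section (the eventifier, the belief update, the trained policy, and the text realizer), not an actual implementation.</p>
      <preformat>
```python
# Minimal sketch of the play-time pipeline in Figure 1(a).
# All helpers are hypothetical stubs, not the real components.

def nl_to_event(sentence):
    # Stand-in for parsing plus eventification (Section 3.1).
    return ("player", "say-37.7", sentence, None, None)

def event_to_nl(event):
    # Stand-in for realizing an event back into natural language.
    return " ".join(str(x) for x in event if x)

def select_event(state):
    # Stand-in for the trained deep reinforcement learner's policy.
    return ("agent", "greet", "player", None, None)

def update_state(state, event):
    # Real beliefs would be rule-derived facts; here we just log events.
    return state + [event]

def take_turn(state, player_sentence):
    """One full turn: read the player's action, act, and narrate back."""
    state = update_state(state, nl_to_event(player_sentence))
    action = select_event(state)
    state = update_state(state, action)
    return state, event_to_nl(action)
```
      </preformat>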
      <p>The training method is shown in Figure 1(b). While the DQN is exploring during training, the previous event in the story is passed into a sequence-to-sequence LSTM [SVL14] that is trained on data from the genre we selected. The Seq2Seq network generates a distribution over possible subsequent events according to our model of genre expectations. A set of rules filters the list of events, keeping only events that could occur given the current state of the game. The agent chooses to exploit its policy model or explore randomly, and once a valid event is picked, the state is updated. Because we have a rule model, we can conduct multi-step lookahead, wherein the agent explores several steps into the future before using the reward to update the policy. Each event that is picked should bring the agent closer to its goal for the campaign. The goal in this case is a genre-appropriate pre-defined event that we select.</p>
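      <p>The generate/filter/select cycle above can be sketched as follows. The genre model and rule check are hypothetical hard-coded stubs (a trained Seq2Seq network and the VerbNet-derived rules would fill those roles), and the event tuples and probabilities are made up for illustration:</p>
      <preformat>
```python
import random

def genre_model(prev_event):
    # Stub for the Seq2Seq genre expectation model: a distribution
    # over candidate successor events (values are illustrative).
    return {("knight", "escape-51.1", "castle", None, "from"): 0.6,
            ("knight", "murder-42.1", "dragon", "sword", "with"): 0.3,
            ("fairy", "appear-48.1.1", None, "forest", "in"): 0.1}

def satisfies_preconditions(event, state):
    # Stub for the rule model: the acting entity must be present.
    return event[0] in state["present"]

def choose_event(prev_event, state, q_values, epsilon=0.1):
    """Epsilon-greedy choice over genre-likely, rule-valid events."""
    candidates = [e for e in genre_model(prev_event)
                  if satisfies_preconditions(e, state)]
    if not candidates:
        return None
    r = random.random()
    explore = (min(r, epsilon) == r)  # true when r does not exceed epsilon
    if explore:
        return random.choice(candidates)
    return max(candidates, key=lambda e: q_values.get(e, 0.0))
```
      </preformat>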
      <sec id="sec-3-1">
        <title>The Agent's Model of the Fictional World</title>
        <p>In this section we describe the two-part model of the fictional world that the agent has access to in order to select actions and update its understanding of the current state of the game world.</p>
      </sec>
      <sec id="sec-3-2">
        <title>The Genre Expectation Model</title>
        <p>Given a corpus of stories from a genre related to the fictional world that the agent will inhabit, we train a genre expectation model. This model provides the probability, according to the genre-specific corpus, that certain events will happen after other events. Specifically, we model genre expectations as a sequence-to-sequence LSTM [SVL14], a type of recurrent neural network. However, instead of training on natural language sentences, we first convert sentences to events.</p>
        <p>An event is a tuple ⟨s, v, o, n, p⟩ where v is the verb of the sentence, s is the subject of the verb, o is the direct object of the verb, n is the noun of a prepositional phrase (or indirect object, causal complement, or any other significant word), and p is the preposition. Martin et al. [MAW+18] found that representations similar to this assist with the accuracy of models trained on stories. In this work, we add the p element, which Pichotta and Mooney [PM16] also found to be helpful for event representations. These words are extracted from the original sentence, stemmed, and then generalized or looked up in their respective databases. Verbs are abstracted using VerbNet [KS05], which provides a category (or class) for each verb. All other words are queried in a hierarchical lexicon called WordNet [Mil95], and the generalized word/Synset is taken from two levels up in the hierarchy (i.e., the grandparent). Adding the preposition to the event enables us to make closer comparisons between possible syntactic constructions within a VerbNet class since we would not have access to the original sentence's syntax.</p>
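        <p>As a concrete illustration, the eventification of "Lily screwed the handle to the drawer" might look like the following. The lookup tables are hypothetical stand-ins: a real implementation would use a dependency parser together with the actual VerbNet and WordNet databases, and the class and hypernym labels shown are illustrative, not authoritative.</p>
        <preformat>
```python
# Toy eventification into the tuple form (s, v, o, n, p).
# VERB_CLASS and HYPERNYM2 are illustrative stand-ins for VerbNet
# classes and for climbing two levels up the WordNet hierarchy.
VERB_CLASS = {"screw": "shake-22.3", "give": "give-13.1"}
HYPERNYM2 = {"lily": "person", "handle": "part",
             "drawer": "furniture", "screw": "device"}

def stem(word):
    # Stand-in stemmer: lowercase and drop a trailing "-ed".
    w = word.lower()
    return w[:-2] if w.endswith("ed") else w

def generalize(word):
    # Stand-in for the two-level WordNet hypernym lookup.
    return HYPERNYM2.get(word, word) if word else None

def make_event(subj, verb, dobj=None, pnoun=None, prep=None):
    return (generalize(stem(subj)),
            VERB_CLASS.get(stem(verb), stem(verb)),
            generalize(stem(dobj)) if dobj else None,
            generalize(stem(pnoun)) if pnoun else None,
            prep)

# "Lily screwed the handle to the drawer" becomes:
event = make_event("Lily", "screwed", "handle", "drawer", "to")
```
        </preformat>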
        <p>The story corpus for the genre is pre-processed such that each clause is transformed into an event. Using these events, we train the Seq2Seq model to predict the probability of an event given a previous event. This becomes the agent's genre expectation model, giving the agent a pool of likely actions to choose from during its turn. However, it does not guarantee that these are valid or logical actions, nor can it keep track of long-term dependencies.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Commonsense Rules Model</title>
        <p>To help the agent with selecting appropriate events/actions, we acquire a second model of general, commonsense rules about the real world. The purpose of this model is to (a) prune out candidate events that would not work for the current state of the game, and (b) allow the agent to do lookahead planning to determine how current actions might affect future world states.</p>
        <p>The rules are acquired from a set of semantic facts we get from VerbNet. In VerbNet, each verb class has a
set of frames. Each frame is determined by a grammatical construction that this verb can be found in. Within
a frame, the syntax is listed, along with a set of semantics. The semantics specify what roles/entities are doing
what in the form of predicates. For example, VerbNet would tell us that the sentence \Lily screwed the handle
to the drawer" yields the following predicates:</p>
        <p>CAUSE(Agent, Event)</p>
        <p>TOGETHER(end(Event), Patient, Co-Patient)</p>
        <p>ATTACHED(end(Event), Patient, Instrument)</p>
        <p>ATTACHED(end(Event), Co-Patient, Instrument)</p>
        <p>where Lily is the Agent, the handle is the Patient, the drawer is the Co-Patient, and the screw is the Instrument. In other words: Lily caused the event, and at the end of the event, the screw attached the drawer and the handle together.</p>
        <p>Based on the principle of minimal departure, our agent assumes that when an event occurs, the frame's predicates hold, acting as the agent's knowledge about the actual world. This is reasonable because the frame semantics are relatively high-level and can occur in a variety of scenarios. Whereas the state of the genre expectation model is latent, we can use the facts generated by applying commonsense rules to maintain explicit beliefs about the world that persist until new facts replace them. That is, the drawer and handle will remain attached until such time that another verb class indicates that they are no longer attached. This is important because the agent's belief state will not be tied to a limited, probabilistic window of history maintained by the genre expectation model.</p>
        <p>However, the predicates currently provided by VerbNet frames are insufficient for our purposes. We augment VerbNet by breaking down predicates that require more detail. All of the predicates are either considered "core predicates", which cannot be broken down further, or are given other existing predicates to form preconditions and post-conditions. Preconditions are conditions that must be true in the world prior to the verb frame being enacted. Post-conditions, or effects, are facts about the world that hold after a verb frame has finished being enacted. This information would not be learned by a recurrent neural network.</p>
        <p>We use the preconditions to filter out any actions proposed by the genre expectation model that are not consistent with the current state of the world. Once an action is selected, we use the post-conditions to update the agent's belief state about the world. Magerko et al. made use of pre- and post-conditions for actions so that the individual agents separate from the Game Manager in their game kept the story consistent [MLA+04]. Similarly, Clark et al. [CDT18] broke VerbNet semantics into pre- and post-conditions.</p>
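        <p>A minimal sketch of how such augmented rules could be stored and applied follows. The class name and predicate inventory here are hypothetical, simplified stand-ins for the augmented VerbNet entries, not the actual database contents:</p>
        <preformat>
```python
# Each verb class maps to preconditions (facts that must hold before
# the frame is enacted) and post-conditions (facts that hold after,
# persisting until some later event overwrites them).
RULES = {
    "attach-22.3": {   # hypothetical class for "screw X to Y"
        "pre":  [("SEPARATE", "Patient", "Co-Patient")],
        "post": [("ATTACHED", "Patient", "Co-Patient")],
    },
}

def instantiate(predicate, bindings):
    # Fill a predicate's thematic roles with concrete entities.
    return (predicate[0],) + tuple(bindings[role] for role in predicate[1:])

def applicable(verb_class, bindings, state):
    """True when every instantiated precondition is in the belief state."""
    pre = RULES.get(verb_class, {}).get("pre", [])
    return all(instantiate(p, bindings) in state for p in pre)

def apply_event(verb_class, bindings, state):
    """Update persistent beliefs with the instantiated post-conditions."""
    for p in RULES.get(verb_class, {}).get("post", []):
        fact = instantiate(p, bindings)
        state.discard(("SEPARATE",) + fact[1:])  # retract the contradicted fact
        state.add(fact)
    return state
```
        </preformat>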
      </sec>
      <sec id="sec-3-4">
        <title>Reinforcement Learning for TRPGs</title>
        <p>Reinforcement learning (RL) is a technique whereby an agent learns a policy mapping states to actions that maximizes expected future reward. A reward function gives a value indicating how good or bad different states are, and it can be sparse, meaning that a reward is not provided very often. RL agents learn the policy incrementally by trial and error, attempting to find correlations between states s<sub>i</sub> and future expected reward, Q(s<sub>i</sub>). Deep Q networks learn to approximate the expected future reward for states with a neural network. Deep Q network agents have been shown to be able to play complicated games such as Atari video games [MKS+13] and text adventures with limited action spaces [NKB15, HZMM18, YCS+18].</p>
        <p>Our event representation turns an infinite number of actions (any natural language sentence) into a large, but finite, action space. Still, we cannot perform exhaustive trial-and-error learning while training a deep Q network to play D&amp;D. The genre expectation model provides a beam of highly probable events that are consistent with the genre. The commonsense rule model filters events and also acts as a transition function, helping the agent to search through possible future states for those that return the highest reward, which will, in turn, help the agent converge faster in situations where the reward (e.g., reaching a particular state) is sparse. Details on the flow of information through the DQN can be seen in Figure 1(b).</p>
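        <p>To make the learning dynamics concrete, the sketch below uses a tabular Q-learner in place of the DQN, with the rule model serving as the transition function and a single sparse, goal-based reward. The tiny state space and pure random exploration are simplifications for illustration; the actual proposal substitutes a neural network approximator and the genre-expectation beam:</p>
        <preformat>
```python
import random

def q_learning(start, goal, actions, transition,
               episodes=200, alpha=0.5, gamma=0.9, horizon=10):
    """Tabular Q-learning over (state, event) pairs with sparse reward."""
    Q = {}
    for _ in range(episodes):
        state = start
        for _ in range(horizon):
            event = random.choice(actions)      # pure exploration for brevity
            nxt = transition(state, event)      # rule model as simulator
            reward = 1.0 if nxt == goal else 0.0
            best_next = max(Q.get((nxt, a), 0.0) for a in actions)
            old = Q.get((state, event), 0.0)
            Q[(state, event)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if state == goal:
                break
    return Q
```
        </preformat>
        <p>With the learned table, the agent would then act greedily over the rule-filtered candidate events on each of its turns.</p>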
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Future Work</title>
      <p>One of the outstanding limitations of our current proposal is the reliance on a reward function. For the near future, rewards are based on quest completion, although that is only one aspect of the tabletop roleplaying game experience. Quest completion is a sparse reward, which is one of the reasons why the commonsense rules will be useful in allowing the agent to look ahead, since most states will not provide any reward signal. In the future, we will need to identify or learn more complete reward functions.</p>
      <p>Future versions of the system could learn which rules are broken by the user and remain consistent with them. This will require the agent to be able to identify broken rules and then remove them from its processing of potential actions to take. It might also import other genre models. For example, if the user has raised Vinay from the dead, the agent now knows that after a character dies, they are no longer simply removed from the story but can be reanimated. It might also integrate a horror genre that includes zombies. For now, we will start with a strict set of rules that the agent must obey when it is playing the game, and the agent will work within one genre at a time.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>As game-playing AI research progresses, we argue that TRPGs like D&amp;D are an appropriate next challenge. TRPGs are distinct from other games in that they have an infinite selection of actions, have a partially-visible world, contain hidden states, use intrinsic reward, do not have explicit progress markers, and are cooperative. We outlined a subproblem of TRPGs which focuses less on character stats, which we already know computers handle well, and also simplifies the problem slightly by eliminating the refereeing of rules. TRPGs are unlike text adventure games in that the players have more agency in affecting the story, but they are also unlike Drama Management, where the system gives the player some control over the story but still has the final say. TRPG players are more similar to collaborative automated story generators in this way.</p>
      <p>To create an AI that plays this modified TRPG, we proposed that the agent has a model of the world that is a combination of rules about our actual world and a concept of what events usually occur within similar fictional worlds. The agent is then trained to use the model through deep Q-learning, which has been successful in playing games. By sharing our plans for our TRPG player, we hope to inspire other AI researchers to look into this unique space of games.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by DARPA W911NF-15-C-0246. The views, opinions, and/or conclusions contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of DARPA or the DoD.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[BRN+07] Sooraj Bhat, David L. Roberts, Mark Nelson, Charles Isbell, and Michael Mateas. A globally optimal algorithm for TTD-MDPs. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, 2007.</p>
      <p>[CCM02] M. Cavazza, F. Charles, and S. Mead. Planning characters' behaviour in interactive storytelling. Journal of Visualization and Computer Animation, 13:121–131, 2002.</p>
      <p>[CDT18] Peter Clark, Bhavana Dalvi, and Niket Tandon. What Happened? Leveraging VerbNet to Predict the Effects of Actions in Procedural Text. arXiv:1804.05435, 2018.</p>
      <p>[CGO+17] Margaret Cychosz, Andrew S. Gordon, Obiageli Odimegwu, Olivia Connolly, Jenna Bellassai, and Melissa Roemmele. Effective scenario designs for free-text interactive fiction. In Nuno Nunes, Ian Oakley, and Valentina Nisi, editors, Interactive Storytelling, pages 12–23. Springer International Publishing, 2017.</p>
      <p>[CKY+18] Marc-Alexandre Cote, Akos Kadar, Xingdi (Eric) Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, Wendy Tay, and Adam Trischler. TextWorld: A Learning Environment for Text-based Games. In Computer Games Workshop at ICML/IJCAI 2018, pages 1–29, June 2018.</p>
      <p>[FLD18] A. Fan, M. Lewis, and Y. Dauphin. Hierarchical Neural Story Generation. arXiv:1805.04833, 2018.</p>
      <p>[GA74] Gary Gygax and Dave Arneson. Dungeons &amp; Dragons, 1974.</p>
      <p>[HA04] [HR16] [KBT17] [KS05] [LC16] [Leb87] [LvL01]</p>
      <p>[Mee77] James R. Meehan. TALE-SPIN: An interactive program that writes stories. In Proceedings of the 5th International Joint Conference on Artificial Intelligence, pages 91–98, 1977.</p>
      <p>[Mil95] George A. Miller. WordNet: a Lexical Database for English. Communications of the ACM, 38(11):39–41, 1995.</p>
      <p>[MKS+13] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602, 2013.</p>
      <p>[MLA+04] Brian Magerko, John E. Laird, Mazin Assanie, Alex Kerfoot, and Devvan Stokes. AI characters and directors for interactive computer games. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, pages 877–883, 2004.</p>
      <p>[MS03] Michael Mateas and Andrew Stern. Integrating plot, character, and natural language processing in the interactive drama Facade. In Proceedings of the 1st International Conference on Technologies for Interactive Digital Storytelling and Entertainment, 2003.</p>
      <p>[NKB15] Karthik Narasimhan, Tejas Kulkarni, and Regina Barzilay. Language Understanding for Text-based Games Using Deep Reinforcement Learning. In EMNLP, page 10, 2015.</p>
      <p>OpenAI. OpenAI DOTA 2 1v1 bot, 2017.</p>
      <p>[PC09] Julie Porteous and Marc Cavazza. Controlling narrative generation with planning trajectories: the role of constraints. In Proceedings of the 2nd International Conference on Interactive Digital Storytelling, pages 234–245, 2009.</p>
      <p>[PM16] Karl Pichotta and Raymond J. Mooney. Learning Statistical Scripts with LSTM Recurrent Neural Networks. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 2800–2806, 2016.</p>
      <p>[RB13] Mark O. Riedl and Vadim Bulitko. Interactive narrative: An intelligent systems approach. AI Magazine, 34(1):67–77, Spring 2013.</p>
      <p>[RG15] Melissa Roemmele and Andrew S. Gordon. Creative help: A story writing assistant. In Proceedings of the Eighth International Conference on Interactive Digital Storytelling, 2015.</p>
      <p>[RY10] Mark O. Riedl and R. Michael Young. Narrative planning: Balancing plot and character. Journal of Artificial Intelligence Research, 39:217–268, 2010.</p>
      <p>[Rya80] Marie-Laure Ryan. Fiction, non-factuals, and the principle of minimal departure. Poetics, 9(4):403–422, 1980.</p>
      <p>[SG12] Reid Swanson and Andrew S. Gordon. Say Anything: Using Textual Case-Based Reasoning to Enable Open-Domain Interactive Storytelling. ACM Transactions on Interactive Intelligent Systems, 2(3):1–35, 2012.</p>
      <p>[SVL14] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.</p>
      <p>[YCS+18]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [GAG+17]
          <string-name>
            <given-names>Jonas</given-names>
            <surname>Gehring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Grangier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Denis</given-names>
            <surname>Yarats</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yann N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          .
          <article-title>Convolutional Sequence to Sequence Learning</article-title>
          .
          <source>arXiv:1705.03122</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [GHLR15]
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Guzdial</surname>
          </string-name>
          , Brent Harrison,
          <string-name>
            <given-names>Boyang</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Crowdsourcing open interactive narrative</article-title>
          .
          <source>In 10th International Conference on the Foundations of Digital Games (FDG 2015)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Brian</given-names>
            <surname>Hlubocky</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eyal</given-names>
            <surname>Amir</surname>
          </string-name>
          .
          <article-title>Knowledge-gathering agents in adventure games</article-title>
          .
          <source>In AAAI-04 workshop on Challenges in Game AI</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Brent</given-names>
<surname>Harrison</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark O</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Learning from stories: Using crowdsourced narratives to train virtual agents</article-title>
          .
<source>In Proceedings of the 2016 AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
[HZMM18]
          <string-name>
            <given-names>Matan</given-names>
            <surname>Haroush</surname>
          </string-name>
          , Tom Zahavy, Daniel J. Mankowitz, and
          <string-name>
            <given-names>Shie</given-names>
            <surname>Mannor</surname>
          </string-name>
          .
          <article-title>Learning How Not to Act in Text-Based Games</article-title>
          .
          <source>In Workshop Track at ICLR</source>
          <year>2018</year>
          , pages
<fpage>1</fpage>
          –
          <lpage>4</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Ahmed</given-names>
            <surname>Khalifa</surname>
          </string-name>
          , Gabriella A. B. Barros, and Julian Togelius.
          <article-title>DeepTingle</article-title>
          .
          <source>arXiv:1705.03557</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [KKKR17]
          <string-name>
            <given-names>Bartosz</given-names>
            <surname>Kostka</surname>
          </string-name>
, Jaroslaw Kwiecien, Jakub Kowalski, and
          <string-name>
            <given-names>Pawel</given-names>
            <surname>Rychlikowski</surname>
          </string-name>
          .
          <article-title>Text-based adventures of the Golovin AI agent</article-title>
          .
<source>2017 IEEE Conference on Computational Intelligence and Games (CIG 2017)</source>
          , pages
          <fpage>181</fpage>
          –
          <lpage>188</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
<string-name>
            <given-names>Karen</given-names>
            <surname>Kipper-Schuler</surname>
          </string-name>
          .
          <article-title>VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon</article-title>
          .
          <source>PhD thesis</source>
          , University of Pennsylvania,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Guillaume</given-names>
            <surname>Lample</surname>
          </string-name>
          and
          <string-name>
            <given-names>Devendra Singh</given-names>
            <surname>Chaplot</surname>
          </string-name>
          .
          <article-title>Playing FPS games with deep reinforcement learning</article-title>
          .
          <source>CoRR, abs/1609.05521</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Lebowitz</surname>
          </string-name>
          .
          <article-title>Planning stories</article-title>
          .
          <source>In Proceedings of the 9th Annual Conference of the Cognitive Science Society</source>
          , pages
<fpage>234</fpage>
          –
          <lpage>242</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [LLUJR13]
          <string-name>
            <given-names>Boyang</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Lee-Urban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>George</given-names>
            <surname>Johnston</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Story generation with crowdsourced plot graphs</article-title>
          .
<source>In Proceedings of the 27th AAAI Conference on Artificial Intelligence</source>
          , Bellevue, Washington,
          <year>July 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>AI Magazine</source>
          ,
          <volume>22</volume>
          (
          <issue>2</issue>
          ):
<fpage>15</fpage>
          –
          <lpage>25</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [MAW+18]
<string-name>
            <given-names>Lara J.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Prithviraj</given-names>
            <surname>Ammanabrolu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xinyu</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>William</given-names>
            <surname>Hancock</surname>
          </string-name>
          , Shruti Singh,
          <string-name>
            <given-names>Brent</given-names>
            <surname>Harrison</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Event Representations for Automated Story Generation with Deep Neural Nets</article-title>
          .
<source>In Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)</source>
          , pages
          <fpage>868</fpage>
          –
          <lpage>875</lpage>
          , New Orleans, Louisiana,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [MMR+09]
          <string-name>
<given-names>Brian</given-names>
            <surname>Magerko</surname>
          </string-name>
          , Waleed Manzoul, Mark Riedl, Allan Baumer, Daniel Fuller, Kurt Luther, and
          <string-name>
            <given-names>Celia</given-names>
            <surname>Pearce</surname>
          </string-name>
          .
          <article-title>An empirical study of cognition and theatrical improvisation</article-title>
          .
          <source>In Proceedings of the Seventh ACM Conference on Creativity and Cognition</source>
          , pages
<fpage>117</fpage>
          –
          <lpage>126</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Ware</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. Michael</given-names>
            <surname>Young</surname>
          </string-name>
          .
<article-title>CPOCL: A narrative planner supporting conflict</article-title>
          .
<source>In Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [YCS+18]
          <string-name>
            <given-names>Xingdi</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
<string-name>
            <given-names>Marc-Alexandre</given-names>
            <surname>Côté</surname>
          </string-name>
          , Alessandro Sordoni, Romain Laroche, Remi Tachet Des Combes, Matthew Hausknecht, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Trischler</surname>
          </string-name>
          .
          <article-title>Counting to Explore and Generalize in Text-based Games</article-title>
          .
          <source>arXiv:1806.11525</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>