<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dungeons and DQNs: Toward Reinforcement Learning Agents that Play Tabletop Roleplaying Games</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>L. J. Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Sood</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. O. Riedl</string-name>
          <email>riedl@cc.gatech.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Interactive Computing, Georgia Institute of Technology</institution>
          ,
          <addr-line>Atlanta, GA 30332</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>In: H. Wu, M. Si, A. Jhala (eds.): Proceedings of the Joint Workshop on Intelligent Narrative Technologies and Workshop on Intelligent Cinematography and Editing</institution>
          ,
          <addr-line>Edmonton</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <addr-line>11-2018, published at http://ceur-ws.org</addr-line>
        </aff>
      </contrib-group>
      <permissions>
        <copyright-statement>Copyright © by L. J. Martin, S. Sood, M. O. Riedl. Copying permitted for private and academic purposes.</copyright-statement>
      </permissions>
      <abstract>
        <p>Game playing has been an important testbed for artificial intelligence. Board games, first-person shooters, and real-time strategy games have well-defined win conditions and rely on strong feedback from a simulated environment. Text adventures require natural language understanding to progress through the game but still have an underlying simulated environment. In this paper, we propose tabletop roleplaying games as a challenge due to an infinite action space, multiple (collaborative) players and models of the world, and no explicit reward signal. We present an approach for reinforcement learning agents that can play tabletop roleplaying games.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>those actions serve the purpose of solving puzzles to unlock the story. In TRPGs, even though the players can choose actions, none of the players know exactly what will happen in response to those actions, and they must adjust accordingly. Even the Game Master may encounter valid player actions that are unexpected, and they must decide how aspects of the world that are not controlled by the players will respond. Most significantly, no single player or system, including the Game Master, possesses a ground-truth understanding of the complete state of the world.</p>
      <p>In a game like D&amp;D, actions cannot be cleanly mapped to states. Instead, players need to maintain a general model of the world that can be flexibly altered as the story progresses. Since there is no shared simulation engine that maintains a ground-truth state of the world, there is no way for players to receive feedback about the consequences of their actions except for intrinsic motivation. This means that an AI player would need a set of commonsense knowledge and procedures so that it can act in a reasonable manner. The AI should know what can physically and temporally happen in the world (e.g., if I leave the lightsaber here, it will stay here until someone picks it up again); what social and cultural norms it should follow (e.g., greet people when you meet them); and what tropes the genre normally follows (e.g., fairies are found in forests).</p>
      <p>Action selection in TRPGs can be further complicated by the fact that there is no well-defined win condition. TRPGs are usually set up with scenarios called campaigns where there are short-term objectives (such as quests) to complete, but even those might not be clearly defined. In D&amp;D, characters may die, and "hit points" (a numerical indication of health) can be thought of as an indicator of success in combat, but there are no clear signals of success or progress in the non-combat portions (the majority) of the game. This makes it especially hard for an AI player to know whether it is acting appropriately (i.e., there is no explicit reward signal).</p>
      <p>D&amp;D is also largely collaborative, which is unusual for a game with multiple players. Collaboration in a game means that the agent must not only understand what its fellow players are trying to do but also be able to work toward a joint goal that might not be explicit. The agent should not just be fulfilling its own agenda.</p>
      <p>In this paper, we propose an approach to creating a TRPG player. Since this is an expansive challenge for the current state of AI, we will focus on the improvisational nature of action selection in the context of a quest. We have made the following simplifying assumptions in order to initially make the challenge more tractable. (1) We do not consider combat or actions that are constrained by numerical values such as strength or health. (2) We also assume that the agent is always "in character" and thus does not interact with other players in extra-diegetic ways (e.g., out-of-character conversations to plan out actions). (3) If another player is a GM, we only consider descriptions of events that occur, but not refereeing communications. The important aspects that we are still maintaining are collaboration, improvisation, and keeping track of and maintaining a consistent world. The world is represented as a set of rules acting on the current state, informed by a sense of genre.</p>
      <p>In the remainder of the paper, we relate TRPG playing to interactive fiction, interactive storytelling, and story generation. We put forth a proposal for using a form of reinforcement learning, Deep Q Networks (DQNs), to meet the criteria above for the portions of TRPGs we focus on.</p>
    </sec>
    <sec id="sec-2">
      <title>Background and Related Work</title>
      <p>Interactive Fiction (IF) has been around since long before the creation of the personal computer, found in the form of choose-your-own-adventure books. These stories enabled the user not only to experience the narrative but to have input into what events will take place. Computerized IF provided users with flexibility of input; giving the player natural-language commands provided a greater sense of agency. Early computer IF, such as Adventure and Zork, had a command language in the form of ⟨verb⟩ ⟨noun phrase (NP)⟩, which can include a prepositional phrase and/or adjectives. For example, "enter house" or "give book to woman". Patterns like this can be resolved by simple grammars, but the language that these systems permitted was very limited. Early IF-playing systems avoided dealing with language altogether, having the system work only with propositional logic, for example [HA04]. Recent work has focused on neural networks and reinforcement learning, in particular deep Q networks (DQNs), to play IF [NKB15, HZMM18, YCS+18]. IF has become an increasingly common testbed for AI research, especially with the introduction of toolkits [CKY+18]. We believe that DQNs can also be used to play D&amp;D; while superficially the same, major differences in what the agent can know make this a distinct challenge.</p>
      <p>The field of interactive narrative concerns itself with the creation of digital interactive experiences in which users create or influence a dramatic storyline through actions, either by assuming the role of a character in a fictional virtual world, issuing commands to computer-controlled characters, or directly manipulating the fictional world state [RB13]. Interactive narratives sometimes make use of an Experience Manager (also called a Drama Manager), an intelligent, omniscient, and disembodied agent that monitors the virtual world and intervenes to drive the narrative forward according to some model for quality of experience. An experience manager progresses the narrative by intervening in the fictional world, typically by directing computer-controlled characters in how to respond to the user's actions. Riedl and Bulitko [RB13] give a high-level overview of some of the techniques that have been attempted. Reinforcement-learning-based approaches to drama management include [BRN+07] and [HR16].</p>
      <p>Interactive narratives share a lot of similarities with TRPGs. However, players do not describe their actions in natural language but use point-and-click action interfaces to interact with the world. In some instances, the player can engage in dialogue with NPCs through unconstrained natural language [MS03]. Nonetheless, NPCs in interactive narratives are constrained to a fixed and pre-specified repertoire of actions and dialogue. In this paper we focus on the opposite problem of AI agents that play TRPGs, and to make the problem more tractable, there is no external evaluator of actions nor Experience Manager.</p>
      <p>Contrasting IF playing and drama management, Interactive Fiction generation systems use pre-existing resources to develop dynamic IF that adapts to the player's choices. Systems like Scheherazade-IF [GHLR15] and DINE [CGO+17] were strongly influenced by automated story generation, giving the user control again, whereas a traditional IF simply has the user discover the preexisting story. Playing a TRPG shares many of the same challenges as being able to automatically generate a story; both story generation and TRPG playing require an agent to select what a character will do next. Automated story generation has a long history of using planning systems [Mee77, Leb87, CCM02, PC09, RY10, WY11] that work in well-defined domains. Recently, machine learning has been used to build story generation systems that automatically acquire knowledge about domains and how to tell stories from natural language corpora [LLUJR13, SG12, RG15, KBT17, GAG+17, MAW+18]. Our approach draws heavily from neural-network-based approaches.</p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Approach</title>
      <p>Similar to text adventure games, a TRPG's game state is hidden. However, what makes TRPGs different from text adventure games is the lack of a shared game engine to maintain a ground-truth state of the fictional world and to provide a fixed set of allowable actions. That is, the "game engine" is largely in the heads of the players, and each player may have a different understanding of the world state. This makes playing TRPGs more akin to improvisational theater acting [MMR+09]. While the Game Master may be considered the maintainer of ground-truth state and an arbiter of what can and cannot be done in the fictional world, the GM's belief about the state of the world is just one of many, and refereeing is mostly restricted to combat and other formulaic parts of the game. Still, one may assume that, just as with the real world, the fictional world does have some rules and conventions, some of which may be explicit while others are implied. Marie-Laure Ryan named this implication the principle of minimal departure, which says that, unless stated otherwise, we are to assume that a fictional world matches our actual world as closely as possible [Rya80]. This means that the fictional world that our agent operates in should have as many similarities to our actual world as we can give it.</p>
      <p>This poses a problem, though: how can the agent acquire models of the explicit and implicit rules of the fictional world? A standard technique in machine learning is to train a model on a corpus of relevant data. In our case, the most relevant data from which to learn a model is likely to be stories from the particular genre of fictional world our agent will be inhabiting. While it is possible to learn a model of likely event sequences (i.e., machine-learned story generation models [RG15, MAW+18, KBT17, FLD18]), recurrent neural networks maintain state as hidden neural network layers, which are limited in the length of their memory and do not explicitly capture the underlying reason why certain events are preceded by others. This is essential because the other, human players may make choices that are very different from sequences in a training corpus (what are referred to as "out of distribution") and are capable of remembering events and state information for long periods of time. Because of the principle of minimal departure, story generation models also fail to capture details that we take for granted in our own lives, details that are too mundane to mention in stories, such as the affordances of objects. For example, the system would be unable to understand why a cow can be hurt but a cot cannot, no matter how much weight you put on it.</p>
      <p>Our proposal has two parts. First, we propose a method for acquiring models of the fictional world by blending commonsense, overarching rules about the real world with automated methods that can extract relevant genre information from stories. Second, we propose a reinforcement learning technique based on Deep Q Networks that can learn to use these models to interact with human TRPG players. Our proposed agent works as follows. It first converts any human player's declaration of action, a natural language sentence, into an event, which is an abstract sentence representation that is easier for AI systems to work with. We will describe the event representation in Section 3.1. This event is used to update the agent's belief about the state of the fictional world. Once the state is updated, the agent takes its turn, selecting a new event using the deep reinforcement learner. The state is updated again, and the agent's event is converted back into natural language so that the human player can read what the agent did. This pipeline can be seen in Figure 1(a).</p>
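      <p>The turn-taking pipeline above can be sketched end-to-end. This is a minimal illustration only: every helper below is a hypothetical stub standing in for a component described in this section (the eventifier, the belief update, the trained policy, and the text realizer), not an actual implementation.</p>
      <preformat>
```python
# Minimal sketch of the play-time pipeline in Figure 1(a).
# All helpers are hypothetical stubs, not the real components.

def nl_to_event(sentence):
    # Stand-in for parsing plus eventification (Section 3.1).
    return ("player", "say-37.7", sentence, None, None)

def event_to_nl(event):
    # Stand-in for realizing an event back into natural language.
    return " ".join(str(x) for x in event if x)

def select_event(state):
    # Stand-in for the trained deep reinforcement learner's policy.
    return ("agent", "greet", "player", None, None)

def update_state(state, event):
    # Real beliefs would be rule-derived facts; here we just log events.
    return state + [event]

def take_turn(state, player_sentence):
    """One full turn: read the player's action, act, and narrate back."""
    state = update_state(state, nl_to_event(player_sentence))
    action = select_event(state)
    state = update_state(state, action)
    return state, event_to_nl(action)
```
      </preformat>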
      <p>The training method is shown in Figure 1(b). While the DQN is exploring during training, the previous event in the story is passed into a sequence-to-sequence LSTM [SVL14] that is trained on data from the genre we selected. The Seq2Seq network generates a distribution over possible subsequent events according to our model of genre expectations. A set of rules filters the list of events, keeping only events that could occur given the current state of the game. The agent chooses to exploit its policy model or explore randomly, and once a valid event is picked, the state is updated. Because we have a rule model, we can conduct multi-step lookahead, wherein the agent explores several steps into the future before using the reward to update the policy. Each event that is picked should bring the agent closer to its goal for the campaign. The goal in this case is a genre-appropriate pre-defined event that we select.</p>
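      <p>The generate/filter/select cycle above can be sketched as follows. The genre model and rule check are hypothetical hard-coded stubs (a trained Seq2Seq network and the VerbNet-derived rules would fill those roles), and the event tuples and probabilities are made up for illustration:</p>
      <preformat>
```python
import random

def genre_model(prev_event):
    # Stub for the Seq2Seq genre expectation model: a distribution
    # over candidate successor events (values are illustrative).
    return {("knight", "escape-51.1", "castle", None, "from"): 0.6,
            ("knight", "murder-42.1", "dragon", "sword", "with"): 0.3,
            ("fairy", "appear-48.1.1", None, "forest", "in"): 0.1}

def satisfies_preconditions(event, state):
    # Stub for the rule model: the acting entity must be present.
    return event[0] in state["present"]

def choose_event(prev_event, state, q_values, epsilon=0.1):
    """Epsilon-greedy choice over genre-likely, rule-valid events."""
    candidates = [e for e in genre_model(prev_event)
                  if satisfies_preconditions(e, state)]
    if not candidates:
        return None
    r = random.random()
    explore = (min(r, epsilon) == r)  # true when r does not exceed epsilon
    if explore:
        return random.choice(candidates)
    return max(candidates, key=lambda e: q_values.get(e, 0.0))
```
      </preformat>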
      <sec id="sec-3-1">
        <title>The Agent's Model of the Fictional World</title>
        <p>In this section we describe the two-part model of the fictional world that the agent has access to in order to select actions and update its understanding of the current state of the game world.</p>
      </sec>
      <sec id="sec-3-2">
        <title>The Genre Expectation Model</title>
        <p>Given a corpus of stories from a genre related to the fictional world that the agent will inhabit, we train a genre expectation model. This model provides the probability, according to the genre-specific corpus, that certain events will happen after other events. Specifically, we model genre expectations as a sequence-to-sequence LSTM [SVL14], a type of recurrent neural network. However, instead of training on natural language sentences, we first convert sentences to events.</p>
        <p>An event is a tuple ⟨s, v, o, n, p⟩ where v is the verb of the sentence, s is the subject of the verb, o is the direct object of the verb, n is the noun of a prepositional phrase (or indirect object, causal complement, or any other significant word), and p is the preposition. Martin et al. [MAW+18] found that representations similar to this assist with the accuracy of models trained on stories. In this work, we add the p element, which Pichotta and Mooney [PM16] also found to be helpful for event representations. These words are extracted from the original sentence, stemmed, and then generalized or looked up in their respective databases. Verbs are abstracted using VerbNet [KS05], which provides a category (or class) for each verb. All other words are queried in a hierarchical lexicon called WordNet [Mil95], and the generalized word/Synset is taken from two levels up in the hierarchy (i.e., the grandparent). Adding the preposition to the event enables us to make closer comparisons between possible syntactic constructions within a VerbNet class since we would not have access to the original sentence's syntax.</p>
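        <p>As a concrete illustration, the eventification of "Lily screwed the handle to the drawer" might look like the following. The lookup tables are hypothetical stand-ins: a real implementation would use a dependency parser together with the actual VerbNet and WordNet databases, and the class and hypernym labels shown are illustrative, not authoritative.</p>
        <preformat>
```python
# Toy eventification into the tuple form (s, v, o, n, p).
# VERB_CLASS and HYPERNYM2 are illustrative stand-ins for VerbNet
# classes and for climbing two levels up the WordNet hierarchy.
VERB_CLASS = {"screw": "shake-22.3", "give": "give-13.1"}
HYPERNYM2 = {"lily": "person", "handle": "part",
             "drawer": "furniture", "screw": "device"}

def stem(word):
    # Stand-in stemmer: lowercase and drop a trailing "-ed".
    w = word.lower()
    return w[:-2] if w.endswith("ed") else w

def generalize(word):
    # Stand-in for the two-level WordNet hypernym lookup.
    return HYPERNYM2.get(word, word) if word else None

def make_event(subj, verb, dobj=None, pnoun=None, prep=None):
    return (generalize(stem(subj)),
            VERB_CLASS.get(stem(verb), stem(verb)),
            generalize(stem(dobj)) if dobj else None,
            generalize(stem(pnoun)) if pnoun else None,
            prep)

# "Lily screwed the handle to the drawer" becomes:
event = make_event("Lily", "screwed", "handle", "drawer", "to")
```
        </preformat>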
        <p>The story corpus for the genre is pre-processed such that each clause is transformed into an event. Using these events, we train the Seq2Seq model to predict the probability of an event given a previous event. This becomes the agent's genre expectation model, giving the agent a pool of likely actions to choose from during its turn. However, it does not guarantee that these are valid or logical actions, nor can it keep track of long-term dependencies.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Commonsense Rules Model</title>
        <p>To help the agent with selecting appropriate events/actions, we acquire a second model of general, commonsense rules about the real world. The purpose of this model is to (a) prune out candidate events that would not work for the current state of the game, and (b) allow the agent to do lookahead planning to determine how current actions might affect future world states.</p>
        <p>The rules are acquired from a set of semantic facts we get from VerbNet. In VerbNet, each verb class has a
set of frames. Each frame is determined by a grammatical construction that this verb can be found in. Within
a frame, the syntax is listed, along with a set of semantics. The semantics specify what roles/entities are doing
what in the form of predicates. For example, VerbNet would tell us that the sentence \Lily screwed the handle
to the drawer" yields the following predicates:</p>
        <p>CAUSE(Agent, Event)</p>
        <p>TOGETHER(end(Event), Patient, Co-Patient)</p>
        <p>ATTACHED(end(Event), Patient, Instrument)</p>
        <p>ATTACHED(end(Event), Co-Patient, Instrument)</p>
        <p>where Lily is the Agent, the handle is the Patient, the drawer is the Co-Patient, and the screw is the Instrument. In other words: Lily caused the event, and at the end of the event, the screw attached the drawer and the handle together.</p>
        <p>Based on the principle of minimal departure, our agent assumes that when an event occurs, the frame's predicates hold, acting as the agent's knowledge about the actual world. This is reasonable because the frame semantics are relatively high-level and can occur in a variety of scenarios. Whereas the state of the genre expectation model is latent, we can use the facts generated by applying commonsense rules to maintain explicit beliefs about the world that persist until new facts replace them. That is, the drawer and handle will remain attached until such time that another verb class indicates that they are no longer attached. This is important because the agent's belief state will not be tied to a limited, probabilistic window of history maintained by the genre expectation model.</p>
        <p>However, the predicates currently provided by VerbNet frames are insufficient for our purposes. We augment VerbNet by breaking down predicates that require more detail. All of the predicates are either considered "core predicates", which cannot be broken down further, or are given other existing predicates to form preconditions and post-conditions. Preconditions are conditions that must be true in the world prior to the verb frame being enacted. Post-conditions, or effects, are facts about the world that hold after a verb frame has finished being enacted. This information would not be learned by a recurrent neural network.</p>
        <p>We use the preconditions to filter out any actions proposed by the genre expectation model that are not consistent with the current state of the world. Once an action is selected, we use the post-conditions to update the agent's belief state about the world. Magerko et al. made use of pre- and post-conditions for actions so that the individual agents separate from the Game Manager in their game kept the story consistent [MLA+04]. Similarly, Clark et al. [CDT18] broke VerbNet semantics into pre- and post-conditions.</p>
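        <p>A minimal sketch of how such augmented rules could be stored and applied follows. The class name and predicate inventory here are hypothetical, simplified stand-ins for the augmented VerbNet entries, not the actual database contents:</p>
        <preformat>
```python
# Each verb class maps to preconditions (facts that must hold before
# the frame is enacted) and post-conditions (facts that hold after,
# persisting until some later event overwrites them).
RULES = {
    "attach-22.3": {   # hypothetical class for "screw X to Y"
        "pre":  [("SEPARATE", "Patient", "Co-Patient")],
        "post": [("ATTACHED", "Patient", "Co-Patient")],
    },
}

def instantiate(predicate, bindings):
    # Fill a predicate's thematic roles with concrete entities.
    return (predicate[0],) + tuple(bindings[role] for role in predicate[1:])

def applicable(verb_class, bindings, state):
    """True when every instantiated precondition is in the belief state."""
    pre = RULES.get(verb_class, {}).get("pre", [])
    return all(instantiate(p, bindings) in state for p in pre)

def apply_event(verb_class, bindings, state):
    """Update persistent beliefs with the instantiated post-conditions."""
    for p in RULES.get(verb_class, {}).get("post", []):
        fact = instantiate(p, bindings)
        state.discard(("SEPARATE",) + fact[1:])  # retract the contradicted fact
        state.add(fact)
    return state
```
        </preformat>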
      </sec>
      <sec id="sec-3-4">
        <title>Reinforcement Learning for TRPGs</title>
        <p>Reinforcement learning (RL) is a technique whereby an agent learns a policy mapping states to actions that maximizes expected future reward. A reward function gives a value indicating how good or bad different states are, and it can be sparse, meaning that a reward is not provided very often. RL agents learn the policy incrementally by trial and error, attempting to find correlations between states s<sub>i</sub> and future expected reward, Q(s<sub>i</sub>). Deep Q networks learn to approximate the expected future reward for states with a neural network. Deep Q network agents have been shown to be able to play complicated games such as Atari video games [MKS+13] and text adventures with limited action spaces [NKB15, HZMM18, YCS+18].</p>
        <p>Our event representation turns an infinite number of actions (any natural language sentence) into a large, but finite, action space. Still, we cannot perform exhaustive trial-and-error learning while training a deep Q network to play D&amp;D. The genre expectation model provides a beam of highly probable events that are consistent with the genre. The commonsense rule model filters events and also acts as a transition function, helping the agent to search through possible future states for those that return the highest reward, which will, in turn, help the agent converge faster in situations where the reward (e.g., reaching a particular state) is sparse. Details on the flow of information through the DQN can be seen in Figure 1(b).</p>
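        <p>To make the learning dynamics concrete, the sketch below uses a tabular Q-learner in place of the DQN, with the rule model serving as the transition function and a single sparse, goal-based reward. The tiny state space and pure random exploration are simplifications for illustration; the actual proposal substitutes a neural network approximator and the genre-expectation beam:</p>
        <preformat>
```python
import random

def q_learning(start, goal, actions, transition,
               episodes=200, alpha=0.5, gamma=0.9, horizon=10):
    """Tabular Q-learning over (state, event) pairs with sparse reward."""
    Q = {}
    for _ in range(episodes):
        state = start
        for _ in range(horizon):
            event = random.choice(actions)      # pure exploration for brevity
            nxt = transition(state, event)      # rule model as simulator
            reward = 1.0 if nxt == goal else 0.0
            best_next = max(Q.get((nxt, a), 0.0) for a in actions)
            old = Q.get((state, event), 0.0)
            Q[(state, event)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if state == goal:
                break
    return Q
```
        </preformat>
        <p>With the learned table, the agent would then act greedily over the rule-filtered candidate events on each of its turns.</p>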
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Future Work</title>
      <p>One of the outstanding limitations of our current proposal is the reliance on a reward function. For the near future, rewards are based on quest completion, although that is only one aspect of the tabletop roleplaying game experience. Quest completion is a sparse reward, which is one of the reasons why the commonsense rules will be useful in allowing the agent to look ahead, since most states will not provide any reward signal. In the future, we will need to identify or learn more complete reward functions.</p>
      <p>Future versions of the system could learn which rules are broken by the user and remain consistent with them. This will require the agent to be able to identify broken rules and then remove them from its processing of potential actions to take. It might also import other genre models. For example, if the user has raised Vinay from the dead, the agent now knows that after a character dies, they are no longer simply removed from the story but can be reanimated. It might also integrate a horror genre that includes zombies. For now, we will start with a strict set of rules that the agent must obey when it is playing the game, and the agent will work within one genre at a time.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>As game-playing AI research progresses, we argue that TRPGs like D&amp;D are an appropriate next challenge. TRPGs are distinct from other games in that they have an infinite selection of actions, have a partially-visible world, contain hidden states, use intrinsic reward, do not have explicit progress markers, and are cooperative. We outlined a subproblem of TRPGs which focuses less on character stats, which we already know computers handle well, and also simplifies the problem slightly by eliminating the refereeing of rules. TRPGs are unlike text adventure games in that the players have more agency in affecting the story, but they are also unlike Drama Management, where the system gives the player some control over the story but still has the final say. TRPG players are more similar to collaborative automated story generators in this way.</p>
      <p>To create an AI that plays this modified TRPG, we proposed that the agent has a model of the world that is a combination of rules about our actual world and a concept of what events usually occur within similar fictional worlds. The agent is then trained to use the model through deep Q-learning, which has been successful in playing games. By sharing our plans for our TRPG player, we hope to inspire other AI researchers to look into this unique space of games.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by DARPA W911NF-15-C-0246. The views, opinions, and/or conclusions contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of DARPA or the DoD.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[BRN+07] Sooraj Bhat, David L. Roberts, Mark Nelson, Charles Isbell, and Michael Mateas. A globally optimal algorithm for TTD-MDPs. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, 2007.</p>
      <p>[CCM02] M. Cavazza, F. Charles, and S. Mead. Planning characters' behaviour in interactive storytelling. Journal of Visualization and Computer Animation, 13:121–131, 2002.</p>
      <p>[CDT18] Peter Clark, Bhavana Dalvi, and Niket Tandon. What Happened? Leveraging VerbNet to Predict the Effects of Actions in Procedural Text. arXiv:1804.05435, 2018.</p>
      <p>[CGO+17] Margaret Cychosz, Andrew S. Gordon, Obiageli Odimegwu, Olivia Connolly, Jenna Bellassai, and Melissa Roemmele. Effective scenario designs for free-text interactive fiction. In Nuno Nunes, Ian Oakley, and Valentina Nisi, editors, Interactive Storytelling, pages 12–23. Springer International Publishing, 2017.</p>
      <p>[CKY+18] Marc-Alexandre Cote, Akos Kadar, Xingdi (Eric) Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, Wendy Tay, and Adam Trischler. TextWorld: A Learning Environment for Text-based Games. In Computer Games Workshop at ICML/IJCAI 2018, pages 1–29, June 2018.</p>
      <p>[FLD18] A. Fan, M. Lewis, and Y. Dauphin. Hierarchical Neural Story Generation. arXiv:1805.04833, 2018.</p>
      <p>[GA74] Gary Gygax and Dave Arneson. Dungeons &amp; Dragons, 1974.</p>
      <p>[HA04] [HR16] [KBT17] [KS05] [LC16] [Leb87] [LvL01]</p>
      <p>[Mee77] James R. Meehan. TALE-SPIN: An interactive program that writes stories. In Proceedings of the 5th International Joint Conference on Artificial Intelligence, pages 91–98, 1977.</p>
      <p>[Mil95] George A. Miller. WordNet: a Lexical Database for English. Communications of the ACM, 38(11):39–41, 1995.</p>
      <p>[MKS+13] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602, 2013.</p>
      <p>[MLA+04] Brian Magerko, John E. Laird, Mazin Assanie, Alex Kerfoot, and Devvan Stokes. AI characters and directors for interactive computer games. In Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, pages 877–883, 2004.</p>
      <p>[MS03] Michael Mateas and Andrew Stern. Integrating plot, character, and natural language processing in the interactive drama Facade. In Proceedings of the 1st International Conference on Technologies for Interactive Digital Storytelling and Entertainment, 2003.</p>
      <p>[NKB15] Karthik Narasimhan, Tejas Kulkarni, and Regina Barzilay. Language Understanding for Text-based Games Using Deep Reinforcement Learning. In EMNLP, page 10, 2015.</p>
      <p>OpenAI. OpenAI DOTA 2 1v1 bot, 2017.</p>
      <p>[PC09] Julie Porteous and Marc Cavazza. Controlling narrative generation with planning trajectories: the role of constraints. In Proceedings of the 2nd International Conference on Interactive Digital Storytelling, pages 234–245, 2009.</p>
      <p>[PM16] Karl Pichotta and Raymond J. Mooney. Learning Statistical Scripts with LSTM Recurrent Neural Networks. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 2800–2806, 2016.</p>
      <p>[RB13] Mark O. Riedl and Vadim Bulitko. Interactive narrative: An intelligent systems approach. AI Magazine, 34(1):67–77, Spring 2013.</p>
      <p>[RG15] Melissa Roemmele and Andrew S. Gordon. Creative help: A story writing assistant. In Proceedings of the Eighth International Conference on Interactive Digital Storytelling, 2015.</p>
      <p>[RY10] Mark O. Riedl and R. Michael Young. Narrative planning: Balancing plot and character. Journal of Artificial Intelligence Research, 39:217–268, 2010.</p>
      <p>[Rya80] Marie-Laure Ryan. Fiction, non-factuals, and the principle of minimal departure. Poetics, 9(4):403–422, 1980.</p>
      <p>[SG12] Reid Swanson and Andrew S. Gordon. Say Anything: Using Textual Case-Based Reasoning to Enable Open-Domain Interactive Storytelling. ACM Transactions on Interactive Intelligent Systems, 2(3):1–35, 2012.</p>
      <p>[SVL14] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.</p>
      <p>[YCS+18]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [GAG+17]
          <string-name>
            <given-names>Jonas</given-names>
            <surname>Gehring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Auli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Grangier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Denis</given-names>
            <surname>Yarats</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yann N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          .
          <article-title>Convolutional Sequence to Sequence Learning</article-title>
          .
          <source>arXiv:1705.03122</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [GHLR15]
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Guzdial</surname>
          </string-name>
          , Brent Harrison,
          <string-name>
            <given-names>Boyang</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Crowdsourcing open interactive narrative</article-title>
          .
          <source>In 10th International Conference on the Foundations of Digital Games (FDG 2015)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Brian</given-names>
            <surname>Hlubocky</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eyal</given-names>
            <surname>Amir</surname>
          </string-name>
          .
          <article-title>Knowledge-gathering agents in adventure games</article-title>
          .
          <source>In AAAI-04 workshop on Challenges in Game AI</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Brent</given-names>
<surname>Harrison</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark O</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Learning from stories: Using crowdsourced narratives to train virtual agents</article-title>
          .
<source>In Proceedings of the 2016 AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
[HZMM18]
          <string-name>
            <given-names>Matan</given-names>
            <surname>Haroush</surname>
          </string-name>
          , Tom Zahavy, Daniel J. Mankowitz, and
          <string-name>
            <given-names>Shie</given-names>
            <surname>Mannor</surname>
          </string-name>
          .
          <article-title>Learning How Not to Act in Text-Based Games</article-title>
          .
          <source>In Workshop Track at ICLR</source>
          <year>2018</year>
          , pages
<fpage>1</fpage>
          –
          <lpage>4</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Ahmed</given-names>
            <surname>Khalifa</surname>
          </string-name>
          , Gabriella A. B. Barros, and Julian Togelius.
          <article-title>DeepTingle</article-title>
          .
          <source>arXiv:1705.03557</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [KKKR17]
          <string-name>
            <given-names>Bartosz</given-names>
            <surname>Kostka</surname>
          </string-name>
, Jaroslaw Kwiecien, Jakub Kowalski, and
          <string-name>
            <given-names>Pawel</given-names>
            <surname>Rychlikowski</surname>
          </string-name>
          .
          <article-title>Text-based adventures of the Golovin AI agent</article-title>
          .
<source>2017 IEEE Conference on Computational Intelligence and Games (CIG 2017)</source>
          , pages
          <fpage>181</fpage>
          –
          <lpage>188</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
<string-name>
            <given-names>Karen</given-names>
            <surname>Kipper-Schuler</surname>
          </string-name>
          .
          <article-title>VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon</article-title>
          .
          <source>PhD thesis</source>
          , University of Pennsylvania,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Guillaume</given-names>
            <surname>Lample</surname>
          </string-name>
          and
          <string-name>
            <given-names>Devendra Singh</given-names>
            <surname>Chaplot</surname>
          </string-name>
          .
          <article-title>Playing FPS games with deep reinforcement learning</article-title>
          .
          <source>CoRR, abs/1609.05521</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Lebowitz</surname>
          </string-name>
          .
          <article-title>Planning stories</article-title>
          .
          <source>In Proceedings of the 9th Annual Conference of the Cognitive Science Society</source>
          , pages
<fpage>234</fpage>
          –
          <lpage>242</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [LLUJR13]
          <string-name>
            <given-names>Boyang</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Lee-Urban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>George</given-names>
            <surname>Johnston</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Story generation with crowdsourced plot graphs</article-title>
          .
<source>In Proceedings of the 27th AAAI Conference on Artificial Intelligence</source>
          , Bellevue, Washington,
          <year>July 2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>AI Magazine</source>
          ,
          <volume>22</volume>
          (
          <issue>2</issue>
          ):
<fpage>15</fpage>
          –
          <lpage>25</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [MAW+18]
<string-name>
            <given-names>Lara J.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Prithviraj</given-names>
            <surname>Ammanabrolu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xinyu</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>William</given-names>
            <surname>Hancock</surname>
          </string-name>
          , Shruti Singh,
          <string-name>
            <given-names>Brent</given-names>
            <surname>Harrison</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Event Representations for Automated Story Generation with Deep Neural Nets</article-title>
          .
<source>In Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)</source>
          , pages
          <fpage>868</fpage>
          –
          <lpage>875</lpage>
          , New Orleans, Louisiana,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [MMR+09]
          <string-name>
<given-names>Brian</given-names>
            <surname>Magerko</surname>
          </string-name>
          , Waleed Manzoul, Mark Riedl, Allan Baumer, Daniel Fuller, Kurt Luther, and
          <string-name>
            <given-names>Celia</given-names>
            <surname>Pearce</surname>
          </string-name>
          .
          <article-title>An empirical study of cognition and theatrical improvisation</article-title>
          .
          <source>In Proceedings of the Seventh ACM Conference on Creativity and Cognition</source>
          , pages
<fpage>117</fpage>
          –
          <lpage>126</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Ware</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. Michael</given-names>
            <surname>Young</surname>
          </string-name>
          .
<article-title>CPOCL: A narrative planner supporting conflict</article-title>
          .
<source>In Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [YCS+18]
          <string-name>
            <given-names>Xingdi</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
<string-name>
            <given-names>Marc-Alexandre</given-names>
            <surname>Côté</surname>
          </string-name>
          , Alessandro Sordoni, Romain Laroche, Remi Tachet Des Combes, Matthew Hausknecht, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Trischler</surname>
          </string-name>
          .
          <article-title>Counting to Explore and Generalize in Text-based Games</article-title>
          .
          <source>arXiv:1806.11525</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>