=Paper=
{{Paper
|id=Vol-2282/EXAG_122
|storemode=property
|title=Learning to Generate Natural Language Rationales for Game Playing Agents
|pdfUrl=https://ceur-ws.org/Vol-2282/EXAG_122.pdf
|volume=Vol-2282
|authors=Upol Ehsan,Pradyumna Tambwekar,Larry Chan,Brent Harrison,Mark O. Riedl
|dblpUrl=https://dblp.org/rec/conf/aiide/EhsanTCHR18
}}
==Learning to Generate Natural Language Rationales for Game Playing Agents==
Upol Ehsan∗‡‡, Pradyumna Tambwekar∗†, Larry Chan†, Brent Harrison‡, and Mark O. Riedl†

‡‡ Department of Information Science, Cornell University
† School of Interactive Computing, Georgia Institute of Technology
‡ Department of Computer Science, University of Kentucky

∗ Denotes equal contribution.

===Abstract===
Many computer games feature non-player character (NPC) teammates and companions; however, playing with or against NPCs can be frustrating when they perform unexpectedly. These frustrations can be avoided if the NPC has the ability to explain its actions and motivations. When NPC behavior is controlled by a black box AI system, it can be hard to generate the necessary explanations. In this paper, we present a system that generates human-like, natural language explanations—called rationales—of an agent's actions in a game environment regardless of how the decisions are made by a black box AI. We outline a robust data collection and neural network training pipeline that can be used to gather think-aloud data and train a rationale generation model for any similar sequential, turn-based decision-making task. A human-subject study shows that our technique produces believable rationales for an agent playing the game Frogger. We conclude with insights about how people perceive automatically generated rationales.

===Introduction===
Non-player characters (NPCs) are interactive, autonomous agents that play critical roles in most modern video games, and are often seen as one crucial component of an engaging player experience. As NPCs are given more autonomy to make decisions, the likelihood that they perform in an unexpected manner increases. These situations risk interrupting a player's engagement in the game world as they attempt to justify the reasoning behind the unexpected NPC behavior. One method to address this side effect of increased autonomy is to construct NPCs that have the ability to explain their own actions and motivations for acting.

The generation of natural language explanations for autonomous agents is challenging when the agent is a black-box AI, meaning that one does not have access to the agent's decision-making process. Even if access were possible, the mapping between inputs and decisions could be difficult for people to interpret. Work by Ehsan et al. [8] showed that machine learning models can be trained to provide relevant and satisfactory rationales for their actions using examples of human behavior and human-provided explanations. This is a potentially powerful tool that could be used to create NPCs that can provide human-understandable explanations for their own actions, without changing the underlying decision-making algorithms. This in turn could give users more confidence in NPCs and game-playing agents and make them more understandable and relatable.

In the work by Ehsan et al., however, the rationale generation model was trained on a semi-synthetic dataset, built by developing a grammar that could generate variations of actual human explanations. While their results were promising, creating the grammar necessary to construct the requisite training examples is a costly endeavor in terms of authorial effort. We build on this work by developing a pipeline to automatically acquire a corpus of human explanations that can be used to train a rationale generation model to explain the actions of NPCs and game-playing agents. In this paper, we describe our automated explanation corpus collection technique and our neural rationale generation model, and we present the results of a human-subjects study of human perceptions of generated rationales in the game Frogger.

===Related Work===
Adaptive team-mate/adversary cooperation in games has often been explored through the lens of decision making [2]. Researchers have looked to incorporate adaptive difficulty in games (cf. [3, 16]) as well as to build NPCs which evolve by learning a player's profile, as ways to improve the player's experience [7, 15].
What is missing from this analysis is the conversational engagement that comes with collaborating with another human player.

NPCs that can communicate in natural language have previously been explored using classical machine learning techniques. These methods often undertake a rule-based or probabilistic modeling approach. Buede et al. combine natural language processing with dynamic probabilistic models to maximize rapport between two conversing agents [6]. Prior work has also shown the capacity to use a rule-based system to create a conversational character generator [12]. Both of these methods, however, involve a high degree of hand-authoring. Our work can generate NPCs with similar communicative capabilities with minimal hand-authoring.

Explainable AI has attracted interest from researchers across various domains. The authors of [1] conduct a comprehensive survey on burgeoning trends in explainable and intelligible systems research. Certain intelligible systems researchers look to use model-agnostic methods to add transparency to the latent technology [13, 17]. Other researchers use visual representations to interpret the decision-making process of a machine learning system [9]. We situate our system as an agent that unpacks the thought process of a human player, if they were to play the game.

Evaluation of explainable AI systems can be difficult because the appropriateness of an explanation is subjective. One approach to evaluating such systems was proposed in [5]. They presented participants with different fictionalized explanations for the same decision and measured perceived levels of justice among their participants. We adopt a similar procedure to measure the quality of generated rationales versus alternate baseline rationales.

===Learning to Generate Rationales===
We define a rationale as an explanation that justifies an action based on how a human would think. These rationales do not reveal the true decision-making process of an agent, but still provide insights about why an agent made a decision in a form that is easy for non-experts to understand. Rationale generation requires translating events in the game environment into natural language outputs. Our approach to rationale generation involves two steps: (1) collect a corpus of think-aloud data from players who explained their actions in a game environment; and (2) use this corpus to train an encoder-decoder network to generate plausible rationales for any action taken by an agent (see Figure 1).

Figure 1: End-to-end pipeline for training a system that can generate explanations.

====Data Collection Interface====
There is no readily available dataset for the task of learning to generate explanations. Thus, we developed a methodology to collect live "think-aloud" data from players as they played through a game. This section covers the two objectives of our data collection endeavor:

1. Create a think-aloud protocol in which players provide natural rationales for their actions.
2. Design an intuitive player experience that facilitates accurate matching of the participants' utterances to the appropriate state in the environment.

To train an agent to generate rationales, we need data linking game states and actions to their corresponding natural language explanations. To achieve this goal, we built a modified version of Frogger in which players simultaneously play the game and explain each of their actions.
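For concreteness, the following is a minimal sketch of what one record in such a parallel corpus could look like. The field names and sprite labels here are illustrative assumptions, not the schema actually used in the paper.

<pre>
from dataclasses import dataclass

@dataclass
class RationaleRecord:
    """One training example: a game state, the action taken, and the
    player's transcribed think-aloud explanation for that action."""
    grid: list[list[str]]           # sprite type per cell, e.g. "car", "log"
    frog_position: tuple[int, int]  # (row, col) of the frog
    action: str                     # e.g. "up", "down", "left", "right"
    lives: int                      # lives remaining when the action was taken
    explanation: str                # speech-to-text transcript, player-editable

example = RationaleRecord(
    grid=[["road", "car", "road"],
          ["road", "road", "road"],
          ["grass", "frog", "grass"]],
    frog_position=(2, 1),
    action="up",
    lives=3,
    explanation="I moved up because the car in the lane ahead has passed.",
)
</pre>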
The entire process is divided into three phases: (1) a guided tutorial, (2) rationale collection, and (3) transcribed explanation review.

During the guided tutorial, our interface provides instruction on how to play through the game, how to provide natural language explanations, and how to review and modify any explanations given. This helps ensure that users are familiar with the interface and its use before they begin providing explanations.

During rationale collection, users play through the game while explaining their actions out loud. Figure 2 shows the game embedded into the explanation collection interface. To help couple explanations with actions, the game pauses for 10 seconds after an action is taken. During this time, the player's microphone automatically turns on and the player is asked to explain their most recent action while a speech-to-text library transcribes the explanation. Participants can view their transcribed text and edit it if necessary. During preliminary testing, we observed that players often repeat a move for which the explanation is the same. For ease, participants can indicate that the explanation accompanying their most recent action is the same as that of the last action performed.

Figure 2: Players take an action and verbalize their rationale for that action. (1) After taking each action, the game pauses for 10 seconds. (2) Speech-to-text transcribes the participant's rationale for the action. (3) Participants can view their transcribed rationales in near-real time and edit them, if needed.

During transcribed explanation review, users are given one final opportunity to review and edit the explanations given during gameplay (see Figure 3). Players can step through all of the actions they performed in the game and see their accompanying transcribed explanations, so they can see the game context in which their explanations were given.

Figure 3: Players can step through each of their action-rationale pairs and edit if necessary. (1) Players can watch a replay of their actions while editing their rationales. (2) Players use these buttons to control the flow of their step-through. (3) The rationale for the current action gets highlighted for review.

The interface is designed so that no manual hand-authoring or editing of our data was required before pushing it into our machine learning model. Throughout the game, players were given the opportunity to organically edit their own data without impeding their workflow. This added layer of frictionless editing was crucial in ensuring that we could directly input the collected data into the network with zero manual cleaning.

One core strength that facilitates transferability is that our pipeline is environment and domain agnostic. While we use Frogger as a test environment in our experiments, a similar user experience can be designed using other turn-based environments with minimal effort.
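The pause-and-record interaction at the heart of this pipeline can be summarized in a short sketch. Here `game`, `record_audio`, and `transcribe` are hypothetical stand-ins for the game engine, microphone capture, and speech-to-text library (the paper does not name its implementations); the 10-second pause is the only detail taken from the paper.

<pre>
PAUSE_SECONDS = 10  # the game pauses for 10 seconds after each action

def collect_rationales(game, record_audio, transcribe):
    """Interleave gameplay with think-aloud explanation capture."""
    corpus = []
    while not game.is_over():
        state = game.snapshot()              # game state before the move
        action = game.wait_for_action()      # player takes an action
        audio = record_audio(PAUSE_SECONDS)  # mic turns on during the pause
        text = transcribe(audio)             # transcript the player may edit
        corpus.append({"state": state, "action": action, "explanation": text})
    return corpus
</pre>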
===Neural Translation Model===
We use an encoder-decoder network to generate relevant natural language explanations for any given action. These kinds of networks are commonly used for machine translation and dialogue generation tasks, and their ability to model sequential dependencies between the input and the output makes them suitable for our task. Our encoder-decoder architecture is similar to that used in [8].

The network learns how to translate the input game state representation X = x_1, x_2, ..., x_n, comprised of the sprite representation of the game combined with other influencing factors, into an output explanation as a sequence of words Y = y_1, y_2, ..., y_m, where y_i is a word. The input X has a fixed size of 261 tokens encompassing the game state representation, the lives left, and the location of the frog. The vocabulary sizes for the encoder and the decoder are 491 and 1104, respectively. Thus our network learns to translate game state and action information into natural language rationales.

The encoder and decoder are both recurrent neural networks (RNNs) comprised of GRU cells. The decoder network uses an additional attention mechanism [11] to learn to weight the importance of different components of the input with regard to their effect on the output. To simplify the learning process, the state of the game environment is converted into a sequence of symbols where each symbol represents a type of sprite. To this, we append information concerning Frogger's position, the most recent action taken, and the number of lives the player has left to create the input representation X.

On top of this network structure, we vary the input configurations with the intention of producing varying styles of rationales. These two configurations are titled the focused-view configuration and the complete-view configuration, and they are used throughout the experiments presented in this paper.
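Before describing the two configurations, here is a minimal sketch of one plausible realization of this base architecture in PyTorch, using the sizes reported in this paper (261 input tokens, encoder/decoder vocabularies of 491 and 1104, and the hidden size of 256 given in the training details later) and dot-product attention in the style of Luong et al. [11]. The authors do not publish their implementation, so the framework, layer count, and exact attention variant are assumptions.

<pre>
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size=491, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, x):                      # x: (batch, 261) symbol ids
        outputs, hidden = self.gru(self.embed(x))
        return outputs, hidden                 # outputs: (batch, 261, hidden)

class AttnDecoder(nn.Module):
    def __init__(self, vocab_size=1104, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size * 2, vocab_size)

    def forward(self, y_prev, hidden, enc_outputs):
        # y_prev: (batch, 1) id of the previously generated word
        dec_out, hidden = self.gru(self.embed(y_prev), hidden)
        # Dot-product attention over the 261 encoder states
        scores = torch.bmm(dec_out, enc_outputs.transpose(1, 2))
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_outputs)
        logits = self.out(torch.cat([dec_out, context], dim=-1))
        return logits, hidden                  # logits: (batch, 1, vocab_size)
</pre>

At generation time, the decoder would be run one word at a time, feeding back its own most probable (or sampled) word until an end-of-sequence token is produced.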
====Focused-view Configuration====
In this configuration we used a windowed representation of the grid: only a 7 × 7 window around the frog was used in the input. Both playing an optimal game of Frogger and generating relevant explanations for the current action typically require only this much local context. Providing the agent with only the window around Frogger therefore helps the agent produce explanations grounded in its neighborhood. In this configuration we prioritized rationales focused on short-term awareness over long-term planning.

====Complete-view Configuration====
The complete-view configuration is an alternate setup that provides the entire game board as context for rationale generation. There are two differences between this configuration and the focused-view configuration. First, instead of showing the network only a window of the game, we use the entire game screen as part of the input. The agent now has the opportunity to learn which other long-term factors in the game may influence its rationale. Second, we added noise to each game state to force the network to generalize when learning to generate rationales and to give the model equal opportunity to consider factors from all sectors of the game screen. In this case, noise was introduced by replacing input grid values with dummy values: for each grid element, there was a 20% chance that it would be replaced with a dummy value.
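A minimal sketch of these two input transformations follows. The function names, the off-grid padding value, and the symbol labels are illustrative assumptions; only the 7 × 7 window and the 20% replacement probability come from the paper.

<pre>
import random

def focused_view(grid, frog_pos, window=7, pad="empty"):
    """Focused-view input: keep only the 7x7 neighborhood around the frog.
    Cells falling outside the board are padded (padding value is assumed)."""
    half = window // 2
    r0, c0 = frog_pos
    rows, cols = len(grid), len(grid[0])
    return [[grid[r][c] if 0 <= r < rows and 0 <= c < cols else pad
             for c in range(c0 - half, c0 + half + 1)]
            for r in range(r0 - half, r0 + half + 1)]

def add_noise(grid, p=0.2, dummy="dummy"):
    """Complete-view input: each grid element has a 20% chance of being
    replaced by a dummy value, forcing the network to generalize."""
    return [[dummy if random.random() < p else cell for cell in row]
            for row in grid]
</pre>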
===Human Perception of Rationales Study===
In this section, we attempt to assess whether the rationales generated by our system outperform baselines. We further attempt to understand the underlying components that influence differences in the perception of the generated rationales along four dimensions of human factors: confidence, human-likeness, adequate justification, and understandability. Frogger is a good candidate for our experimental design of a rationale generation pipeline for general sequential decision-making tasks because it is a simple Markovian environment; that is, the reasons for each action can be easily separated, making it an ideal stepping stone towards a real-world environment.

To gather the training set of game state annotations, we deployed our data collection pipeline on TurkPrime [10]. From 60 participants we collected over 2,000 samples of human explanations corresponding to images of the game at the moment the explanations were made. This parallel corpus of collected game state images and natural language rationales was used to train the encoder-decoder rationale generation network. Each RNN in the encoder and the decoder was parameterized with GRU cells with a hidden vector size of 256, and the entire encoder-decoder network was trained for 100 epochs.

We recruited an additional 128 participants through TurkPrime [10], split into the two experimental groups of our study: Group 1 (age range = 23–68, M = 37.4, SD = 9.92) and Group 2 (age range = 24–59, M = 35.8, SD = 7.67). Forty-six percent of our participants were women, and only two countries, the United States and India, were reported when participants were asked which country they were from; 93% of all 128 participants reported that they resided in the United States.

====Procedure====
Participants watched a series of five videos, each containing an action taken by an agent playing Frogger. In each video, the action was accompanied by three rationales generated by three different techniques (see Figure 4):

• The exemplary rationale is the rationale from our corpus that three researchers unanimously agreed on as the best one for a particular action. Researchers independently selected rationales they deemed best and iterated until consensus was reached. This is provided as an upper bound for contrast with the next two techniques.
• The candidate rationale is the rationale produced by our network, in either the focused-view or the complete-view configuration.
• The random rationale is a randomly chosen rationale from our corpus.

Figure 4: Screenshot from the user study (setup 2) depicting the action taken and the rationales: P = Random, Q = Exemplary, R = Candidate.

For each rationale, participants used a 5-point Likert scale to rate their endorsement of each of the following four statements, which correspond to four dimensions of interest:

D1. Confidence: This rationale makes me confident in the character's ability to perform its task.
D2. Human-likeness: This rationale looks like it was made by a human.
D3. Adequate justification: This rationale adequately justifies the action taken.
D4. Understandability: This rationale helped me understand why the agent behaved as it did.

Response options on the Likert scale ranged from "strongly disagree" to "strongly agree." In a free-text field, participants explained why the ratings they gave for a particular set of three rationales were similar or different. After answering these questions, they provided demographic information.
====Quantitative Results and Analysis====
We used a multi-level model to analyze both between-subjects and within-subjects variables. There were significant main effects of rationale style (χ²(2) = 594.80, p < .001) and dimension (χ²(2) = 66.86, p < .001) on the ratings. The main effect of experimental group was not significant (χ²(1) = 0.070, p = 0.79). Figure 5 shows the average responses to each question for the two experimental groups.

Figure 5: Human judgment results. (a) Focused-view condition. (b) Complete-view condition.

Our results support our hypothesis that rationales generated with the focused-view generator and the complete-view generator were judged significantly better across all dimensions than the random baseline (b = 1.90, t(252) = 8.09, p < .001). Our results also show that rationales generated by the candidate techniques were judged significantly lower than the exemplary rationales. The difference between the focused-view candidate rationales and the exemplary rationales was significantly greater than the difference between the complete-view candidate rationales and the exemplary rationales (p = .005). Surprisingly, this was because the exemplary rationales were rated lower in the presence of complete-view candidate rationales (t(1530) = −32.12, p < .001). Since three rationales were presented simultaneously in each video, it is likely that participants were rating the rationales relative to each other. We also observe that the complete-view candidate rationales received overall higher ratings than the focused-view candidate rationales (t(1530) = 8.33, p < .001).

In summary, we established that both the focused-view and complete-view configurations produce believable rationales that perform significantly better than the random baseline along four human factors dimensions. While the complete-view candidate rationales were judged to be preferable overall to the focused-view candidate rationales, we did not compare the two directly because stylistically one technique may be better suited than the other depending on the task and/or game. Because of our between-subjects study methodology, these results are suggestive but cannot be used to prove any claims between the two experimental conditions.
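For readers who wish to run this kind of analysis, here is a minimal sketch of a multi-level (mixed-effects) model in Python with statsmodels. The file name and column names are hypothetical, and this is not the authors' analysis script; the paper does not specify its statistical software.

<pre>
import pandas as pd
import statsmodels.formula.api as smf

# Assumed long format: one row per rating, with columns 'rating' (1-5),
# 'style' (random/candidate/exemplary), 'dimension' (D1-D4),
# 'group' (focused/complete), and 'participant' (rater id).
df = pd.read_csv("ratings.csv")  # hypothetical file

# A random intercept per participant captures the within-subjects structure.
model = smf.mixedlm("rating ~ style + dimension + group",
                    data=df, groups=df["participant"])
print(model.fit().summary())
</pre>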
====Qualitative Analysis====
In this section, we look at the open-ended responses provided by our participants to better understand the criteria that participants used when making judgments about the confidence, human-likeness, adequate justification, and understandability of generated rationales. These situated insights augment our understanding of rationale generating systems, enabling us to design better ones in the future.

We analyzed the open-ended justifications participants provided using a combination of thematic analysis [4] and grounded theory [14]. We developed codes that addressed different types of reasoning behind the ratings of the four dimensions under investigation. Next, the research team clustered the codes under emergent themes, which form the underlying components of the dimensions. Iterating until consensus was reached, researchers settled on the five most relevant components: (1) Contextual Accuracy, (2) Intelligibility, (3) Awareness, (4) Relatability, and (5) Strategic Detail (see Table 1). To varying degrees, multiple components influence more than one dimension; that is, there is not a mutually exclusive one-to-one relationship between components and dimensions. The remainder of this section shares our conclusions about how these components influence the dimensions of the human factors under investigation. When providing examples of our participants' responses, we refer to them using the following notation: P1 corresponds to participant 1, P2 corresponds to participant 2, etc.

Table 1: Descriptions for the emergent components underlying the human-factor dimensions of the generated rationales.
Contextual Accuracy: Accurately describes pertinent events in the context of the environment.
Intelligibility: Typically error-free and coherent in terms of both grammar and sentence structure.
Awareness: Depicts an adequate understanding of the rules of the environment.
Relatability: Expresses the justification of the action in a relatable manner and style.
Strategic Detail: Exhibits strategic thinking, foresight, and planning.

Confidence (D1): This dimension gauges the participant's faith in the agent's ability to successfully complete its task, and has contextual accuracy, awareness, strategic detail, and intelligibility as relevant components. With respect to contextual accuracy, rationales that displayed "...recognition of the environmental conditions and [adaptation] to the conditions" (P22) were a positive influence on confidence ratings, while redundant information such as "just stating the obvious" (P42) hindered confidence ratings. Rationales that showed awareness "...of upcoming dangers and what the best moves to make... [and] a good way to plan" (P17) inspired confidence in the participants. In terms of strategic detail, rationales that showed "...long-term planning and ability to analyze information" (P28) yielded higher confidence ratings, while those that were "...short-sighted and unable to think ahead" (P14) led to lower perceptions of confidence. Intelligibility alone, without awareness or strategic detail, was not enough to yield high confidence in rationales. However, rationales that were unintelligible or incoherent had a negative impact on participants' confidence:

"The [random and focused-view rationales] include major mischaracterizations of the environment by referring to an object not present or wrong time sequence, so I had very low confidence." (P66)

Human-likeness (D2): Intelligibility, relatability, and strategic detail are the components that influenced participants' perception of the extent to which the rationales were made by a human. Notably, intelligibility had mixed influences on the human-likeness of the rationales depending on what participants thought "being human" entailed. Some perceived humans to be fallible and rated rationales with errors as more human-like because rationales "...with typos or spelling errors... seem even more likely to have been generated by a human" (P19). Conversely, some thought error-free rationales must come from a human, citing that a "computer just does not have the knowledge to understand what is going on" (P24). With respect to relatability, rationales were often perceived as more human-like when participants felt that "it mirrored [their] thoughts" (P49) and "...[layed] things out in a way that [they] would have" (P58). Affective rationales had high relatability because they "express human emotions including hope and doubt" (P11). Strategic planning, like intelligibility, had a mixed impact on human-likeness, as it too depended on participants' perceptions of critical thinking and logical planning. Some participants associated "...critical thinking [and the ability to] predict future situations" (P6) with human-likeness, whereas others associated logical planning with a non-human-like, computer-like rigid and algorithmic thinking process.
Adequate Justification (D3): This dimension unpacks the extent to which participants think the rationale adequately justifies the action taken, and is influenced by contextual accuracy and awareness. Participants downgraded rationales containing low levels of contextual accuracy, such as irrelevant details. As P11 puts it:

"The [random and exemplary rationales] don't pertain to this situation. [The complete-view rationale] does, and is clearly the best justification for the action that Frogger took because it moves him towards his end goal."

Beyond contextual accuracy, rationales that showcase awareness of surroundings rate high on the adequate justification dimension. For instance, P11 rated the random rationale low because it showed "no awareness of the surroundings." For the same action, P11 gave high ratings to the exemplary and focused-view rationales because each made the participant "...believe in the character's ability to judge their surroundings."

Understandability (D4): For this dimension, components such as contextual accuracy and relatability influence participants' perceptions of how much the rationales helped them understand the motivation behind the agent's actions. Contextually accurate rationales were found to have a strong influence on understandability. In fact, many participants expressed that the contextual accuracy of the rationale, not its length, was what mattered for understandability. While comparing the exemplary and focused-view rationales for understandability, P41 made a notable observation:

"The [exemplary and focused-view rationales] both described the activities/objects in the immediate vicinity of the frog. However, [exemplary] was not as strong as [focused-view] given the frog did not have to move just because of the car in front of him. [Focused-view] does a better job of providing understanding of the action."

Participants put themselves in the agent's shoes and evaluated the understandability of the rationales based on how relatable they were. In essence, some asked, "Are these the same reasons I would [give] for this action?" (P43). The more relatable the rationale was, the higher it scored for understandability.

====Design Implications====
Understanding these components and dimensions can help us design better autonomous agents from a human factors perspective. These insights can also enable tweaking of the network configuration and reverse-engineering it to maximize the likelihood of producing rationale styles that meet the needs of the task, game, or agent persona. For instance, given the nature of the inputs, choosing a network configuration similar to the focused-view can afford the generation of contextually accurate rationales. On the other hand, the complete-view network configuration can produce rationales with a higher degree of strategic detail, which can be beneficial in contexts where detail is important, such as an explainable oracle. Moreover, an in-game tutorial or a companion agent can be designed using a network configuration that generates relatable outputs to keep the player entertained and engaged.

===Future Work===
We can extend our current work to other domains of explainable AI, exploring applications for other sequential decision-making tasks. We also plan to deploy our rationale generator with a collaborative NPC in an interactive game to investigate how the perception of a collaborative agent changes when players interact with it longitudinally (over an extended period of time). This longitudinal approach can help us understand novelty effects of rationale generating agents. Besides NPCs, our techniques can improve teaching and collaboration in games, especially around improvisation and co-creative collaboration in game-level design.

Our data collection pipeline is currently designed to work with discrete-action games that have natural break points where the player can be asked for explanations, making collection less disruptive than it would be in continuous-time and continuous-action games. The next challenge is to extend and test our approach in more continuous spaces, where states are not as well defined and rationales are harder to capture from moment to moment.

===Conclusions===
In this paper, we explore how human justifications for their actions in a video game can be used to train a system to generate explanations for the actions of autonomous game-playing agents. We introduce a pipeline for automatically gathering a parallel corpus of game states annotated with human explanations and show how this corpus can be used to train encoder-decoder networks. The resultant model translates the state of the game and the action performed by the agent into natural language, which we call a rationale. The rationales generated by our technique are judged better than those of a random baseline and come close to matching the upper bound of human rationales. By enabling autonomous agents to communicate the motivations for their actions, we hope to provide users with greater confidence in the agents while increasing perceptions of understanding and relatability.
===References===
[1] Ashraf Abdul et al. "Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda". In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018, p. 582.
[2] Aswin Thomas Abraham and Kevin McGee. "AI for dynamic team-mate adaptation in games". In: Computational Intelligence and Games (CIG), 2010 IEEE Symposium on. IEEE, 2010, pp. 419–426.
[3] Maria-Virginia Aponte, Guillaume Levieux, and Stéphane Natkin. "Scaling the level of difficulty in single player video games". In: International Conference on Entertainment Computing. Springer, 2009, pp. 24–35.
[4] J. Aronson. "A pragmatic view of thematic analysis". In: The Qualitative Report 2.1 (1994).
[5] Reuben Binns et al. "'It's Reducing a Human Being to a Percentage': Perceptions of Justice in Algorithmic Decisions". In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018, p. 377.
[6] Dennis M. Buede, Paul J. Sticha, and Elise T. Axelrad. "Conversational Non-Player Characters for Virtual Training". In: Social, Cultural, and Behavioral Modeling. Ed. by Kevin S. Xu et al. Cham: Springer International Publishing, 2016, pp. 389–399. ISBN: 978-3-319-39931-7.
[7] Silvia Coradeschi and Lars Karlsson. "A role-based decision-mechanism for teams of reactive and coordinating agents". In: Robot Soccer World Cup. Springer, 1997, pp. 112–122.
[8] Upol Ehsan et al. "Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations". In: Proceedings of the AAAI Conference on Artificial Intelligence, Ethics, and Society. Feb. 2018.
[9] Josua Krause, Adam Perer, and Kenney Ng. "Interacting with predictions: Visual inspection of black-box machine learning models". In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2016, pp. 5686–5697.
[10] Leib Litman, Jonathan Robinson, and Tzvi Abberbock. "TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences". In: Behavior Research Methods 49.2 (2017), pp. 433–442.
[11] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation". In: arXiv preprint arXiv:1508.04025 (2015).
[12] Grant Pickett, Foaad Khosmood, and Allan Fowler. "Automated generation of conversational non player characters". In: Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference. 2015.
[13] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?: Explaining the predictions of any classifier". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1135–1144.
[14] Anselm Strauss and Juliet Corbin. "Grounded theory methodology". In: Handbook of Qualitative Research 17 (1994), pp. 273–285.
[15] Chek Tien Tan and Ho-lun Cheng. "Personality-based Adaptation for Teamwork in Game Agents". In: AIIDE. 2007, pp. 37–42.
[16] Sang-Won Um, Tae-Yong Kim, and Jong-Soo Choi. "Dynamic difficulty controlling game system". In: IEEE Transactions on Consumer Electronics 53.2 (2007).
[17] Jason Yosinski et al. "Understanding neural networks through deep visualization". In: arXiv preprint arXiv:1506.06579 (2015).