Explainability via Responsibility

Faraz Khadivpour and Matthew Guzdial
Department of Computing Science, Alberta Machine Intelligence Institute (Amii)
University of Alberta, Canada
{khadivpour, guzdial}@ualberta.ca

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Procedural Content Generation via Machine Learning (PCGML) refers to a group of methods for creating game content (e.g. platformer levels, game maps, etc.) using machine learning models. PCGML approaches rely on black box models, which can be difficult to understand and debug by human designers who do not have expert knowledge about machine learning. This can be even more challenging in co-creative systems, where human designers must interact with AI agents to generate game content. In this paper we present an approach to explainable artificial intelligence in which particular training instances are offered to human users as an explanation for an AI agent's actions during a co-creation process. We evaluate this approach by approximating its ability to provide human users with explanations of the AI agent's actions and to help them cooperate more efficiently with the AI agent.

Introduction

In science and engineering, a black box is a component whose internal logic or design cannot be directly examined. In artificial intelligence (AI), "the black box problem" refers to certain kinds of AI agents for which it is difficult or impossible to naively determine how they came to a particular decision (Zednik 2019). Explainable artificial intelligence (XAI) is an assembly of methods and techniques for dealing with the black box problem (Biran and Cotton 2017). Machine Learning (ML) is a subset of artificial intelligence that focuses on computer algorithms that automatically learn and improve through experience (Goodfellow, Bengio, and Courville 2016). The current state-of-the-art models in ML, deep neural networks, are black box models. Intuitively, it is difficult to cooperate with an individual when you cannot understand them. This is critical in co-creative systems (also called mixed-initiative systems), in which a human and an AI agent work together to produce the final output (Yannakakis, Liapis, and Alexopoulos 2014).

There is a wealth of existing methods in the field of XAI (Adadi and Berrada 2018). For example, some draw comparisons between the input and the output of a model (Cortez and Embrechts 2011; 2013; Simonyan, Vedaldi, and Zisserman 2013; Bach et al. 2016; Dabkowski and Gal 2017; Selvaraju et al. 2017), while others analyze the output in terms of the model's parameters (Boz and Hillman 2000; García, Fernández, and Herrera 2009; Letham et al. 2015; Hara and Hayashi 2018). Alternatively, there is the strategy of attempting to simplify the model (Che et al. 2015; Tan et al. 2017; Xu et al. 2018). The major difference between our approach and these previous ones is that we present a method which makes it possible to explain an AI agent's action through a detailed inspection of what it learned during the training phase.

Questions we might want to ask an AI agent include "How did you learn to do that action?" or "What did you learn that led you to make that decision?" (Cook et al. 2019). We sought to develop an approach that could answer these questions. Thus, our approach needed to find explanations for the AI agent's decisions based on its training data.

In this paper, we make use of the problem domain of a co-creative Super Mario Bros. level design agent. We use this domain since XAI is critical in co-creative systems. We introduce an approach to detect the training instance that is most responsible for an AI agent's action. We can then present the most responsible training instance to the human user as an answer to how the AI agent learned to make a particular decision. To evaluate this approach we compare the quality of these responsible training instances to random instances as explanations in two experiments on existing data.
Related Work

Our problem domain is generating explanations for a PCGML co-creative agent. Therefore we separate the prior related work into three main areas: Procedural Content Generation via Machine Learning (PCGML), co-creative systems, and Explainable Artificial Intelligence (XAI).

Procedural Content Generation via Machine Learning (PCGML)

Procedural Content Generation via Machine Learning (PCGML) is a field of research focused on the creation of game content by machine learning models that have been trained on existing game content (Summerville et al. 2018). Super Mario Bros. level design represents the most consistent area of research into PCGML. Researchers have applied many machine learning methods, such as Markov chains (Snodgrass and Ontañón 2016), Monte-Carlo Tree Search (MCTS) (Summerville, Philip, and Mateas 2015), Long Short-Term Memory recurrent neural networks (LSTMs) (Summerville and Mateas 2016), autoencoders (Jain et al. 2016), Generative Adversarial Networks (GANs) (Volz et al. 2018), and genetic algorithms with learned evaluation functions (Dahlskog and Togelius 2014), to generate these levels. In recent work, Khalifa et al. proposed a framework to generate game levels using Reinforcement Learning (RL), though they did not evaluate it in Super Mario Bros. (Khalifa et al. 2020). We also draw on reinforcement learning for our agent; however, our approach differs from this prior work in its focus on explainability.

Co-creative systems

There are numerous prior co-creative systems for game design. These approaches traditionally have not made use of ML; instead, they rely on approaches like heuristic search, evolutionary algorithms, and grammars (Smith, Whitehead, and Mateas 2010; Liapis, Yannakakis, and Togelius 2013; Yannakakis, Liapis, and Alexopoulos 2014; Deterding et al. 2017; Baldwin et al. 2017; Charity, Khalifa, and Togelius 2020). ML methods have only recently been incorporated into co-creative game content generation. Guzdial et al. proposed a Deep RL agent for co-creative Procedural Level Generation via Machine Learning (PLGML) (Guzdial, Liao, and Riedl 2018). In other recent work, Schrum et al. presented a tool for applying interactive latent variable evolution to generative adversarial network models that produce video game levels (Schrum et al. 2020). The major difference between our approach and previous ones is that it explains an AI partner's actions based on what the partner learned during training.

It is important to note that we are not actually evaluating our approach in the context of co-creative interaction with a human subject study. We are only making use of data from prior studies in which humans interacted with ML and RL agents in co-creative systems.
Explainable Artificial Intelligence (XAI)

The majority of existing XAI approaches can be separated according to which of two general methods they rely on: (A) visualizing the learned features of a model (Erhan et al. 2009; Simonyan, Vedaldi, and Zisserman 2013; Nguyen, Yosinski, and Clune 2015; 2016; Nguyen et al. 2017; Olah, Mordvintsev, and Schubert 2017; Weidele, Strobelt, and Martino 2019) and (B) demonstrating the relationships between neurons (Zeiler and Fergus 2014; Fong and Vedaldi 2017; Selvaraju et al. 2017). Olah et al. developed a unified framework that includes both (A) and (B) methods (Olah et al. 2018).

There are a few prior works focused on XAI applied to game design and game playing. Guzdial et al. presented an approach to Explainable PCGML via design patterns, in which the design patterns act as a vocabulary and mode of interaction between user and model (Guzdial et al. 2018). Ehsan et al. introduced AI rationalization, an approach for explaining agent behavior for automated game playing based on how a human would explain a similar behavior (Ehsan et al. 2018). Zhu et al. proposed a new research area of eXplainable AI for Designers (XAID) to help game designers better utilize AI and ML in their design tasks through co-creation (Zhu et al. 2018).

There exist a few approaches to explaining RL agents' actions (Puiutta and Veith 2020). Madumal et al. presented an approach that learns structural causal models to derive causal explanations of the behavior of model-free RL agents (Madumal et al. 2019). Kumar et al. presented a deep reinforcement learning approach to control an energy storage system. They visualized the learned policies of the RL agent through the course of training and presented the strategies followed by the agent to users (Kumar 2019). Cruz et al. proposed memory-based explainable reinforcement learning (MXRL), in which an agent explains the reasons why some decisions were taken in certain situations using an episodic memory (Cruz, Dazeley, and Vamplew 2019). In another recent paper, an approach was presented that employs explanations as feedback from humans in a human-in-the-loop reinforcement learning system (Guan, Verma, and Kambhampati 2020).

To the best of our knowledge, this is the first XAI work focused on the training data of a target ML model. Our approach differs from existing XAI work in its detailed inspection and alteration of the training phase.

System Overview

In this paper, we present an approach for Explainable AI (XAI) that aims to answer the question "What did the AI agent learn during training that led it to make that specific action?". As is shown in Figure 1, the general steps of the approach are as follows. First, while training a DNN, we detect the training instance (or instances) that maximally alters each neuron inside the network. Second, during testing, we pass each instance through the network and find the neuron that is most activated (Erhan, Courville, and Bengio 2010). Then, given the information from the first step, we can identify an instance (or instances) from the training data that maximally impacted the most activated neuron. We refer to this as "the most responsible training instance" for the AI agent's action. The intuition is that the user can take this explanation as something akin to the end goal of the agent taking that action. Our hope is that it will help the user decide whether to keep or remove an addition by the AI. For example, in Figure 3, given the most responsible level as the explanation, the user might keep the lower of the two Goombas, despite the fact that it seems to be floating, if they can match it to the Goombas from the most responsible level.

Figure 1: General steps of our approach.
For this purpose, we pre-trained a Deep RL agent using data from interactions of human users with three different ML level design partners (LSTM, Markov Chain, and Bayes Net) to generate Super Mario Bros. levels. This is the same Deep RL architecture and data from prior work by Guzdial et al. (Guzdial, Liao, and Riedl 2018) for co-creative Procedural Level Generation via Machine Learning (PLGML), in which they made use of the level design editor from (Guzdial et al. 2017), which is publicly available online (https://github.com/mguzdial3/Morai-Maker-Engine). The agent is designed to take in a current level design state and to output additions to that level design, in order to iteratively complete a level with a human partner.

Our training inputs are states and the outputs are the Q-table values for taking a particular action in the particular state. The input comes into the network as a state of shape (40x15x34). The 40 is the width and 15 is the height of a level chunk. At each x,y location there are 34 possible level components (e.g. ground, goomba, pipe, mushroom, tree, Mario, flag, ...) that could be placed there. As is shown in the visualized architecture of the Convolutional Neural Network (CNN) in Figure 2, it has three convolutional layers and a fully connected layer followed by a reshaping function to put the output in the form of the action matrix, which is (40x15x32). The player (Mario) and the flag are level entities that cannot be counted as an action, so there are 32 possible action components instead of the 34 state entities. Our activation function is Leaky ReLU for every layer, the loss function is mean squared error, and the optimizer is Adam, with the network built in TensorFlow (Abadi et al. 2016). We make use of this existing agent and data since it is the only example of a co-creative PCGML agent where the data from a human subject study is publicly available.

Figure 2: Architecture of our Convolutional Neural Network (CNN).
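To make the architecture description above concrete, the following is a minimal Keras sketch of a network with matching shapes. The filter counts, kernel sizes, activations, loss, and optimizer follow the text; the padding, strides, fully connected width, and layer names ("conv1", "conv2", "conv3") are assumptions, not a reproduction of the original implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_level_design_cnn():
    """Sketch of the described CNN: a 40x15 level chunk with 34 possible
    components in, a 40x15x32 matrix of action Q-values out."""
    inputs = layers.Input(shape=(40, 15, 34))
    # Three convolutional layers; 'same' padding keeps the 40x15 spatial shape.
    x = layers.Conv2D(8, (4, 4), padding="same", name="conv1")(inputs)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(16, (3, 3), padding="same", name="conv2")(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(32, (3, 3), padding="same", name="conv3")(x)
    x = layers.LeakyReLU()(x)
    # Fully connected layer reshaped into the 40x15x32 action matrix.
    x = layers.Flatten()(x)
    x = layers.Dense(40 * 15 * 32)(x)
    x = layers.LeakyReLU()(x)
    outputs = layers.Reshape((40, 15, 32))(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```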
During each training epoch we employ a batch size of one to track when each training instance passes through the network. We calculate and store the change in neuron weights between batches. After training, by summing over the changes of each neuron weight with respect to the training data, we are able to identify which training instance maximally alters each neuron. Since positive and negative values can counteract each other's effects, it is important not to take absolute values until the end of training. We sum and store this information inside eight arrays of shape (4x4x34) for the first convolutional layer, 16 arrays of shape (3x3x8) for the second convolutional layer, and 32 arrays of shape (3x3x16) for the third convolutional layer. These are the shapes of the filters in each layer. We name these arrays Most Responsible Instance for each Neuron in each Convolutional layer (MRIN-Conv1, MRIN-Conv2, and MRIN-Conv3). These data representations link neurons to IDs representing a particular instance of a human user working with the AI in the co-creative tool. We can then search these arrays and find the ID of the training instance that is most responsible for changes to a particular weight.
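As an illustration of this bookkeeping, the sketch below accumulates per-instance weight changes for one convolutional layer and reduces them to an MRIN array. The training loop and the helper `train_step` are assumptions; only the accounting scheme (signed deltas accumulated per instance, with absolute values taken only at the end) follows the text. For the first convolutional layer of the sketch above, the kernel has shape (4, 4, 34, 8), i.e. eight (4x4x34) filters, matching the eight MRIN-Conv1 arrays.

```python
import numpy as np

def build_mrin_for_layer(model, layer_name, dataset, train_step, num_epochs=1):
    """For each weight in `layer_name`, record which training instance caused
    the largest total change to it (the MRIN array for that layer)."""
    get_kernel = lambda: model.get_layer(layer_name).get_weights()[0].copy()

    # One delta array per training instance, same shape as the layer kernel.
    # This is illustrative only; a real run would want a more memory-frugal scheme.
    deltas = {i: np.zeros_like(get_kernel()) for i in range(len(dataset))}

    for _ in range(num_epochs):
        for instance_id, (state, q_target) in enumerate(dataset):
            before = get_kernel()
            train_step(model, state, q_target)   # batch size of one
            after = get_kernel()
            # Keep the sign: positive and negative updates may cancel out.
            deltas[instance_id] += after - before

    # Only now take absolute values and pick, per weight, the instance whose
    # accumulated change is largest.
    stacked = np.stack([np.abs(deltas[i]) for i in range(len(dataset))])
    return np.argmax(stacked, axis=0)            # same shape as the kernel
```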
Our end goal is to determine the most responsible training instance for a particular prediction made by our trained CNN. To do that, we need to find out what part of the network was most important in making that prediction. We can then determine the most responsible instance for the final weights of this most important part of the network. The most activated filter of each convolutional layer is the filter that contributes the slice with the largest magnitude in the output of that layer. Hence the most activated filter can be considered the most important part of the convolutional layer for that specific test instance (Erhan, Courville, and Bengio 2010). For example, we pass a test instance into the network. A test instance is a (40x15x34) state that is a chunk of a partially designed level. Since the first convolutional layer has eight 4x4x34 filters with same padding, its output has shape (40x15x8). We then find the (40x15) slice with the largest values. The most activated filter is the (4x4x34) array in our convolutional layer which led to the slice with the greatest magnitude.

Finally, once we have the maximally activated filter we can identify the most responsible training instance (or instances) by querying the MRIN-Conv arrays we built during training. The most responsible training instance is the ID that is most repeated in the MRIN-Conv array associated with the maximally activated filter. We chose the most repeated ID since it is the one that most frequently impacted the majority of the neurons in the filter during training.
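The sketch below puts these two steps together for the first convolutional layer: find the filter whose (40x15) output slice has the largest magnitude for a given test state, then take the most frequent instance ID in the corresponding slice of the MRIN array built above. The sub-model construction and the summed-absolute-value reading of "largest magnitude" are assumptions; the selection of the most repeated ID follows the text.

```python
import numpy as np
import tensorflow as tf

def most_responsible_instance(model, mrin_conv1, state):
    """Return the ID of the most responsible training instance for the
    prediction the network makes on `state` (a 40x15x34 level chunk)."""
    # Sub-model exposing the first conv layer's output, shape (40, 15, 8).
    conv1_activations = tf.keras.Model(
        model.input, model.get_layer("conv1").output)
    activations = conv1_activations(
        state[np.newaxis, ...].astype("float32"))[0].numpy()

    # One reading of "largest magnitude": the filter whose (40x15) slice has
    # the largest summed absolute activation.
    filter_idx = int(np.argmax(np.abs(activations).sum(axis=(0, 1))))

    # mrin_conv1 has the kernel's shape, e.g. (4, 4, 34, 8); each entry is the
    # ID of the instance that most changed that weight during training.
    ids_for_filter = mrin_conv1[..., filter_idx].ravel()

    # The most responsible instance is the ID that appears most often.
    values, counts = np.unique(ids_for_filter, return_counts=True)
    return int(values[np.argmax(counts)])
```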
Evaluation

In this section, we present two evaluations of our system. We call the first evaluation our "Explainability Evaluation" as it addresses the ability of our system to provide explanations that help a user predict an AI agent's actions. We call the second evaluation our "User Labeling Error Evaluation" as it addresses the ability of our system to help human users identify positive and negative AI additions during the co-creative process. Both evaluations approximate the impact of our approach on human partners by using existing data of AI-human interactions. Essentially, we act as though the pre-recorded actions of the AI agent were outputs from our Deep RL agent and identify the responsible training instances as if this were the case. Because our system derives examples as explanations for the behavior of a co-creative Deep RL agent, a human subject study would be the natural way to evaluate our system. However, prior to a human subject study, we first wanted to gather some evidence of the value of this approach.

Explainability Evaluation

The first claim we made was that this approach can help human users better understand and predict the actions of an AI agent. In this experiment we use the most responsible level as an approximation of the AI agent's goal, in other words what final level the AI agent is working towards. The most responsible level refers to the level at the end of a human user's interactions with an AI agent. We identify this level by finding the most responsible training instance as above and identifying the level at the end of that training sequence. This experiment is meant to determine if this can help a user predict the AI agent's actions. To do this, we passed test instances into our network and found the most responsible training instances. We then compared the most responsible level for some current test instance to the AI agent's action in the next test instance. If the most responsible level is similar to the action, it would indicate that the most responsible level can be a potential explanation for the AI agent's action by priming the user to better predict future actions by the AI agent. In comparison, we randomly selected 20 levels from the training data and found their similarities to the AI agent's action in the next test instance. If our approach outperforms the random levels, it will support the claim that the responsible level is better suited to helping predict future AI agent actions compared to random levels.

We used two different sets of test data:
(A) Our first testset is derived from a study in which users interacted with pairs of three different ML agents, as mentioned in our System Overview section (Guzdial, Liao, and Riedl 2018). We used the same testset identified in that paper.
(B) Our second testset is obtained from a study in which expert level designer users interacted with the trained Deep RL agent (Guzdial et al. 2019).

If we find success with the first testset, that would indicate that our trained Deep RL agent is a good surrogate for the original three ML agents, since we would in effect be predicting the next action of one of these agents. Good results for the second testset would demonstrate the capability for prediction of the Deep RL agent's actions itself. Since the first convolutional layer is the layer that most directly reasons over the level structure, we decided to find the most responsible training instance of just the first convolutional layer. However, this setup puts our approach at a disadvantage, since we are going to compare only one most responsible level to 20 random ones.

For comparing the most responsible level and the random levels to the actions, we needed to define a suitable metric. We desired a metric that detects local overlaps and represents the similarity between a level and an action. We wanted to pick square windows which are not the same size as the first convolutional layer's filters, to capture some local structure without biasing the metric too far towards our first convolutional layer. As a result, we found all three-by-three non-empty patches for both a given level and an action. Then we counted the number of exact matches of these patches on both sides, removing matched patches from the set since we wanted to count each patch only once. Finally, we divided the total number of matched patches by the total number of patches in the action, since this was always smaller than the number from the level. We refer to this metric as the local overlap ratio.
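A minimal sketch of this metric is given below, assuming levels and actions are encoded as (40x15) arrays of component indices with 0 denoting an empty tile; the exact encoding is an assumption, while the patch extraction, single-count matching, and normalization follow the description above.

```python
import numpy as np

def local_overlap_ratio(level, action, empty=0):
    """Local overlap ratio: matched 3x3 non-empty patches divided by the
    number of non-empty 3x3 patches in the action."""
    def patches(grid):
        grid = np.asarray(grid)
        out = []
        h, w = grid.shape
        for x in range(h - 2):
            for y in range(w - 2):
                patch = grid[x:x + 3, y:y + 3]
                if np.any(patch != empty):          # skip entirely empty windows
                    out.append(tuple(patch.ravel()))
        return out

    level_patches = patches(level)
    action_patches = patches(action)

    matches = 0
    for patch in action_patches:
        if patch in level_patches:
            level_patches.remove(patch)             # count each match only once
            matches += 1
    # The action always yields fewer non-empty patches than the full level.
    return matches / max(len(action_patches), 1)
```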
Explainability Evaluation Results

We had 242 samples in the first testset and 69 samples in the second one. Since we wanted to compare instances in which the AI agent actually made substantial changes, we chose instances where the AI agent added more than 10 components in its next action. This left 38 and 46 instances from the first and second testsets, respectively.

Our approach outperforms the random baseline in 78.94 percent of the 38 instances for the ML agents data and 67.29 percent of the 46 instances for the Deep RL agent data. The average of the local overlap ratios is shown in Table 1 (higher is better). The minimum value of this metric is 0 for zero overlap and the maximum value is 1 for complete overlap between the action and the most responsible level or the random level. This normalization means that even small differences in this metric represent large perceptual differences. For example, a 0.04 difference in the local overlap ratio between the most responsible level and the random levels in Table 1 indicates the most responsible level has 20 more three-by-three non-empty overlaps. We expect that the reason the Deep RL agent values are generally lower is that the second study made use of published level designers rather than novices and an adaptive Deep RL agent, meaning that there was more varied behavior compared with the three ML agents.

TestSet      Most Responsible Level    Random Levels
ML Agents    0.4653                    0.3841
Deep RL      0.2880                    0.2472

Table 1: The average local overlap ratio of the most responsible levels compared to the average of the random levels for both testsets.

An example of explainability is demonstrated in Figure 3. As is shown in the figure, the AI agent made an action and added some components (e.g. a goomba and ground) to the existing state. By looking at the chunk of the most responsible level, the user might realize that the AI agent wants to generate a level including some goombas as enemies and some blocks in the middle of the screen. The AI agent also added ground at the bottom and top of the screen, which the user could identify as being consistent with both their input to the agent and the most responsible level.

Figure 3: An example of explaining an AI agent's action by presenting the most responsible level.

User Labeling Error Evaluation

For the second evaluation, we wanted to get some sense of whether this approach could be successful in assisting a human user in better understanding good and bad agent actions during the co-creation process. To do this, we needed to identify specific instances where our tool could be helpful in the data we have available. We defined two such concepts, based on the interactions between users and the AI partner during level generation:
(A) False-positive decisions are additions by the AI partner that the user kept at first but then deleted later.
(B) False-negative decisions are additions by the AI partner that the user deleted at first but then added later.

Given these concepts, if we could help the user avoid making these kinds of decisions, our approach could help a human user during level generation. We anticipated that one reason users made these kinds of decisions was a lack of context for the AI agent's action. Thus, if the user had context they might not delete or keep what they would otherwise keep or delete, respectively.

To accomplish this, we implemented an algorithmic way to determine false-positives and false-negatives among the two testsets described in the previous evaluation, as sketched below. In this algorithm, we first find all user decisions in terms of deleting or keeping an addition by the AI agent. Then we look at the level at the end of the user and AI agent's interaction. If a deleted AI addition exists in the final level, it is counted as a false-negative example, and if a kept addition does not exist in the final level it is counted as a false-positive example.

Once we discovered all false-negative and false-positive examples, we found the state before the example was added by the AI agent and named it the Introduction-state (I-state). We found the state in which the false-positivity or false-negativity occurred (i.e. when a user re-added a false-negative or deleted a false-positive) and named it the Contradiction-state (C-state). Since some change between the I-state and the C-state led to the user altering their decision, we wanted to see some sign that presenting the most responsible level to the user could change their mind before they reached this point. Thus we compared these two states to find all the changes that the AI agent or the user made and named this the Difference-state (D-state).

We compared each D-state with the final generated level derived from the most responsible training instance. We also compared each D-state with 20 other randomly selected levels from the existing data. For the comparison, we used the local overlap ratio defined in the previous evaluation. If our approach outperforms the random baseline, we will be able to say that there is some support for the responsible level helping the user avoid false-positives and false-negatives in comparison to random levels.
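The following is a minimal sketch of the labeling step under an assumed log format: each interaction is a time-ordered list of (actor, event, component, position) tuples plus the final level. The log format and helper names are assumptions; only the classification rule (deleted-but-present is a false negative, kept-but-absent is a false positive) follows the text.

```python
def label_user_decisions(interaction, final_level):
    """Classify AI additions as false-positive or false-negative decisions.

    `interaction`: time-ordered list of (actor, event, component, position);
    `final_level`: dict mapping position -> component at the session's end.
    """
    labels = {}
    for i, (actor, event, component, position) in enumerate(interaction):
        if actor != "ai" or event != "add":
            continue
        # The user's first decision on this addition: did they delete it?
        deleted_first = next(
            (e == "delete" for a, e, c, p in interaction[i + 1:]
             if a == "user" and c == component and p == position),
            False)
        present_at_end = final_level.get(position) == component

        if deleted_first and present_at_end:
            # Deleted at first, but back in the final level: false negative.
            labels[(component, position)] = "false_negative"
        elif not deleted_first and not present_at_end:
            # Kept at first, but gone from the final level: false positive.
            labels[(component, position)] = "false_positive"
    return labels
```

A similar pass over the log can then recover the I-state (the state just before the AI addition) and the C-state (the state at the contradicting user edit); their difference gives the D-state compared against the most responsible level with the local overlap ratio above.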
User Labeling Error Evaluation Results

We found five false-negative and 24 false-positive examples in the first testset, and five false-negative and 54 false-positive examples in the second one. The results of the evaluation are shown in Figure 4.

Figure 4: Results of the User Labeling Error Evaluation.

For the first dataset, which included the actions of the three ML agents, our approach outperformed the random baseline in 65.51 percent of the examples. The average of the local overlap ratio values for our approach was 0.1717, which is higher than the 0.1647 for the random levels. For the second dataset, obtained from the Deep RL agent, our approach outperformed the baseline in 59.32 percent of the examples. The averages of the local overlap ratio values were 0.2665 and 0.2328 for the most responsible level and the random levels, respectively. Again this represents a large perceptual difference of roughly 15 more non-empty 3x3 overlaps.

Interestingly, our approach outperforms the random levels in all of the false-negative examples in the second dataset, compared with just 20 percent of the false-negatives in the first dataset. Further, our approach performs around 1.5 times better than the random levels in 15 false-positive examples in the second dataset. These instances come from the study that used the same RL agent as we used to derive our explanations, which could account for this performance.

Discussion

In this paper, we present an XAI approach for a pre-trained Deep RL agent. Our hypothesis was that our method could be helpful to human users. We evaluated it by approximating this process for two tasks using two existing datasets. These datasets were obtained from studies using three ML partners and an RL agent. Essentially, we used the XAI-enabled agent in this paper as if it were the agents used in these datasets. The results of our first evaluation demonstrate that our method is able to present examples as explanations that help users predict an agent's next action. The results of our second evaluation support our hypothesis and give us an initial signal that this approach could help human users more efficiently cooperate with a Deep RL agent. This indicates the ability of our approach to help human designers by presenting an explanation for an AI agent's actions during a co-creation process.

A human subject study would be a more reasonable way to evaluate this system, since human users might be able to derive meaning from the responsible level that our similarity metric could not capture. Our approach performs better than our baseline of random levels in both evaluation methods, and this presents evidence towards its value at this task. However, we look forward to investigating a human subject study in order to fully validate these results.

There could be other alternatives to a human subject study. For example, a secondary AI agent that predicts our primary AI agent's actions could play a human partner's role in the co-creative system. Thus, making use of a secondary AI agent to evaluate our system before running a human subject study might be a simple next step.

It is important to mention that we only offer one most responsible level, from only the first convolutional layer, as an explanation. Providing a user with multiple responsible levels, or looking into the most responsible levels of the other layers, could be a potential way to further improve our approach. Our metric for determining the most responsible training instance is based on finding the most repeated instance inside the MRIN-Conv arrays associated with the most activated filter. We identified the most activated filter by looking at the absolute values. We plan to investigate other metrics, such as looking for the most activated neurons outside of the filters. In addition, considering negative and positive values separately in the maximal activation process could also lead to improved behavior. Negative values might indicate that an instance negatively impacted a neuron. It could then be the case that a filter is maximally activated because it was giving a very strong signal against some action.

One quirk of our current approach is that the most responsible training instance depends on the order in which the training data was presented to the model during training. Thus, this measure does not tell us about any inherent quality of a particular training data instance, only its relevance to a particular model that has undergone a particular training regimen.

In the future, we intend to explore how more general representations of responsibility, such as Shapley values, might intersect with this approach (Ghorbani and Zou 2019). Only the domain of a co-creative system for designing Super Mario Bros. levels is explored in this paper. Thus, making use of other games will be required to ensure this is a general method for level design co-creativity. Beyond that, we anticipate a need to demonstrate our approach on different domains outside of games. We look forward to running another study to apply our approach to human-in-the-loop reinforcement learning or other co-creative domains.

Conclusions

In this paper we present an approach to XAI that provides human users with the most responsible training instance as an explanation for an AI agent's action. In support of this approach, we present results from two evaluations. The first evaluation demonstrates the ability of our approach to offer explanations and to help a human partner predict an AI agent's actions. The second evaluation demonstrates the ability of our approach to help human users better identify good and bad instances of an AI agent's behavior. To the best of our knowledge this represents the first XAI approach focused on training instances.

Acknowledgements
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Alberta Machine Intelligence Institute (Amii).

References

Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 265–283.

Adadi, A., and Berrada, M. 2018. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 6:52138–52160.

Bach, S.; Binder, A.; Müller, K.-R.; and Samek, W. 2016. Controlling explanatory heatmap resolution and semantics via decomposition depth. In 2016 IEEE International Conference on Image Processing (ICIP), 2271–2275. IEEE.

Baldwin, A.; Dahlskog, S.; Font, J. M.; and Holmberg, J. 2017. Mixed-initiative procedural generation of dungeons using game design patterns. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), 25–32. IEEE.

Biran, O., and Cotton, C. 2017. Explanation and justification in machine learning: A survey. In IJCAI-17 Workshop on Explainable AI (XAI), volume 8.

Boz, O., and Hillman, D. 2000. Converting a trained neural network to a decision tree: DecText decision tree extractor. Citeseer.

Charity, M.; Khalifa, A.; and Togelius, J. 2020. Baba is y'all: Collaborative mixed-initiative level design. arXiv preprint arXiv:2003.14294.

Che, Z.; Purushotham, S.; Khemani, R.; and Liu, Y. 2015. Distilling knowledge from deep networks with applications to healthcare domain. arXiv preprint arXiv:1512.03542.

Cook, M.; Colton, S.; Pease, A.; and Llano, M. T. 2019. Framing in computational creativity: A survey and taxonomy. In ICCC, 156–163.
Cortez, P., and Embrechts, M. J. 2011. Opening black box data mining models using sensitivity analysis. In 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 341–348. IEEE.

Cortez, P., and Embrechts, M. J. 2013. Using sensitivity analysis and visualization techniques to open black box data mining models. Information Sciences 225:1–17.

Cruz, F.; Dazeley, R.; and Vamplew, P. 2019. Memory-based explainable reinforcement learning. In Australasian Joint Conference on Artificial Intelligence, 66–77. Springer.

Dabkowski, P., and Gal, Y. 2017. Real time image saliency for black box classifiers. In Advances in Neural Information Processing Systems, 6967–6976.

Dahlskog, S., and Togelius, J. 2014. A multi-level level generator. In 2014 IEEE Conference on Computational Intelligence and Games, 1–8. IEEE.

Deterding, S.; Hook, J.; Fiebrink, R.; Gillies, M.; Gow, J.; Akten, M.; Smith, G.; Liapis, A.; and Compton, K. 2017. Mixed-initiative creative interfaces. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 628–635.

Ehsan, U.; Harrison, B.; Chan, L.; and Riedl, M. O. 2018. Rationalization: A neural machine translation approach to generating natural language explanations. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 81–87.

Erhan, D.; Bengio, Y.; Courville, A.; and Vincent, P. 2009. Visualizing higher-layer features of a deep network. University of Montreal 1341(3):1.

Erhan, D.; Courville, A.; and Bengio, Y. 2010. Understanding representations learned in deep architectures. Département d'Informatique et Recherche Opérationnelle, University of Montreal, QC, Canada, Tech. Rep. 1355:1.

Fong, R. C., and Vedaldi, A. 2017. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision, 3429–3437.

García, S.; Fernández, A.; and Herrera, F. 2009. Enhancing the effectiveness and interpretability of decision tree and rule induction classifiers with evolutionary training set selection over imbalanced problems. Applied Soft Computing 9(4):1304–1314.

Ghorbani, A., and Zou, J. 2019. Data Shapley: Equitable valuation of data for machine learning. arXiv preprint arXiv:1904.02868.

Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep Learning. MIT Press.

Guan, L.; Verma, M.; and Kambhampati, S. 2020. Explanation augmented feedback in human-in-the-loop reinforcement learning. arXiv preprint arXiv:2006.14804.

Guzdial, M. J.; Chen, J.; Chen, S.-Y.; and Riedl, M. 2017. A general level design editor for co-creative level design. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference.

Guzdial, M.; Liao, N.; Chen, J.; Chen, S.-Y.; Shah, S.; Shah, V.; Reno, J.; Smith, G.; and Riedl, M. O. 2019. Friend, collaborator, student, manager: How design of an AI-driven game level editor affects creators. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–13.

Guzdial, M.; Liao, N.; and Riedl, M. 2018. Co-creative level design via machine learning. arXiv preprint arXiv:1809.09420.

Guzdial, M.; Reno, J.; Chen, J.; Smith, G.; and Riedl, M. 2018. Explainable PCGML via game design patterns. arXiv preprint arXiv:1809.09419.

Hara, S., and Hayashi, K. 2018. Making tree ensembles interpretable: A Bayesian model selection approach. In International Conference on Artificial Intelligence and Statistics, 77–85.

Jain, R.; Isaksen, A.; Holmgård, C.; and Togelius, J. 2016. Autoencoders for level generation, repair, and recognition. In Proceedings of the ICCC Workshop on Computational Creativity and Games.

Khalifa, A.; Bontrager, P.; Earle, S.; and Togelius, J. 2020. PCGRL: Procedural content generation via reinforcement learning. arXiv preprint arXiv:2001.09212.

Kumar, H. 2019. Explainable AI: Deep reinforcement learning agents for residential demand side cost savings in smart grids. arXiv preprint arXiv:1910.08719.

Letham, B.; Rudin, C.; McCormick, T. H.; Madigan, D.; et al. 2015. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics 9(3):1350–1371.
Liapis, A.; Yannakakis, G. N.; and Togelius, J. 2013. Sentient Sketchbook: Computer-assisted game level authoring.

Madumal, P.; Miller, T.; Sonenberg, L.; and Vetere, F. 2019. Explainable reinforcement learning through a causal lens. arXiv preprint arXiv:1905.10958.

Nguyen, A.; Clune, J.; Bengio, Y.; Dosovitskiy, A.; and Yosinski, J. 2017. Plug & play generative networks: Conditional iterative generation of images in latent space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4467–4477.

Nguyen, A.; Yosinski, J.; and Clune, J. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 427–436.

Nguyen, A.; Yosinski, J.; and Clune, J. 2016. Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks. arXiv preprint arXiv:1602.03616.

Olah, C.; Satyanarayan, A.; Johnson, I.; Carter, S.; Schubert, L.; Ye, K.; and Mordvintsev, A. 2018. The building blocks of interpretability. Distill 3(3):e10.

Olah, C.; Mordvintsev, A.; and Schubert, L. 2017. Feature visualization. Distill 2(11):e7.

Puiutta, E., and Veith, E. 2020. Explainable reinforcement learning: A survey. arXiv preprint arXiv:2005.06247.

Schrum, J.; Gutierrez, J.; Volz, V.; Liu, J.; Lucas, S.; and Risi, S. 2020. Interactive evolution and exploration within latent level-design space of generative adversarial networks. arXiv preprint arXiv:2004.00151.

Selvaraju, R. R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; and Batra, D. 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, 618–626.

Simonyan, K.; Vedaldi, A.; and Zisserman, A. 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.

Smith, G.; Whitehead, J.; and Mateas, M. 2010. Tanagra: A mixed-initiative level design tool. In Proceedings of the Fifth International Conference on the Foundations of Digital Games, 209–216.

Snodgrass, S., and Ontañón, S. 2016. Learning to generate video game maps using Markov models. IEEE Transactions on Computational Intelligence and AI in Games 9(4):410–422.

Summerville, A., and Mateas, M. 2016. Super Mario as a string: Platformer level generation via LSTMs. arXiv preprint arXiv:1603.00930.

Summerville, A. J.; Philip, S.; and Mateas, M. 2015. MCMCTS PCG 4 SMB: Monte Carlo tree search to guide platformer level generation. In Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference.
Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Hoover, A. K.; Isaksen, A.; Nealen, A.; and Togelius, J. 2018. Procedural content generation via machine learning (PCGML). IEEE Transactions on Games 10(3):257–270.

Tan, S.; Caruana, R.; Hooker, G.; and Lou, Y. 2017. Detecting bias in black-box models using transparent model distillation. arXiv preprint arXiv:1710.06169.

Volz, V.; Schrum, J.; Liu, J.; Lucas, S. M.; Smith, A.; and Risi, S. 2018. Evolving Mario levels in the latent space of a deep convolutional generative adversarial network. In Proceedings of the Genetic and Evolutionary Computation Conference, 221–228.

Weidele, D.; Strobelt, H.; and Martino, M. 2019. Deepling: A visual interpretability system for convolutional neural networks. In Proceedings of SysML.

Xu, K.; Park, D. H.; Yi, C.; and Sutton, C. 2018. Interpreting deep classifier by visual distillation of dark knowledge. arXiv preprint arXiv:1803.04042.

Yannakakis, G. N.; Liapis, A.; and Alexopoulos, C. 2014. Mixed-initiative co-creativity.

Zednik, C. 2019. Solving the black box problem: A normative framework for explainable artificial intelligence. Philosophy & Technology 1–24.

Zeiler, M. D., and Fergus, R. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, 818–833. Springer.

Zhu, J.; Liapis, A.; Risi, S.; Bidarra, R.; and Youngblood, G. M. 2018. Explainable AI for designers: A human-centered perspective on mixed-initiative co-creation. In 2018 IEEE Conference on Computational Intelligence and Games (CIG), 1–8. IEEE.