=Paper= {{Paper |id=Vol-2282/EXAG_107 |storemode=property |title=Explainable PCGML via Game Design Patterns |pdfUrl=https://ceur-ws.org/Vol-2282/EXAG_107.pdf |volume=Vol-2282 |authors=Matthew Guzdial,Joshua Reno,Jonathan Chen,Gillian Smith,Mark Riedl |dblpUrl=https://dblp.org/rec/conf/aiide/GuzdialRCSR18 }} ==Explainable PCGML via Game Design Patterns== https://ceur-ws.org/Vol-2282/EXAG_107.pdf
                            Explainable PCGML via Game Design Patterns

         Matthew Guzdial1 , Joshua Reno1 , Jonathan Chen1 , Gillian Smith2 , and Mark Riedl1
                                              Georgia Institute of Technology1
                                              Worcester Polytechnic Institute2
                   {mguzdial3, jreno, jonathanchen}@gatech.edu, gmsmith@wpi.edu, riedl@cc.gatech.edu



                           Abstract

  Procedural content generation via Machine Learning
  (PCGML) is the umbrella term for approaches that generate
  content for games via machine learning. One of the benefits
  of PCGML is that, unlike search- or grammar-based PCG, it
  does not require hand authoring of initial content or rules.
  Instead, PCGML relies on existing content and black box
  models, which can be difficult to tune or tweak without
  expert knowledge. This is especially problematic when a
  human designer needs to understand how to manipulate
  their data or models to achieve desired results. We present
  an approach to Explainable PCGML via Design Patterns in
  which the design patterns act as a vocabulary and mode of
  interaction between user and model. We demonstrate that
  our technique outperforms non-explainable versions of our
  system in interactions with five expert designers, four of
  whom lack any machine learning expertise.

                        Introduction

Procedural Content Generation (PCG) represents a field of research into, and a set of techniques for, generating game content algorithmically. PCG historically requires a significant amount of human-authored knowledge to generate content, such as rules, heuristics, and individual components, creating a time and design expertise burden. Procedural Content Generation via Machine Learning (PCGML) attempts to solve these issues by applying machine learning to extract this design knowledge from existing corpora of game content (Summerville et al. 2017). However, this approach has its own weaknesses: applied naively, these models require machine learning literacy to understand and debug. Machine learning literacy is uncommon, especially among those designers who might most benefit from PCGML.

Explainable AI represents a field of research into opening up black box Artificial Intelligence and Machine Learning models to users (Biran and Cotton 2017). The promise of explainable AI is not just that it will help users understand such models, but also tweak these models to their needs (Olah et al. 2018). If we could include some representation of an individual game designer's knowledge in a model, we could help designers without ML expertise better understand and alter these models to their needs.

Design patterns (Bjork and Holopainen 2004) represent one popular way to represent game design knowledge. A design pattern is a category of game structure that serves a general design purpose across similar games. Researchers tend to derive design patterns via subjective application of design expertise (Hullett and Whitehead 2010), which makes it difficult to broadly apply one set of patterns across different designers and games. The same subjective limitation also means that an individual set of design patterns can serve to clarify what elements of a game matter to an individual designer. Given a set of design patterns specialized to a particular designer, one could leverage these design patterns in an Explainable PCGML system to help a designer understand and tweak a model to their needs. We note that our usage of the term pattern differs from the literature. Typically, a design pattern generalizes across designers, whereas we apply it to indicate the unique structures of a game important to an individual designer.

We present an underlying system for a potential co-creative PCGML tool, intended for designers without ML expertise. This system takes user-defined design patterns for a target level and outputs a PCGML model. The design patterns provided by designers and generated by our system can be understood as labels on level structure, which allow our PCGML model to better represent and reflect the design values of an individual designer. The system has two major components: (1) a classification system that learns to classify level structures with the user-specified design pattern labels, ensuring a user does not have to label all existing content; and (2) a level generation system that incorporates the user's level design patterns and can use these patterns as a vocabulary with which to interact with the user, for example, generating labels on level structure to represent the model's interpretation of that structure to the user.

The rest of this paper is organized as follows. First, we relate our work to prior, related work. Second, we describe our Explainable PCGML (XPCGML) system in terms of its two major components. Third, we discuss the three evaluations we ran with five expert designers. We end with a discussion of the system's limitations, future work, and conclusions. Our major contributions are the first application of explainable AI to PCGML, the use of a random forest classifier to minimize user effort, and the results of our evaluations. Our results demonstrate both the promise of these pattern labels in improving user interaction and their positive impact on the underlying model's performance.
[Figure 1 depicts the autoencoder: an 8x8x30+n input (level chunk plus design pattern label, e.g. "intro") passes through two 64x5x5 convolutional layers with dropout between them, a relu fully connected layer into a 512-unit encoding, a second relu layer, and two 64x5x5 deconvolutional layers to an 8x8x30+n output.]

Figure 1: Network architecture for the Autoencoder.
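The 8x8x30+n input in Figure 1 pairs a one-hot level-structure tensor with a one-hot design pattern vector. A minimal numpy sketch of assembling such an input follows; the tile class index and pattern count used here are illustrative assumptions, not the paper's actual class list:

```python
import numpy as np

HEIGHT, WIDTH, CLASSES = 8, 8, 30  # tile grid and component classes

def encode_chunk(tile_ids, pattern_idx, n_patterns):
    """Build the 8x8x30 one-hot structure tensor plus an n-dimensional
    one-hot design pattern vector, as described for Figure 1."""
    structure = np.zeros((HEIGHT, WIDTH, CLASSES))
    for (y, x), tile in np.ndenumerate(tile_ids):
        if tile >= 0:  # -1 marks an empty tile: its vector stays all zeros
            structure[y, x, tile] = 1.0
    label = np.zeros(n_patterns)
    label[pattern_idx] = 1.0
    return structure, label

# Toy chunk: empty except a bottom row of one component class (index 3, assumed)
tiles = -np.ones((8, 8), dtype=int)
tiles[7, :] = 3
structure, label = encode_chunk(tiles, pattern_idx=0, n_patterns=4)
```

Flattening `structure` and concatenating `label` yields the 1920+n feature vector referenced later in the Generator section.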


                      Related Work

There exist many prior approaches to co-creative or mixed-initiative design agents and editors (Yannakakis, Liapis, and Alexopoulos 2014; Deterding et al. 2017). However, the majority of existing approaches have relied upon search- or grammar-based approaches instead of machine learning, making it difficult to adapt to the needs of a particular designer over time (Liapis, Yannakakis, and Togelius 2013; Shaker, Shaker, and Togelius 2013; Baldwin et al. 2017). A final version of our system would focus on machine learning, adapting to the user, and explaining and visualizing its inner model and process.

Procedural Content Generation via Machine Learning (Summerville et al. 2017) is a relatively new field, focused on generating content through machine learning methods. The majority of PCGML approaches represent black box methods, with no prior approach focused on explainability or co-creativity. We note some discussion in the Summerville et al. survey paper of potential collaborative approaches. Summerville (2016a) explored adapting levels to players, but no work to our knowledge looks at adapting models to individual designers.

Super Mario Bros. (SMB) represents a common area of research into PCGML (Dahlskog and Togelius 2012; Summerville and Mateas 2016; Jain et al. 2016; Snodgrass and Ontanón 2017). Beyond explainability, our approach differs from prior SMB PCGML approaches in terms of representation quality and the size of generated content. First, we focus on the generation of individual level sections instead of entire levels in order to better afford collaborative level building (Smith, Whitehead, and Mateas 2011). Second, prior approaches have abstracted the possible level components into higher order groups, for example, treating all enemy types as equivalent and ignoring decorative elements. We make use of a rich representation of all possible level components and an ordering that allows our approach to place decorative elements appropriately.

Explainable AI represents an emerging field of research (Biran and Cotton 2017), focused on translating or rationalizing the behavior of black box models. To the best of our knowledge, it has not been previously applied to PCGML. Codella et al. (2018) demonstrated how explanations could improve model accuracy on three tasks, but required that every sample be hand-labeled with an explanation and treated explanations from different authors as equivalent. Ehsan et al. (2017) made use of explainable AI for explainable agent behavior in automated game playing. Their approach relies on rationalization, a second machine learning interpretation of the original behavior, rather than visualizing or explaining the original model as our approach does.

Design patterns represent a well-researched approach to game design (Bjork and Holopainen 2004). In theory, game design patterns describe general solutions to game design problems that occur across many different games. Game design patterns have been used as heuristics in evolutionary PCG systems, including in the domain of Super Mario Bros. (Dahlskog and Togelius 2012). Researchers tend to derive game design patterns either through rigorous, cross-domain analysis (Milam and El Nasr 2010) or based upon their subjective interpretation of game structure. We embrace this subjectivity in our work by having designers create a language of game design patterns unique to themselves with which to interact with a PCGML system.

                      System Overview

The approach presented in this paper builds an Explainable PCGML model based on existing level structure and an expert labeling design patterns upon that structure. We chose Super Mario Bros. as a domain given its familiarity to the game designers who took part in our evaluation. The general process for building a final model is as follows. First, users label existing game levels with the game design patterns they want to use for communicating with the system. For example, one might label both areas with large amounts of enemies and areas that require precise jumps as "challenges". The exact label can be anything as long as it is used consistently. Given this initial user labeling of level structure, we train a random forest classifier (Liaw, Wiener, and others 2002) to classify additional level structure according to the labeled level chunks, which we then use to label all available levels with the user's design pattern labels. Given this now larger training set of both level structure and labels, we train a convolutional neural network-based autoencoder on both level structure and its associated labels (Lang 1988; LeCun et al. 1989), which can then be used to generate new level structure and to label its generated content with these design pattern labels (Jain et al. 2016).

We make use of Super Mario Bros. as our domain and, in particular, we utilize those Super Mario Bros. levels present in the Video Game Level Corpus (VGLC) (Summerville et al. 2016b).
We do not include underwater or boss/castle Super Mario Bros. levels. We made this choice as we perceived these two level types to be significantly different from all other level types. Further, while we make use of the VGLC levels, we do not make use of any of the VGLC Super Mario Bros. representations, which abstract level components into higher order groups. Instead, we draw on the image parsing approach introduced in (Guzdial and Riedl 2016), using a spritesheet and OpenCV (Bradski and Kaehler 2000) to parse images of each level for a richer representation.

In total we identified thirty unique classes of level components, and we make use of a matrix representation for each level section of size 8 × 8 × 30. The first two dimensions determine the tiles in the x and y axes, while the last dimension represents a one-hot vector of length 30 expressing component class. This vector is all 0's for any empty tile of a Super Mario Bros. level, and otherwise has a 1 at the index associated with that particular level component. Thus, we can represent all level components, including background decoration. We note that we treat palette swaps of the same component as equivalent in class.

We make use of the SciPy random forest classifier (Jones, Oliphant, and Peterson 2014) and tensorflow for the autoencoder (Abadi et al. 2016).

Design Pattern Label Classifier

Our goal for the design pattern label classifier is to minimize the work and time costs for a potential user of the system. Users have to label level structure with the design patterns they would like to use, but the label classifier ensures they do not have to hand-label all available levels. The classifier for this task must be able to perform given access to whatever small amount of training data a designer is willing to label for it, along with being able to easily update its model given potential feedback from a user. We anticipate that the exact amount of training data the system has access to will differ widely between users, but we do not wish to overburden authors with long data labeling tasks. Random forest classifiers are known to perform reasonably under these constraints (Michalski, Carbonell, and Mitchell 2013).

The random forest model takes in an eight by eight level section and returns a level design pattern (either a user-defined design pattern or none). We train the random forest model on the design pattern labels submitted by the user. We use a forest of size 100 with a maximum depth of 100 in order to encourage generality.

In an interactive, iterative system the random forest can be easily retrained. In the case where the random forest classifier correctly classifies any new design pattern there is no need for retraining. Otherwise, we can delete a subset of the trees of the random forest that incorrectly classified the design pattern, and retrain an appropriate number of trees on the existing labels and any additional new information to return to the maximum forest size.

Even with the design pattern label classifier this system requires the somewhat unusual step of labeling existing level structure with design patterns a user finds important. However, this is a necessary step for the benefit of a shared vocabulary, and labeling content is much easier than designing new content. Further, we note that when two humans collaborate they must negotiate a shared vocabulary.

Generator

Our level generation system is based on an autoencoder; we visualize its architecture in Figure 1. The input comes in the form of a chunk of level content and the associated design pattern label, such as "intro" in the figure. This chunk is represented as an eight by eight by thirty input tensor plus a tensor of size n, where n indicates the total number of design pattern labels given by the user. This last vector of size n is a one-hot encoded vector of level design pattern labels.

After input, the level structure and design pattern label vector are separated. The level structure passes through a two layer convolutional neural network (CNN). We placed a dropout layer between the two CNN layers to allow better generalization. After the CNN layers, the output of this section and the design patterns vector recombine and pass through a fully connected layer with relu activation to an embedded vector of size 512. We note that, while large, this is much smaller than the 1920+n features of the input layer. The decoder section is an inverse of the encoder section of the architecture, starting with a relu fully connected layer, followed by deconvolutional neural network layers with upsampling handling the level structure. We trained this model with the Adam optimizer and mean squared error loss. Note that for the purposes of evaluation this is a standard autoencoder; we intend to make use of a variational autoencoder in future work (Kingma and Welling 2013).

                          Evaluation

Our system has two major parts: (1) a random forest classifier that attempts to label additional content with user-provided design patterns to learn the designer's vocabulary and (2) an autoencoder over level structure and associated patterns for generation. In this section we present three evaluations of our system. The first addresses the random forest classifier of labels, the second the entirety of the system, and the third the limiting factor of time in human-computer interaction. For all three evaluations we make use of a dataset of levels from Super Mario Bros. labeled by five expert designers.

Dataset Collection

We reached out to ten design experts to label three or more Super Mario Bros. levels of their choice to serve as a dataset for this evaluation. We do not include prior, published academic patterns of Super Mario Bros. levels (e.g. (Dahlskog and Togelius 2012)) as these patterns were designed for general automated design instead of explainable co-creation. Our goal in choosing these ten designers was to get as diverse a pool of labels as possible. Of these ten, five responded and took part in this study.

• Adam Le Doux: Le Doux is a game developer and designer best known for his Bitsy game engine. He is currently a Narrative Tool Developer at Bungie.
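A rough sketch of such a classifier follows, assuming scikit-learn's random forest implementation; the chunk data and pattern names are illustrative toys, not the authors' dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each training example: a flattened 8x8x30 one-hot level chunk;
# each target: a user-defined design pattern label (or "none").
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 8 * 8 * 30))  # toy stand-in chunks
y = ["challenge"] * 20 + ["none"] * 20         # toy pattern labels

# Forest of 100 trees with a maximum depth of 100, as described above
clf = RandomForestClassifier(n_estimators=100, max_depth=100)
clf.fit(X, y)

prediction = clf.predict(X[:1])[0]  # design pattern label for one chunk
```

Retraining on updated labels is then a matter of calling `fit` again on the extended label set, which keeps the interactive loop cheap relative to retraining a neural model.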
  set          unique labels   total labels   top three labels
  Le Doux           47             259        platform (29), jump (22), pipe-flower (22)
  Del Rosario       26              38        concept introduction (3), objective (3), completionist reward (2)
  Smith             21              95        enemy pair (17), staircase (11), ditch (9)
  Smith-Naive       25             118        multi level (18), enemy pair (17), staircase (13)
  Kini              23              28        hazard introduction (3), hidden powerup (2), pipe trap (2)
  Snyder            34             155        walking enemy (18), power up block (17), coins (13)

Table 1: A table comparing the characteristics of our six sets of design pattern labels.

• Dee Del Rosario: Del Rosario is an events and community organizer in games with organizations such as Different Games Collective and Seattle Indies, along with being a gamedev hobbyist. They currently work as a web developer and educator.

• Kartik Kini: Kini is an indie game developer through his studio Finite Reflection, and an associate producer at Cartoon Network Games.

• Gillian Smith: Smith is an Assistant Professor at WPI. She focuses on game design, AI, craft, and generative design.

• Kelly Snyder: Snyder is an Art Producer at Bethesda and previously a Technical Producer at Bungie.

All five of these experts were asked to label their choice of three levels with labels that established "a common language/vocabulary that you'd use if you were designing levels like this with another human". Of these experts only Smith had any knowledge of the underlying system. She produced two sets of design patterns for the levels she labeled, one including only those patterns she felt the system could understand and the second including all patterns that matched the above criteria. We refer to these label sets as Smith and Smith-Naive, respectively, through the rest of this section.

These experts labeled static images of non-boss and non-underwater Super Mario Bros. levels present in the Video Game Level Corpus (VGLC) (Summerville et al. 2016b). The experts labeled these images by drawing a rectangle over the level structure in which the design pattern occurred, with some string to define the pattern. These rectangles could be of arbitrary size, but we translated each into either a single training example centered on the eight by eight chunk our system requires, or multiple training examples if it was larger than eight by eight.

We include some summarizing information about these six sets of design pattern labels in Table 1. Specifically, we include the total number of labels and the top three labels, sorted by frequency and alphabetically, of each set. Each expert produced very distinct labels, with less than one percent of labels shared between different experts. We include the first example of the top label for each set of design patterns in Figure 2. Even in the case of Kini and Del Rosario, where there is a similar area and design pattern label, the focus differs. We train six separate models, one for each set of design pattern labels (Smith has two).

Figure 2: The first example of the top label in each set of design pattern labels.

Label Classifier Evaluation

In this section we seek to understand how well our random forest classifier is able to identify design patterns in level structure. For the purposes of this evaluation we made use of AlexNet as a baseline (Szegedy et al. 2016), given that a convolutional neural network would be the naive way one might anticipate solving this problem. We chose AlexNet given its popularity and success at similar image recognition tasks. In all instances we trained the AlexNet until its error converged. We make use of a three-fold cross validation on the labels for this and the remaining evaluations.

  set          approach        train           test
  Le Doux      RF          84.06±0.76     28.73±1.89
  Le Doux      CNN         49.06±4.43     26.73±3.70
  Del Rosario  RF          86.71±0.78      0.77±1.07
  Del Rosario  CNN        36.08±19.01      1.04±0.82
  Smith        RF          92.11±0.89     33.28±3.48
  Smith        CNN         59.76±8.16     26.19±5.02
  Smith-Naive  RF          88.49±0.92     42.31±2.96
  Smith-Naive  CNN        65.50±19.68    36.82±11.84
  Kini         RF          90.19±0.69     41.93±2.29
  Kini         CNN         44.32±2.89     29.07±2.13
  Snyder       RF          86.72±1.62      1.85±5.56
  Snyder       CNN         49.40±0.77      0.00±0.00

Table 2: A table comparing the accuracy (%, mean ± standard deviation over three folds) of our random forest label classifier and a CNN baseline.
This three-fold validation addresses the variance across even a single expert's labels and the small set of labels available for some experts.

Our major focus is training and test accuracy across the folds. We summarize the results of this evaluation in Table 2, giving the average training and test accuracies across all folds along with the standard deviation. In all instances our random forest (RF) approach outperformed the AlexNet CNN in terms of training accuracy, and nearly always in terms of test accuracy. We note that given more training time AlexNet's training accuracy might improve, but at the cost of test accuracy. We further note that AlexNet was on average one and a half times slower than the random forest in terms of training time. These results indicate that our random forest produces a more general classifier compared to AlexNet.

We note that our random forest performed fairly consistently in terms of training accuracy, at around 85%, but that the test accuracy varied significantly. Notably, the test accuracy did not vary according to the number of training samples or number of labels per expert. This indicates that individual experts identify patterns that are more or less easy to classify automatically. Further, we note that Snyder and Del Rosario had very low test accuracy across the board, which indicates a large amount of variance between tagged examples. Despite this, we demonstrate the utility of this approach in the next section.

Autoencoder Structure Evaluation

We hypothesize that the inclusion of design pattern labels in our autoencoder network will improve its overall representative quality, and further that the use of an automatic label classifier will allow us to gather sufficient training data to train the autoencoder. This evaluation addresses both of these hypotheses. We draw upon the same dataset and the same three folds from the prior evaluation and create three variations of our system. The first autoencoder variation has no design pattern labels and is trained on all 8 × 8 chunks of level instead of only those chunks labeled or autolabeled, which could improve overall representative quality. The second autoencoder variation does not make use of the automatic design pattern label classifier, thus greatly reducing the training data. The last variation is simply our full system. For all approaches we trained until training error converged. We note that we trained a single "no labels" variation and tested it on each expert, but trained separate models for the no automatic classifier and full versions of our approach for each expert.

Given these three variations, we chose to measure the difference in structure when the autoencoder was fed the test portions of each of the three folds. Specifically, we capture the number of incorrect structure features predicted. This can be understood as a stand-in for representation quality, given that the output of the autoencoder for a test sample will be the closest thing the autoencoder can represent to that test sample.

We give the average number and standard deviation of incorrect structural features/tiles over all three folds in Table 3. We note that the minimum value here would be 0 errors and the maximum would be 8 × 8 × 30 or 1920 incorrect structural feature values. For every expert except Kini, who authored the smallest number of labels, our full system outperformed both variations. While some of these numbers are fairly close between the full and no labels variations, the full system's errors were significantly lower according to the paired Wilcoxon Mann Whitney U test (p < 0.001).

  set          No labels     No Auto Tag      Full
  Le Doux      12.3±6.3      152.8±4.8      10.6±5.6
  Del Rosario  10.4±5.0      135.7±3.8       9.0±4.4
  Smith        11.5±6.1      157.2±3.5      10.4±5.6
  Smith-Naive  12.7±4.8      167.6±4.0      11.5±4.4
  Kini          9.4±4.4      129.6±3.6      10.6±3.3
  Snyder       28.6±9.9      144.2±5.0      15.0±9.4

Table 3: A table comparing the error in terms of incorrect sprites (mean ± standard deviation over three folds) for our three generators. Smaller values represent fewer mistakes.

Given the results in Table 3, we argue that both our hypotheses were shown to be correct, granted that the expert gives sufficient labels, with the cut-off appearing to be between Kini's 28 and Del Rosario's 38. Specifically, representation quality is improved when labels are used, and the label classifier improves performance over not applying the label classifier.

Transfer Evaluation

A major concern for any co-creative tool based on machine learning is training time. In the prior autoencoder evaluation, both the no labels and full versions of our system took hours to train to convergence. This represents a major weakness, given that in some co-creative contexts designers may not want to wait for an offline training process, especially when we anticipate authors wanting to rapidly update their set of labels. Given these concerns, we evaluate a variation of our approach utilizing transfer learning, which drastically speeds up training time by adapting the weights of a network pre-trained on one task to a new task.

We make use of student-teacher or born-again neural networks, a transfer learning approach in which the weights of a pre-trained neural network are copied into another network of a different size. In this case we take the weights from our no labels autoencoder from the prior evaluation, copy them into our full architecture, and train from there. We construct two variations of this approach, once again depending on the use of the random forest label classifier or not. We compare both variations to the full and no labels systems from the prior evaluation, using the same metric.
with a design pattern. Given that this means fewer features           We present the results of this evaluation in Table 4. We
and smaller input and output tensors, this model should out-       note that, while the best performing variation did not change
perform our full model unless the design pattern labels im-        from the prior variation, in all cases except for the Kini
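The student-teacher weight copy behind these transfer variations can be sketched as below. This is a minimal sketch assuming networks are stored as dicts of weight arrays; the layer names and shapes are hypothetical, not our actual architecture, and in practice a framework's own get/set weight calls would be used.

```python
import numpy as np

def transfer_weights(teacher, student):
    """Copy pre-trained (teacher) weights into a larger network (student)
    wherever a layer with the same name and shape exists; layers unique to
    the student keep their fresh initialization."""
    for name, weights in teacher.items():
        if name in student and student[name].shape == weights.shape:
            student[name] = weights.copy()
    return student

# Hypothetical layer names/shapes: a "no labels" autoencoder's weights are
# reused in a "full" architecture that adds a label head trained from scratch.
teacher = {"enc": np.ones((64, 32)), "dec": np.ones((32, 64))}
student = {"enc": np.zeros((64, 32)), "dec": np.zeros((32, 64)),
           "label_head": np.zeros((32, 10))}
student = transfer_weights(teacher, student)
```

Training then resumes on the student network alone, so only the new layers (and any fine-tuning of the copied ones) need to converge.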
                                       Le Doux      Del Rosario    Smith       Smith-Naive        Kini       Snyder
            No labels                 12.3±16.3      10.4±5.0     11.5±6.1      12.7±4.8        9.3±4.4     28.6±9.9
            Transfer No Auto Tag       11.1±5.8      10.1±5.0     11.2±6.1      12.6±4.8        9.3±4.4     16.7±9.8
            Transfer w/ Auto           10.8±5.8      9.8±5.0      11.0±6.0      11.8±6.3       10.3±4.2     16.1±9.6
            Full                       10.6±5.6      9.0±4.4      10.4±5.6      11.5±4.4       10.6±3.3     15.0±9.4

         Table 4: A table comparing the transfer learning approach structure errors to the full system structure errors.
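Significance comparisons like those reported for these tables can be made with a rank-based test over per-fold errors. Below is a from-scratch, normal-approximation Mann Whitney U test as one possible sketch; in practice a library routine such as scipy.stats.mannwhitneyu would be used, this unpaired rank-sum form is not necessarily the exact paired procedure we applied, and the sample values are illustrative, not our data.

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann Whitney U test via the normal approximation.
    Returns (U, p); tied values receive averaged ranks."""
    combined = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for a run of ties
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(ranks[:n1])               # rank sum of sample a
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal-approx p-value
    return u, p

# Illustrative per-fold structure errors for two variations (not real data)
full_errors = [10.6, 9.0, 10.4, 11.5, 10.6, 15.0]
no_label_errors = [12.3, 10.4, 11.5, 12.7, 9.3, 28.6]
u, p = mann_whitney_u(full_errors, no_label_errors)
```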


In all cases except for the Kini models, the transfer approaches got closer to the full variation approach, sometimes off by as little as a fraction of one structure feature. Further, these approaches were significantly faster to train, with the no automatic labeling transfer approach training in an average of 4.48 seconds and the automatic labeler transfer approach training in an average of 144.92 seconds, compared to the average of roughly five hours for the full approach on the same computer. This points to a clear breakdown of when it makes sense to apply each variation of our approach, depending on time requirements and processing power. In addition, it continues to support our hypotheses concerning the use of the automatic labeler and personal level design pattern labels.

Qualitative Example
We do not present a front-end or interaction paradigm for the use of this Explainable PCGML system, as we feel such implementation details will depend upon the intended audience. However, it is illustrative to give an example of how the system could be used. In Figure 3 we present the two training examples of the pattern “completionist reward” labeled by the expert Dee Del Rosario. The full system, including the random forest classifier, trains on these examples (and the other labels from Del Rosario), and is then given as input the eight by eight chunk with only the floating bar within it on the left of the image, along with the desired label “completionist reward”. One can imagine that Del Rosario as a user wants to add a reward to this section, but doesn't have any strong ideas. Given this input the system outputs the image on the right.
   We asked Del Rosario what they thought of the performance of the system and whether they considered that this output matched their definition of completionist reward. They replied “Yes – I think? I would because I'm focusing on the position of the coins.” We note that Del Rosario did not see the most decisive patch when making this statement, which we extracted as in (Olah et al. 2018). This clearly demonstrates some harmony between the learned model and the design intent. However, Del Rosario went on to say “I think if I were to go... more strict with the definition/phrase, I'd think of some other configuration that would make you think, ‘oooooh, what a tricky design!!’ ”. This indicates a desire to further clarify the model. Thus, we imagine an iterative model is necessary for a tool utilizing this system and a user to reach a state of harmonious interaction.

Figure 3: Above: the two examples of the pattern “completionist reward” labeled by the expert Dee Del Rosario. Below: an example of the system given the input on the left and that label, and its output.

Conclusions
In this paper, we present an approach to explainable PCGML (XPCGML) through user-authored design pattern labels over existing level structure. We evaluate our autoencoder and random forest labeler components on levels labeled by game design experts. These labels serve as a shared language between the user and level design agent, which allows for the possibility of explainability and meaningful collaborative interaction. We intend to incorporate our system into a co-creative tool for novice and expert level designers. To the best of our knowledge this represents the first approach to explainable PCGML.

Acknowledgements
We want to thank our five expert volunteers. This material is based upon work supported by the National Science Foundation under Grant No. IIS-1525967. We would also like to thank the organizers and attendees of Dagstuhl Seminar 17471 on Artificial and Computational Intelligence in Games: AI-Driven Game Design, where the discussion that led to this research began.

References
Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al.
2016. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, 265–283.
Baldwin, A.; Dahlskog, S.; Font, J. M.; and Holmberg, J. 2017. Mixed-initiative procedural generation of dungeons using game design patterns. In Computational Intelligence and Games (CIG), 2017 IEEE Conference on, 25–32. IEEE.
Biran, O., and Cotton, C. 2017. Explanation and justification in machine learning: A survey. In IJCAI 2017 Workshop on Explainable AI (XAI).
Bjork, S., and Holopainen, J. 2004. Patterns in game design. ISBN:1584503548.
Bradski, G., and Kaehler, A. 2000. Opencv. Dr. Dobbs journal of software tools 3.
Codella, N. C.; Hind, M.; Ramamurthy, K. N.; Campbell, M.; Dhurandhar, A.; Varshney, K. R.; Wei, D.; and Mojsilovic, A. 2018. Teaching meaningful explanations. arXiv:1805.11648.
Dahlskog, S., and Togelius, J. 2012. Patterns and procedural content generation: revisiting mario in world 1 level 1. In Proceedings of the First Workshop on Design Patterns in Games, 1. ACM.
Deterding, C. S.; Hook, J. D.; Fiebrink, R.; Gow, J.; Akten, M.; Smith, G.; Liapis, A.; and Compton, K. 2017. Mixed-initiative creative interfaces. In CHI EA'17: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM.
Ehsan, U.; Harrison, B.; Chan, L.; and Riedl, M. O. 2017. Rationalization: A neural machine translation approach to generating natural language explanations. arXiv:1702.07826.
Guzdial, M., and Riedl, M. 2016. Game level generation from gameplay videos. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference.
Hullett, K., and Whitehead, J. 2010. Design patterns in fps levels. In FDG'10 Proceedings of the Fifth International Conference on the Foundations of Digital Games. ACM.
Jain, R.; Isaksen, A.; Holmgård, C.; and Togelius, J. 2016. Autoencoders for level generation, repair, and recognition. In Proceedings of the ICCC Workshop on Computational Creativity and Games.
Jones, E.; Oliphant, T.; and Peterson, P. 2014. SciPy: open source scientific tools for Python.
Kingma, D. P., and Welling, M. 2013. Auto-encoding variational bayes. In The 2nd International Conference on Learning Representations (ICLR).
Lang, K. J. 1988. A time-delay neural network architecture for speech recognition. Technical Report CMU-CS-88-152.
LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; and Jackel, L. D. 1989. Backpropagation applied to handwritten zip code recognition. Neural computation 1(4):541–551.
Liapis, A.; Yannakakis, G. N.; and Togelius, J. 2013. Sentient sketchbook: Computer-aided game level authoring. In Proceedings of ACM Conference on Foundations of Digital Games, 213–220. FDG.
Liaw, A.; Wiener, M.; et al. 2002. Classification and regression by randomforest. R news 2(3):18–22.
Michalski, R. S.; Carbonell, J. G.; and Mitchell, T. M. 2013. Machine learning: An artificial intelligence approach. Springer Science & Business Media.
Milam, D., and El Nasr, M. S. 2010. Design patterns to guide player movement in 3d games. In Proceedings of the 5th ACM SIGGRAPH Symposium on Video Games, 37–42. ACM.
Olah, C.; Satyanarayan, A.; Johnson, I.; Carter, S.; Schubert, L.; Ye, K.; and Mordvintsev, A. 2018. The building blocks of interpretability. Distill 3(3):e10.
Shaker, N.; Shaker, M.; and Togelius, J. 2013. Ropossum: An authoring tool for designing, optimizing and solving cut the rope levels. In Proceedings of the Ninth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.
Smith, G.; Whitehead, J.; and Mateas, M. 2011. Tanagra: Reactive planning and constraint solving for mixed-initiative level design. IEEE Transactions on Computational Intelligence and AI in Games 3(3):201–215.
Snodgrass, S., and Ontañón, S. 2017. Learning to generate video game maps using markov models. IEEE Transactions on Computational Intelligence and AI in Games 9(4):410–422.
Summerville, A., and Mateas, M. 2016. Super mario as a string: Platformer level generation via lstms. In The 1st International Conference of DiGRA and FDG.
Summerville, A.; Guzdial, M.; Mateas, M.; and Riedl, M. O. 2016a. Learning player tailored content from observation: Platformer level generation from video traces using lstms. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference.
Summerville, A. J.; Snodgrass, S.; Mateas, M.; and Ontañón, S. 2016b. The vglc: The video game level corpus. In Procedural Content Generation Workshop at DiGRA/FDG.
Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Hoover, A. K.; Isaksen, A.; Nealen, A.; and Togelius, J. 2017. Procedural content generation via machine learning (pcgml). arXiv preprint arXiv:1702.00539.
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; and Wojna, Z. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826.
Yannakakis, G. N.; Liapis, A.; and Alexopoulos, C. 2014. Mixed-initiative co-creativity. In Proceedings of the 9th Conference on the Foundations of Digital Games. FDG.