=Paper= {{Paper |id=Vol-2282/EXAG_107 |storemode=property |title=Explainable PCGML via Game Design Patterns |pdfUrl=https://ceur-ws.org/Vol-2282/EXAG_107.pdf |volume=Vol-2282 |authors=Matthew Guzdial,Joshua Reno,Jonathan Chen,Gillian Smith,Mark Riedl |dblpUrl=https://dblp.org/rec/conf/aiide/GuzdialRCSR18 }} ==Explainable PCGML via Game Design Patterns== https://ceur-ws.org/Vol-2282/EXAG_107.pdf
                            Explainable PCGML via Game Design Patterns

         Matthew Guzdial1 , Joshua Reno1 , Jonathan Chen1 , Gillian Smith2 , and Mark Riedl1
                                              Georgia Institute of Technology1
                                              Worcester Polytechnic Institute2
                   {mguzdial3, jreno, jonathanchen}@gatech.edu, gmsmith@wpi.edu, riedl@cc.gatech.edu



                           Abstract

  Procedural content generation via Machine Learning
  (PCGML) is the umbrella term for approaches that generate
  content for games via machine learning. One of the benefits
  of PCGML is that, unlike search- or grammar-based PCG, it
  does not require hand authoring of initial content or rules.
  Instead, PCGML relies on existing content and black box
  models, which can be difficult to tune or tweak without
  expert knowledge. This is especially problematic when a
  human designer needs to understand how to manipulate
  their data or models to achieve desired results. We present
  an approach to Explainable PCGML via Design Patterns in
  which the design patterns act as a vocabulary and mode of
  interaction between user and model. We demonstrate that
  our technique outperforms non-explainable versions of our
  system in interactions with five expert designers, four of
  whom lack any machine learning expertise.

                        Introduction

Procedural Content Generation (PCG) represents a field of research into, and a set of techniques for, generating game content algorithmically. PCG historically requires a significant amount of human-authored knowledge to generate content, such as rules, heuristics, and individual components, creating a time and design expertise burden. Procedural Content Generation via Machine Learning (PCGML) attempts to solve these issues by applying machine learning to extract this design knowledge from existing corpora of game content (Summerville et al. 2017). However, this approach has its own weaknesses: applied naively, these models require machine learning literacy to understand and debug. Machine learning literacy is uncommon, especially among those designers who might most benefit from PCGML.

Explainable AI represents a field of research into opening up black box Artificial Intelligence and Machine Learning models to users (Biran and Cotton 2017). The promise of explainable AI is not just that it will help users understand such models, but also tweak these models to their needs (Olah et al. 2018). If we could include some representation of an individual game designer's knowledge in a model, we could help designers without ML expertise better understand and alter these models to their needs.

Design patterns (Bjork and Holopainen 2004) represent one popular way to represent game design knowledge. A design pattern is a category of game structure that serves a general design purpose across similar games. Researchers tend to derive design patterns via subjective application of design expertise (Hullett and Whitehead 2010), which makes it difficult to broadly apply one set of patterns across different designers and games. The same subjective limitation also means that an individual set of design patterns can serve to clarify what elements of a game matter to an individual designer. Given a set of design patterns specialized to a particular designer, one could leverage these design patterns in an Explainable PCGML system to help a designer understand and tweak a model to their needs. We note that our usage of the term pattern differs from the literature. Typically, a design pattern generalizes across designers, whereas we apply it to indicate the unique structures of a game important to an individual designer.

We present an underlying system for a potential co-creative PCGML tool, intended for designers without ML expertise. This system takes user-defined design patterns for a target level and outputs a PCGML model. The design patterns provided by designers and generated by our system can be understood as labels on level structure, which allow our PCGML model to better represent and reflect the design values of an individual designer. The system has two major components: (1) a classification system that learns to classify level structures with the user-specified design pattern labels, ensuring a user does not have to label all existing content; and (2) a level generation system that incorporates the user's level design patterns and can use these patterns as a vocabulary with which to interact with the user, for example, generating labels on level structure to represent the model's interpretation of that structure to the user.

The rest of this paper is organized as follows. First, we relate our work to prior, related work. Second, we describe our Explainable PCGML (XPCGML) system in terms of its two major components. Third, we discuss the three evaluations we ran with five expert designers. We end with a discussion of the system's limitations, future work, and conclusions. Our major contributions are the first application of explainable AI to PCGML, the use of a random forest classifier to minimize user effort, and the results of our evaluations. Our results demonstrate both the promise of these pattern labels in improving user interaction and their positive impact on the underlying model's performance.
[Figure 1 depicts the autoencoder: an 8x8x30+n input (level chunk plus design pattern label, e.g. "intro") passes through two 64x5x5 convolutional layers with dropout between them, a relu fully connected layer into a 512-unit encoding, a second relu layer, and two 64x5x5 deconvolutional layers to an 8x8x30+n output.]

Figure 1: Network architecture for the Autoencoder.
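The 8x8x30+n input in Figure 1 pairs a one-hot level-structure tensor with a one-hot design pattern vector. A minimal numpy sketch of assembling such an input follows; the tile class index and pattern count used here are illustrative assumptions, not the paper's actual class list:

```python
import numpy as np

HEIGHT, WIDTH, CLASSES = 8, 8, 30  # tile grid and component classes

def encode_chunk(tile_ids, pattern_idx, n_patterns):
    """Build the 8x8x30 one-hot structure tensor plus an n-dimensional
    one-hot design pattern vector, as described for Figure 1."""
    structure = np.zeros((HEIGHT, WIDTH, CLASSES))
    for (y, x), tile in np.ndenumerate(tile_ids):
        if tile >= 0:  # -1 marks an empty tile: its vector stays all zeros
            structure[y, x, tile] = 1.0
    label = np.zeros(n_patterns)
    label[pattern_idx] = 1.0
    return structure, label

# Toy chunk: empty except a bottom row of one component class (index 3, assumed)
tiles = -np.ones((8, 8), dtype=int)
tiles[7, :] = 3
structure, label = encode_chunk(tiles, pattern_idx=0, n_patterns=4)
```

Flattening `structure` and concatenating `label` yields the 1920+n feature vector referenced later in the Generator section.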


                      Related Work

There exist many prior approaches to co-creative or mixed-initiative design agents and editors (Yannakakis, Liapis, and Alexopoulos 2014; Deterding et al. 2017). However, the majority of existing approaches have relied upon search- or grammar-based approaches instead of machine learning, making it difficult to adapt to the needs of a particular designer over time (Liapis, Yannakakis, and Togelius 2013; Shaker, Shaker, and Togelius 2013; Baldwin et al. 2017). A final version of our system would focus on machine learning, adapting to the user, and explaining and visualizing its inner model and process.

Procedural Content Generation via Machine Learning (Summerville et al. 2017) is a relatively new field, focused on generating content through machine learning methods. The majority of PCGML approaches represent black box methods, with no prior approach focused on explainability or co-creativity. We note some discussion in the Summerville et al. survey paper of potential collaborative approaches. Summerville (2016a) explored adapting levels to players, but no work to our knowledge looks at adapting models to individual designers.

Super Mario Bros. (SMB) represents a common area of research into PCGML (Dahlskog and Togelius 2012; Summerville and Mateas 2016; Jain et al. 2016; Snodgrass and Ontanón 2017). Beyond explainability, our approach differs from prior SMB PCGML approaches in terms of representation quality and the size of generated content. First, we focus on the generation of individual level sections instead of entire levels in order to better afford collaborative level building (Smith, Whitehead, and Mateas 2011). Second, prior approaches have abstracted the possible level components into higher order groups, for example, treating all enemy types as equivalent and ignoring decorative elements. We make use of a rich representation of all possible level components and an ordering that allows our approach to place decorative elements appropriately.

Explainable AI represents an emerging field of research (Biran and Cotton 2017), focused on translating or rationalizing the behavior of black box models. To the best of our knowledge, it has not been previously applied to PCGML. Codella et al. (2018) demonstrated how explanations could improve model accuracy on three tasks, but required that every sample be hand-labeled with an explanation and treated explanations from different authors as equivalent. Ehsan et al. (2017) made use of explainable AI for explainable agent behavior in automated game playing. Their approach relies on rationalization, a second machine learning interpretation of the original behavior, rather than visualizing or explaining the original model as our approach does.

Design patterns represent a well-researched approach to game design (Bjork and Holopainen 2004). In theory, game design patterns describe general solutions to game design problems that occur across many different games. Game design patterns have been used as heuristics in evolutionary PCG systems, including in the domain of Super Mario Bros. (Dahlskog and Togelius 2012). Researchers tend to derive game design patterns either through rigorous, cross-domain analysis (Milam and El Nasr 2010) or based upon their subjective interpretation of game structure. We embrace this subjectivity in our work by having designers create a language of game design patterns unique to themselves with which to interact with a PCGML system.

                      System Overview

The approach presented in this paper builds an Explainable PCGML model based on existing level structure and an expert labeling design patterns upon that structure. We chose Super Mario Bros. as a domain given its familiarity to the game designers who took part in our evaluation. The general process for building a final model is as follows. First, users label existing game levels with the game design patterns they want to use for communicating with the system. For example, one might label both areas with large amounts of enemies and areas that require precise jumps as "challenges". The exact label can be anything as long as it is used consistently. Given this initial user labeling of level structure, we train a random forest classifier (Liaw, Wiener, and others 2002) to classify additional level structure according to the labeled level chunks, which we then use to label all available levels with the user's design pattern labels. Given this now larger training set of both level structure and labels, we train a convolutional neural network-based autoencoder on both level structure and its associated labels (Lang 1988; LeCun et al. 1989), which can then be used to generate new level structure and to label its generated content with these design pattern labels (Jain et al. 2016).

We make use of Super Mario Bros. as our domain and, in particular, we utilize those Super Mario Bros. levels present in the Video Game Level Corpus (VGLC) (Summerville et al. 2016b).
We do not include underwater or boss/castle Super Mario Bros. levels. We made this choice as we perceived these two level types to be significantly different from all other level types. Further, while we make use of the VGLC levels, we do not make use of any of the VGLC Super Mario Bros. representations, which abstract level components into higher order groups. Instead, we draw on the image parsing approach introduced in (Guzdial and Riedl 2016), using a spritesheet and OpenCV (Bradski and Kaehler 2000) to parse images of each level for a richer representation.

In total we identified thirty unique classes of level components, and we make use of a matrix representation for each level section of size 8 × 8 × 30. The first two dimensions determine the tiles in the x and y axes, while the last dimension represents a one-hot vector of length 30 expressing component class. This vector is all 0's for any empty tile of a Super Mario Bros. level, and otherwise has a 1 at the index associated with that particular level component. Thus, we can represent all level components, including background decoration. We note that we treat palette swaps of the same component as equivalent in class.

We make use of the SciPy random forest classifier (Jones, Oliphant, and Peterson 2014) and tensorflow for the autoencoder (Abadi et al. 2016).

Design Pattern Label Classifier

Our goal for the design pattern label classifier is to minimize the work and time costs for a potential user of the system. Users have to label level structure with the design patterns they would like to use, but the label classifier ensures they do not have to hand-label all available levels. The classifier for this task must be able to perform given access to whatever small amount of training data a designer is willing to label for it, along with being able to easily update its model given potential feedback from a user. We anticipate that the exact amount of training data the system has access to will differ widely between users, but we do not wish to overburden authors with long data labeling tasks. Random forest classifiers are known to perform reasonably under these constraints (Michalski, Carbonell, and Mitchell 2013).

The random forest model takes in an eight by eight level section and returns a level design pattern (either a user-defined design pattern or none). We train the random forest model on the design pattern labels submitted by the user. We use a forest of size 100 with a maximum depth of 100 in order to encourage generality.

In an interactive, iterative system the random forest can be easily retrained. In the case where the random forest classifier correctly classifies any new design pattern there is no need for retraining. Otherwise, we can delete a subset of the trees of the random forest that incorrectly classified the design pattern, and retrain an appropriate number of trees on the existing labels and any additional new information to return to the maximum forest size.

Even with the design pattern label classifier this system requires the somewhat unusual step of labeling existing level structure with design patterns a user finds important. However, this is a necessary step for the benefit of a shared vocabulary, and labeling content is much easier than designing new content. Further, we note that when two humans collaborate they must negotiate a shared vocabulary.

Generator

Our level generation system is based on an autoencoder; we visualize its architecture in Figure 1. The input comes in the form of a chunk of level content and the associated design pattern label, such as "intro" in the figure. This chunk is represented as an eight by eight by thirty input tensor plus a tensor of size n, where n indicates the total number of design pattern labels given by the user. This last vector of size n is a one-hot encoded vector of level design pattern labels.

After input, the level structure and design pattern label vector are separated. The level structure passes through a two layer convolutional neural network (CNN). We placed a dropout layer between the two CNN layers to allow better generalization. After the CNN layers, the output of this section and the design patterns vector recombine and pass through a fully connected layer with relu activation to an embedded vector of size 512. We note that, while large, this is much smaller than the 1920+n features of the input layer. The decoder section is an inverse of the encoder section of the architecture, starting with a relu fully connected layer, followed by deconvolutional neural network layers with upsampling handling the level structure. We trained this model with the Adam optimizer and mean squared error loss. Note that for the purposes of evaluation this is a standard autoencoder; we intend to make use of a variational autoencoder in future work (Kingma and Welling 2013).

                          Evaluation

Our system has two major parts: (1) a random forest classifier that attempts to label additional content with user-provided design patterns to learn the designer's vocabulary and (2) an autoencoder over level structure and associated patterns for generation. In this section we present three evaluations of our system. The first addresses the random forest classifier of labels, the second the entirety of the system, and the third the limiting factor of time in human-computer interaction. For all three evaluations we make use of a dataset of levels from Super Mario Bros. labeled by five expert designers.

Dataset Collection

We reached out to ten design experts to label three or more Super Mario Bros. levels of their choice to serve as a dataset for this evaluation. We do not include prior, published academic patterns of Super Mario Bros. levels (e.g. (Dahlskog and Togelius 2012)) as these patterns were designed for general automated design instead of explainable co-creation. Our goal in choosing these ten designers was to get as diverse a pool of labels as possible. Of these ten, five responded and took part in this study.

• Adam Le Doux: Le Doux is a game developer and designer best known for his Bitsy game engine. He is currently a Narrative Tool Developer at Bungie.
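A rough sketch of such a classifier follows, assuming scikit-learn's random forest implementation; the chunk data and pattern names are illustrative toys, not the authors' dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each training example: a flattened 8x8x30 one-hot level chunk;
# each target: a user-defined design pattern label (or "none").
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 8 * 8 * 30))  # toy stand-in chunks
y = ["challenge"] * 20 + ["none"] * 20         # toy pattern labels

# Forest of 100 trees with a maximum depth of 100, as described above
clf = RandomForestClassifier(n_estimators=100, max_depth=100)
clf.fit(X, y)

prediction = clf.predict(X[:1])[0]  # design pattern label for one chunk
```

Retraining on updated labels is then a matter of calling `fit` again on the extended label set, which keeps the interactive loop cheap relative to retraining a neural model.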
  set          unique labels   total labels   top three labels
  Le Doux           47             259        platform (29), jump (22), pipe-flower (22)
  Del Rosario       26              38        concept introduction (3), objective (3), completionist reward (2)
  Smith             21              95        enemy pair (17), staircase (11), ditch (9)
  Smith-Naive       25             118        multi level (18), enemy pair (17), staircase (13)
  Kini              23              28        hazard introduction (3), hidden powerup (2), pipe trap (2)
  Snyder            34             155        walking enemy (18), power up block (17), coins (13)

Table 1: A table comparing the characteristics of our six sets of design pattern labels.

• Dee Del Rosario: Del Rosario is an events and community organizer in games with organizations such as Different Games Collective and Seattle Indies, along with being a gamedev hobbyist. They currently work as a web developer and educator.

• Kartik Kini: Kini is an indie game developer through his studio Finite Reflection, and an associate producer at Cartoon Network Games.

• Gillian Smith: Smith is an Assistant Professor at WPI. She focuses on game design, AI, craft, and generative design.

• Kelly Snyder: Snyder is an Art Producer at Bethesda and previously a Technical Producer at Bungie.

All five of these experts were asked to label their choice of three levels with labels that established "a common language/vocabulary that you'd use if you were designing levels like this with another human". Of these experts only Smith had any knowledge of the underlying system. She produced two sets of design patterns for the levels she labeled, one including only those patterns she felt the system could understand and the second including all patterns that matched the above criteria. We refer to these label sets as Smith and Smith-Naive, respectively, through the rest of this section.

These experts labeled static images of non-boss and non-underwater Super Mario Bros. levels present in the Video Game Level Corpus (VGLC) (Summerville et al. 2016b). The experts labeled these images by drawing a rectangle over the level structure in which the design pattern occurred, with some string to define the pattern. These rectangles could be of arbitrary size, but we translated each into either a single training example centered on the eight by eight chunk our system requires, or multiple training examples if it was larger than eight by eight.

We include some summarizing information about these six sets of design pattern labels in Table 1. Specifically, we include the total number of labels and the top three labels, sorted by frequency and alphabetically, of each set. Each expert produced very distinct labels, with less than one percent of labels shared between different experts. We include the first example of the top label for each set of design patterns in Figure 2. Even in the case of Kini and Del Rosario, where there is a similar area and design pattern label, the focus differs. We train six separate models, one for each set of design pattern labels (Smith has two).

Figure 2: The first example of the top label in each set of design pattern labels.

Label Classifier Evaluation

In this section we seek to understand how well our random forest classifier is able to identify design patterns in level structure. For the purposes of this evaluation we made use of AlexNet as a baseline (Szegedy et al. 2016), given that a convolutional neural network would be the naive way one might anticipate solving this problem. We chose AlexNet given its popularity and success at similar image recognition tasks. In all instances we trained the AlexNet until its error converged. We make use of a three-fold cross validation on the labels for this and the remaining evaluations.

  set          approach        train           test
  Le Doux      RF          84.06±0.76     28.73±1.89
  Le Doux      CNN         49.06±4.43     26.73±3.70
  Del Rosario  RF          86.71±0.78      0.77±1.07
  Del Rosario  CNN        36.08±19.01      1.04±0.82
  Smith        RF          92.11±0.89     33.28±3.48
  Smith        CNN         59.76±8.16     26.19±5.02
  Smith-Naive  RF          88.49±0.92     42.31±2.96
  Smith-Naive  CNN        65.50±19.68    36.82±11.84
  Kini         RF          90.19±0.69     41.93±2.29
  Kini         CNN         44.32±2.89     29.07±2.13
  Snyder       RF          86.72±1.62      1.85±5.56
  Snyder       CNN         49.40±0.77      0.00±0.00

Table 2: A table comparing the accuracy (%, mean ± standard deviation over three folds) of our random forest label classifier and a CNN baseline.
This three-fold validation addresses the variance across even a single expert's labels and the small set of labels available for some experts.

Our major focus is training and test accuracy across the folds. We summarize the results of this evaluation in Table 2, giving the average training and test accuracies across all folds along with the standard deviation. In all instances our random forest (RF) approach outperformed the AlexNet CNN in terms of training accuracy, and nearly always in terms of test accuracy. We note that given more training time AlexNet's training accuracy might improve, but at the cost of test accuracy. We further note that AlexNet was on average one and a half times slower than the random forest in terms of training time. These results indicate that our random forest produces a more general classifier compared to AlexNet.

We note that our random forest performed fairly consistently in terms of training accuracy, at around 85%, but that the test accuracy varied significantly. Notably, the test accuracy did not vary according to the number of training samples or number of labels per expert. This indicates that individual experts identify patterns that are more or less easy to classify automatically. Further, we note that Snyder and Del Rosario had very low test accuracy across the board, which indicates a large amount of variance between tagged examples. Despite this, we demonstrate the utility of this approach in the next section.

Autoencoder Structure Evaluation

We hypothesize that the inclusion of design pattern labels in our autoencoder network will improve its overall representative quality, and further that the use of an automatic label classifier will allow us to gather sufficient training data to train the autoencoder. This evaluation addresses both of these hypotheses. We draw upon the same dataset and the same three folds from the prior evaluation and create three variations of our system. The first autoencoder variation has no design pattern labels and is trained on all 8 × 8 chunks of level instead of only those chunks labeled or autolabeled, which could improve overall representative quality. The second autoencoder variation does not make use of the automatic design pattern label classifier, thus greatly reducing the training data. The last variation is simply our full system. For all approaches we trained until training error converged. We note that we trained a single "no labels" variation and tested it on each expert, but trained separate models for the no automatic classifier and full versions of our approach for each expert.

Given these three variations, we chose to measure the difference in structure when the autoencoder was fed the test portions of each of the three folds. Specifically, we capture the number of incorrect structure features predicted. This can be understood as a stand-in for representation quality, given that the output of the autoencoder for a test sample will be the closest thing the autoencoder can represent to that test sample.

We give the average number and standard deviation of incorrect structural features/tiles over all three folds in Table 3. We note that the minimum value here would be 0 errors and the maximum would be 8 × 8 × 30 or 1920 incorrect structural feature values. For every expert except Kini, who authored the smallest number of labels, our full system outperformed both variations. While some of these numbers are fairly close between the full and no labels variations, the full system's errors were significantly lower according to the paired Wilcoxon Mann Whitney U test (p < 0.001).

  set          No labels     No Auto Tag      Full
  Le Doux      12.3±6.3      152.8±4.8      10.6±5.6
  Del Rosario  10.4±5.0      135.7±3.8       9.0±4.4
  Smith        11.5±6.1      157.2±3.5      10.4±5.6
  Smith-Naive  12.7±4.8      167.6±4.0      11.5±4.4
  Kini          9.4±4.4      129.6±3.6      10.6±3.3
  Snyder       28.6±9.9      144.2±5.0      15.0±9.4

Table 3: A table comparing the error in terms of incorrect sprites (mean ± standard deviation over three folds) for our three generators. Smaller values represent fewer mistakes.

Given the results in Table 3, we argue that both our hypotheses were shown to be correct, granted that the expert gives sufficient labels, with the cut-off appearing to be between Kini's 28 and Del Rosario's 38. Specifically, representation quality is improved when labels are used, and the label classifier improves performance over not applying the label classifier.

Transfer Evaluation

A major concern for any co-creative tool based on machine learning is training time. In the prior autoencoder evaluation, both the no labels and full versions of our system took hours to train to convergence. This represents a major weakness, given that in some co-creative contexts designers may not want to wait for an offline training process, especially when we anticipate authors wanting to rapidly update their set of labels. Given these concerns, we evaluate a variation of our approach utilizing transfer learning, which drastically speeds up training time by adapting the weights of a network pre-trained on one task to a new task.

We make use of student-teacher or born-again neural networks, a transfer learning approach in which the weights of a pre-trained neural network are copied into another network of a different size. In this case we take the weights from our no labels autoencoder from the prior evaluation, copy them into our full architecture, and train from there. We construct two variations of this approach, once again depending on the use of the random forest label classifier or not. We compare both variations to the full and no labels systems from the prior evaluation, using the same metric.
with a design pattern. Given that this means fewer features           We present the results of this evaluation in Table 4. We
and smaller input and output tensors, this model should out-       note that, while the best performing variation did not change
perform our full model unless the design pattern labels im-        from the prior variation, in all cases except for the Kini
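The student-teacher weight copy behind these transfer variations can be sketched as below. This is a minimal sketch assuming networks are stored as dicts of weight arrays; the layer names and shapes are hypothetical, not our actual architecture, and in practice a framework's own get/set weight calls would be used.

```python
import numpy as np

def transfer_weights(teacher, student):
    """Copy pre-trained (teacher) weights into a larger network (student)
    wherever a layer with the same name and shape exists; layers unique to
    the student keep their fresh initialization."""
    for name, weights in teacher.items():
        if name in student and student[name].shape == weights.shape:
            student[name] = weights.copy()
    return student

# Hypothetical layer names/shapes: a "no labels" autoencoder's weights are
# reused in a "full" architecture that adds a label head trained from scratch.
teacher = {"enc": np.ones((64, 32)), "dec": np.ones((32, 64))}
student = {"enc": np.zeros((64, 32)), "dec": np.zeros((32, 64)),
           "label_head": np.zeros((32, 10))}
student = transfer_weights(teacher, student)
```

Training then resumes on the student network alone, so only the new layers (and any fine-tuning of the copied ones) need to converge.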
                                       Le Doux      Del Rosario    Smith       Smith-Naive        Kini       Snyder
            No labels                 12.3±16.3      10.4±5.0     11.5±6.1      12.7±4.8        9.3±4.4     28.6±9.9
            Transfer No Auto Tag       11.1±5.8      10.1±5.0     11.2±6.1      12.6±4.8        9.3±4.4     16.7±9.8
            Transfer w/ Auto           10.8±5.8      9.8±5.0      11.0±6.0      11.8±6.3       10.3±4.2     16.1±9.6
            Full                       10.6±5.6      9.0±4.4      10.4±5.6      11.5±4.4       10.6±3.3     15.0±9.4

         Table 4: A table comparing the transfer learning approach structure errors to the full system structure errors.
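Significance comparisons like those reported for these tables can be made with a rank-based test over per-fold errors. Below is a from-scratch, normal-approximation Mann Whitney U test as one possible sketch; in practice a library routine such as scipy.stats.mannwhitneyu would be used, this unpaired rank-sum form is not necessarily the exact paired procedure we applied, and the sample values are illustrative, not our data.

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann Whitney U test via the normal approximation.
    Returns (U, p); tied values receive averaged ranks."""
    combined = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for a run of ties
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(ranks[:n1])               # rank sum of sample a
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal-approx p-value
    return u, p

# Illustrative per-fold structure errors for two variations (not real data)
full_errors = [10.6, 9.0, 10.4, 11.5, 10.6, 15.0]
no_label_errors = [12.3, 10.4, 11.5, 12.7, 9.3, 28.6]
u, p = mann_whitney_u(full_errors, no_label_errors)
```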


In all cases except for the Kini models, the transfer approaches got closer to the full variation approach, sometimes off by as little as a fraction of one structure feature. Further, these approaches were significantly faster to train, with the no automatic labeling transfer approach training in an average of 4.48 seconds and the automatic labeler transfer approach training in an average of 144.92 seconds, compared to the average of roughly five hours for the full approach on the same computer. This points to a clear breakdown of when it makes sense to apply each variation of our approach, depending on time requirements and processing power. In addition, it continues to support our hypotheses concerning the use of the automatic labeler and personal level design pattern labels.

Qualitative Example
We do not present a front-end or interaction paradigm for the use of this Explainable PCGML system, as we feel such implementation details will depend upon the intended audience. However, it is illustrative to give an example of how the system could be used. In Figure 3 we present the two training examples of the pattern “completionist reward” labeled by the expert Dee Del Rosario. The full system, including the random forest classifier, trains on these examples (and the other labels from Del Rosario), and is then given as input the eight by eight chunk with only the floating bar within it on the left of the image, along with the desired label “completionist reward”. One can imagine that Del Rosario as a user wants to add a reward to this section, but doesn't have any strong ideas. Given this input the system outputs the image on the right.
   We asked Del Rosario what they thought of the performance of the system and whether they considered that this output matched their definition of completionist reward. They replied “Yes – I think? I would because I'm focusing on the position of the coins.” We note that Del Rosario did not see the most decisive patch when making this statement, which we extracted as in (Olah et al. 2018). This clearly demonstrates some harmony between the learned model and the design intent. However, Del Rosario went on to say “I think if I were to go... more strict with the definition/phrase, I'd think of some other configuration that would make you think, ‘oooooh, what a tricky design!!’ ”. This indicates a desire to further clarify the model. Thus, we imagine an iterative model is necessary for a tool utilizing this system and a user to reach a state of harmonious interaction.

Figure 3: Above: the two examples of the pattern “completionist reward” labeled by the expert Dee Del Rosario. Below: an example of the system given the input on the left and that label, and its output.

Conclusions
In this paper, we present an approach to explainable PCGML (XPCGML) through user-authored design pattern labels over existing level structure. We evaluate our autoencoder and random forest labeler components on levels labeled by game design experts. These labels serve as a shared language between the user and level design agent, which allows for the possibility of explainability and meaningful collaborative interaction. We intend to incorporate our system into a co-creative tool for novice and expert level designers. To the best of our knowledge this represents the first approach to explainable PCGML.

Acknowledgements
We want to thank our five expert volunteers. This material is based upon work supported by the National Science Foundation under Grant No. IIS-1525967. We would also like to thank the organizers and attendees of Dagstuhl Seminar 17471 on Artificial and Computational Intelligence in Games: AI-Driven Game Design, where the discussion that led to this research began.

References
Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al.
2016. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, 265–283.
Baldwin, A.; Dahlskog, S.; Font, J. M.; and Holmberg, J. 2017. Mixed-initiative procedural generation of dungeons using game design patterns. In Computational Intelligence and Games (CIG), 2017 IEEE Conference on, 25–32. IEEE.
Biran, O., and Cotton, C. 2017. Explanation and justification in machine learning: A survey. In IJCAI 2017 Workshop on Explainable AI (XAI).
Bjork, S., and Holopainen, J. 2004. Patterns in game design. ISBN:1584503548.
Bradski, G., and Kaehler, A. 2000. Opencv. Dr. Dobbs journal of software tools 3.
Codella, N. C.; Hind, M.; Ramamurthy, K. N.; Campbell, M.; Dhurandhar, A.; Varshney, K. R.; Wei, D.; and Mojsilovic, A. 2018. Teaching meaningful explanations. arXiv:1805.11648.
Dahlskog, S., and Togelius, J. 2012. Patterns and procedural content generation: revisiting mario in world 1 level 1. In Proceedings of the First Workshop on Design Patterns in Games, 1. ACM.
Deterding, C. S.; Hook, J. D.; Fiebrink, R.; Gow, J.; Akten, M.; Smith, G.; Liapis, A.; and Compton, K. 2017. Mixed-initiative creative interfaces. In CHI EA'17: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM.
Ehsan, U.; Harrison, B.; Chan, L.; and Riedl, M. O. 2017. Rationalization: A neural machine translation approach to generating natural language explanations. arXiv:1702.07826.
Guzdial, M., and Riedl, M. 2016. Game level generation from gameplay videos. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference.
Hullett, K., and Whitehead, J. 2010. Design patterns in fps levels. In FDG'10 Proceedings of the Fifth International Conference on the Foundations of Digital Games. ACM.
Jain, R.; Isaksen, A.; Holmgård, C.; and Togelius, J. 2016. Autoencoders for level generation, repair, and recognition. In Proceedings of the ICCC Workshop on Computational Creativity and Games.
Jones, E.; Oliphant, T.; and Peterson, P. 2014. SciPy: open source scientific tools for Python.
Kingma, D. P., and Welling, M. 2013. Auto-encoding variational bayes. In The 2nd International Conference on Learning Representations (ICLR).
Lang, K. J. 1988. A time-delay neural network architecture for speech recognition. Technical Report CMU-CS-88-152.
LeCun, Y.; Boser, B.; Denker, J. S.; Henderson, D.; Howard, R. E.; Hubbard, W.; and Jackel, L. D. 1989. Backpropagation applied to handwritten zip code recognition. Neural computation 1(4):541–551.
Liapis, A.; Yannakakis, G. N.; and Togelius, J. 2013. Sentient sketchbook: Computer-aided game level authoring. In Proceedings of ACM Conference on Foundations of Digital Games, 213–220. FDG.
Liaw, A.; Wiener, M.; et al. 2002. Classification and regression by randomforest. R news 2(3):18–22.
Michalski, R. S.; Carbonell, J. G.; and Mitchell, T. M. 2013. Machine learning: An artificial intelligence approach. Springer Science & Business Media.
Milam, D., and El Nasr, M. S. 2010. Design patterns to guide player movement in 3d games. In Proceedings of the 5th ACM SIGGRAPH Symposium on Video Games, 37–42. ACM.
Olah, C.; Satyanarayan, A.; Johnson, I.; Carter, S.; Schubert, L.; Ye, K.; and Mordvintsev, A. 2018. The building blocks of interpretability. Distill 3(3):e10.
Shaker, N.; Shaker, M.; and Togelius, J. 2013. Ropossum: An authoring tool for designing, optimizing and solving cut the rope levels. In Proceedings of the Ninth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.
Smith, G.; Whitehead, J.; and Mateas, M. 2011. Tanagra: Reactive planning and constraint solving for mixed-initiative level design. IEEE Transactions on Computational Intelligence and AI in Games 3(3):201–215.
Snodgrass, S., and Ontañón, S. 2017. Learning to generate video game maps using markov models. IEEE Transactions on Computational Intelligence and AI in Games 9(4):410–422.
Summerville, A., and Mateas, M. 2016. Super mario as a string: Platformer level generation via lstms. In The 1st International Conference of DiGRA and FDG.
Summerville, A.; Guzdial, M.; Mateas, M.; and Riedl, M. O. 2016a. Learning player tailored content from observation: Platformer level generation from video traces using lstms. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference.
Summerville, A. J.; Snodgrass, S.; Mateas, M.; and Ontañón, S. 2016b. The vglc: The video game level corpus. In Procedural Content Generation Workshop at DiGRA/FDG.
Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Hoover, A. K.; Isaksen, A.; Nealen, A.; and Togelius, J. 2017. Procedural content generation via machine learning (pcgml). arXiv preprint arXiv:1702.00539.
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; and Wojna, Z. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826.
Yannakakis, G. N.; Liapis, A.; and Alexopoulos, C. 2014. Mixed-initiative co-creativity. In Proceedings of the 9th Conference on the Foundations of Digital Games. FDG.