Co-Creative Level Design via Machine Learning

Matthew Guzdial, Nicholas Liao, and Mark Riedl
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332
mguzdial3@gatech.edu, nliao7@gatech.edu, riedl@cc.gatech.edu

Abstract

Procedural Level Generation via Machine Learning (PLGML), the study of generating game levels with machine learning, has received a large amount of recent academic attention. For certain measures these approaches have shown success at replicating the quality of existing game levels. However, it is unclear to what extent they might benefit human designers. In this paper we present a framework for co-creative level design with a PLGML agent. In support of this framework we present results from a user study and results from a comparative study of PLGML approaches.

Introduction

Procedural content generation via Machine Learning (PCGML) has drawn increasing academic interest in recent years (Summerville et al. 2017). In PCGML a machine learning model trains on some existing corpus of game content to learn a distribution over possible game content. New content can then be sampled from this distribution. This approach has shown some success at replicating existing game content, particularly game levels, according to user studies (Guzdial and Riedl 2016) and quantitative metrics (Snodgrass and Ontanón 2017; Summerville 2018). The practical application of PCGML approaches has not yet been investigated. One might naively suggest that PCGML could serve as a cost-cutting measure given its ability to generate new content that matches existing content. However, this requires a large corpus of existing game content. If designers for a new game produced such a corpus, they might as well use that corpus for the final game. Beyond this issue, a learned distribution is not guaranteed to contain a designer's desired output.

A co-creative framework could act as an alternative to asking designers to find desired output from a learned distribution. In a co-creative framework, also called mixed initiative, a human and an AI partner work together to produce the final content. In this way, it does not matter if an AI partner is incapable of creating some desired output alone.

In this paper we propose an approach to co-creative PCGML for level design, or Procedural Level Generation via Machine Learning (PLGML). In particular, we intend to demonstrate the following points: (1) existing methods are insufficient for co-creative level design, and (2) co-creative PLGML requires training on examples of co-creative PLGML or an approximation. In support of this argument we present results from a user study in which users interacted with existing PLGML approaches adapted to co-creation, and quantitative experiments comparing these existing approaches to approaches designed for co-creation.

Related Work

The concept of co-creative PCGML has been previously discussed in the literature (Summerville et al. 2017; Zhu et al. 2018), but no prior approaches or systems exist. Comparatively, there exist many prior approaches to co-creative or mixed-initiative level design agents without machine learning (Smith, Whitehead, and Mateas 2010; Yannakakis, Liapis, and Alexopoulos 2014; Deterding et al. 2017). Instead, these systems rely upon search- or grammar-based approaches (Liapis, Yannakakis, and Togelius 2013; Shaker, Shaker, and Togelius 2013; Baldwin et al. 2017). Thus these approaches require significant developer effort to adapt to a novel game.
User Study

As an initial exploration into co-creative level design via machine learning we conducted a user study. We began by taking existing procedural level generation via machine learning (PLGML) approaches and adapting them to co-creation. We call these adapted approaches AI level design partners. Our intention with these partners is to determine the strengths and weaknesses of these existing approaches when applied to co-creation and the extent to which these existing approaches are sufficient for this task.

We make use of Super Mario Bros. as the domain for this study and later experiments, given that all three of the existing PLGML approaches had previously been applied to this domain. Further, we anticipated its popularity would lead to more familiarity from our study participants.

Level Design Editor

To run our user study we needed a level design editor to serve as an interface between participants and the AI level design partners. For this purpose we made use of the editor from (Guzdial et al. 2017), which is publicly available online.1 We reproduce a screenshot of the interface from the paper in Figure 1. The major parts of the interface are as follows:

• The current level map in the center of the interface, which allows for scrolling side-to-side.
• A minimap on the bottom left of the interface; users can click on this to jump to a particular place in the level.
• A palette of level components or sprites in the middle of the bottom row.
• An "End Turn" button on the bottom right. By pressing this End Turn button the current AI level design partner is queried for an addition. A pop-up appears while the partner processes, and then its additions are added sprite-by-sprite to the main screen. The camera scrolls to follow each addition, so that the user is aware of any changes to the level. The user then regains control and level building continues in this turn-wise fashion.

Figure 1: Screenshot of the Level Editor, reproduced from (Guzdial et al. 2017).

At any time during the interaction users can hit the top left "Run" button to play through the current version of the level. A backend logging system tracks all events, including additions and deletions and which entity (human or AI) was responsible for them.

1 https://github.com/mguzdial3/Morai-Maker-Engine
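To make this turn-based protocol concrete, the following is a small illustrative sketch of how such an editor might log events and hand control to an AI partner when "End Turn" is pressed. All class and method names here are hypothetical and are not taken from the Morai-Maker-Engine code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LogEvent:
    source: str                  # "human" or "ai"
    kind: str                    # "addition" or "deletion"
    sprite: str                  # sprite name from the editor palette
    position: Tuple[int, int]    # (x, y) tile coordinate

@dataclass
class Session:
    events: List[LogEvent] = field(default_factory=list)

    def record(self, source, kind, sprite, position):
        self.events.append(LogEvent(source, kind, sprite, position))

    def end_turn(self, level_state, partner):
        """Query the AI partner once and log each addition it proposes.

        `partner.propose_additions` is an assumed interface standing in for
        whichever PLGML agent is currently acting as the design partner.
        """
        for sprite, position in partner.propose_additions(level_state):
            self.record("ai", "addition", sprite, position)
```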
AI Level Design Partners

For this user study we created three AI agents to serve as level design partners. Each is based on a previously published PLGML approach, adapted to work in an iterative manner to fit the requirements of our level editor interface. We lack the space to fully describe each system, but we cover a high-level summary of the approaches and our alterations below.

• Markov Chain: This approach is a Markov chain based on Snodgrass and Ontanón (2014), built from Java code supplied by the authors. It trains on existing game levels by deriving all 2-by-2 squares of tiles and deriving probabilities of a final tile from the remaining three tiles in the square (a minimal sketch of this sampling scheme appears after this list). We made use of the same representation as that paper, which represented elements like enemies and solid tiles as equivalent. To convert this representation to the editor representation we applied rules to determine the appropriate sprite from the solid tile class based on its position, and chose randomly from available enemies for the enemy class (with the stipulation that flying enemies could only appear in the air). Otherwise, our only variation from this baseline was to limit the number of newly generated tiles to a maximum of thirty per turn.

• Bayes Net: This approach is a probabilistic graphical model or hierarchical Bayesian network based on Guzdial and Riedl (2016). It derives shapes of sprite types and samples from a probability of relative positions to determine the next sprite shape to add and where. This approach was originally trained on gameplay video, thus we split each level into a set of frame-sized chunks and generated an additional shape for each chunk. This approach was already iterative and so naturally fit into the turn-based level design format. We do not limit the number of additions, but the agent only made additions when there was a sufficient probability, and thus almost always produced fewer additions than the other agents.

• LSTM: This approach is a Long Short-Term Memory Recurrent Neural Network (LSTM RNN or just LSTM) based on Summerville and Mateas (2016), recreated in Tensorflow from the information given in the paper and training data supplied by the authors. It takes as input a game level represented as a sequence and outputs the next tile type. We modified this approach to a bidirectional LSTM given that it was collaborating and not just building a level from start to end. We further modified the approach to only make additions to a 65-tile wide chunk of the level, centered on the user's current camera placement in the editor. As with the Markov Chain we limited the additions to 30 at most, and converted from the agent's abstract representation to the editor representation according to the same process.

We chose these three approaches as they represent the most successful prior PLGML approaches in terms of depth and breadth of evaluations. Further, each approach is distinct from the other two. For example, each approach differs in terms of local vs. global reasoning, ranging from the hyper-local Markov Chain (only generating based on a 2x2 square) to the much more global LSTM, which reads in almost the entirety of the current level. Notably, because all three approaches were previously used for autonomous generation, the agents could only make additions to the level, never any deletions. We chose not to include deletions in order to minimize the damage an agent could cause to a user's intended design of a level.
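As a concrete illustration of the 2-by-2 sampling described for the Markov Chain partner, the sketch below counts how often each tile completes a 2-by-2 square given the other three tiles, and then samples a completion. The tile-grid layout and the fallback behavior for unseen contexts are our assumptions; this is a minimal sketch, not the authors' Java implementation.

```python
import random
from collections import defaultdict

def train_markov(levels):
    """Count, for every 2x2 square, how often each tile completes the square
    given the other three tiles (left, top-left, top)."""
    counts = defaultdict(lambda: defaultdict(int))
    for level in levels:                       # level: 2D list of tile symbols
        for y in range(1, len(level)):
            for x in range(1, len(level[0])):
                context = (level[y][x - 1], level[y - 1][x - 1], level[y - 1][x])
                counts[context][level[y][x]] += 1
    return counts

def sample_tile(counts, context, default="empty"):
    """Sample the final tile of a 2x2 square from the learned distribution."""
    options = counts.get(context)
    if not options:
        return default                         # unseen context: assumed fallback
    tiles, weights = zip(*options.items())
    return random.choices(tiles, weights=weights)[0]
```

In our adaptation, additions sampled in this way were capped at thirty new tiles per turn, as noted above.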
Study Method

Each study participant went through the same process. First, they were given a short tutorial on the level editor and its function. They then interacted with two distinct AI partners back-to-back. The partners were assigned at random from the three possible options. During each interaction, the user was assigned one of two possible tasks, either to create an above ground or a below ground level. We supplied two optional examples, the first two levels of each type taken from the original Super Mario Bros. This leads to a total of twelve possible conditions in terms of the pair of partners, the order of the pair, and the order of the level design assignments.

Participants were given a maximum of fifteen minutes for each task, though most participants finished well before then. Participants were asked to press the "End Turn" button to interact with their AI partner at least once. Those who did not do so had their results thrown out.

After both rounds of interaction participants took a brief survey in which they ranked the two partners they interacted with in terms of fun, frustration, challenge to work with, the partner that most aided the design, the partner that led to the most surprising or valuable ideas, and which of the two partners the participant would most like to use again. We also gave participants the option to leave a comment reflecting on each agent. The survey ended by collecting demographic data including experience with level design, Super Mario Bros., games in general, the participant's gender (we collected gender in a free response field), and age.

Results

In this subsection we discuss an initial analysis of the results of our user study. Overall 91 participants took part in this study. However, seven of these participants did not interact with one or both of their partners, and we removed them from our final data. The remaining 84 participants were split evenly between the twelve possible conditions, meaning a total of seven participants for each condition.

62% of our respondents had previously designed Mario levels at least once before. This is likely due to prior experience playing Mario Maker, a level design game/tool released by Nintendo on the Wii U. Our subjects were nearly evenly split between those who had never designed a level before (26%), had designed a level once before (36%), or had designed multiple levels in the past (38%). All but 7 of the subjects had previously played Super Mario Bros., and all the subjects played games in general regularly.

Figure 2: Examples of six final levels from our study, each pair of levels from a specific co-creative agent: Markov Chain (top), Bayes Net (middle), and LSTM (bottom). These levels were selected at random from the set of final levels, split by co-creative agent.

Our first goal in analyzing our results was to determine whether the level design task (above or underground) mattered and whether the ordering of the pair of partners mattered. We ran a one-way repeated measures ANOVA and found that neither variable led to any significance. Thus, we can safely treat our data as having only three conditions, dependent on the pair of partners each subject interacted with.

We give the ratio of first place to second place rankings for each partner in Table 1.

Table 1: The ratio by which each system was ranked first or second for each survey question.

              Most Fun  Most Frustrating  Most Challenging  Most Aided  Most Creative  Reuse
Markov Chain  33:23     26:30             29:27             30:26       33:23          32:24
Bayes Net     27:29     26:30             20:36             31:25       29:27          28:28
LSTM          24:32     32:24             35:21             23:33       22:34          24:32

One can read these results as the Markov Chain agent being generally preferred, though more challenging to use. Comparatively, the Bayes net agent was considered less challenging to use, but also less fun, with subjects less likely to want to reuse the agent. The LSTM, on the other hand, had the worst reaction overall.

The ratio of ranking results would seem to indicate a clear ordering of the agents. However, this is misleading. We applied the Kruskal-Wallis test to the results of each question and found it unable to reject the null hypothesis that all of the results from all separate agents arose from the same distribution. This indicates that in fact the agents are too close in performance to state a significant ordering.
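For reference, a test of this kind can be run with SciPy roughly as follows; the ranking values shown are placeholders rather than our survey data.

```python
from scipy import stats

# Placeholder first/second-place rankings (1 = ranked first, 2 = ranked second)
# for a single survey question, grouped by agent; real values come from the survey.
markov_ranks = [1, 2, 1, 1, 2, 1]
bayes_ranks = [2, 1, 1, 2, 2, 1]
lstm_ranks = [2, 2, 1, 2, 1, 2]

statistic, p_value = stats.kruskal(markov_ranks, bayes_ranks, lstm_ranks)
print(statistic, p_value)  # p >= 0.05 -> cannot reject a shared distribution
```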
In fact, many subjects greatly preferred the LSTM agent over the other two, stating that it was "Pretty smart overall, added elements that collaborate well with my ideas" and "This agent seemed to build towards an 'idea' so to speak, by adding blocks in the interesting ways".

User Study Results Discussion

These initial results of our user study do not indicate a clearly superior agent. Instead, they suggest that individual participants varied in terms of their preferences. This matches our own experience with the agents. When attempting to build a very standard Super Mario Bros. level, the LSTM agent performed well. However, as is common with deep learning methods it was brittle, defaulting to the most common behavior (e.g. adding ground or blocks) when confronted with unfamiliar input. In comparison the Bayes net agent was more flexible, and the Markov Chain agent more flexible still, given its hyper-local reasoning.

We include two randomly selected levels for each agent in Figure 2. They clearly demonstrate some departures from typical Super Mario Bros. levels, meaning none of these levels could have been generated by any of these agents alone. Given this, and the results of the prior section, we have presented some evidence towards the first part of our argument, that existing methods are insufficient to handle the task of co-creative level design. By which we mean, no existing agents are able to handle the variety of human level design or human preferences when it comes to AI agent partners. We will present further evidence towards this and the second point in the following sections.

Proposed Co-Creative Approach

The results of the prior section indicate a need for an approach designed for co-creative PLGML instead of one adapted from autonomous PLGML. In particular, given that none of our existing agents were able to sufficiently handle the variety of participants, we expect a need for an ideal partner to either more effectively generalize across all potential human designers or to adapt to a human designer actively during the design task. We present a proposed architecture based on the results of the user study, and present both pre-trained and active learning variations to investigate these possibilities.

Dataset

For the remainder of this paper we make use of the results of the user study as a dataset. In particular, as stated in the Level Design Editor subsection, we logged all actions by both human and AI agent partners. These logs can be considered representations of the actions taken during each partner's turns. We also have final scores in terms of the user rankings. These final scores could serve as reward or feedback to a supervised learning system; however, we would ideally like some way to assign partial credit to all of the actions the AI agent took to receive those final scores. Towards this purpose we decided to model this problem as a general semi-Markov Decision Process (SMDP) with concurrent actions, as in (Rohanimanesh and Mahadevan 2003).

Our SMDP with concurrent actions is from the AI partner's perspective, given that we wish to use it to train a new AI partner. It has the following components:

• State: We represent the level at the end of each human user turn as the state.
• Action: Each single addition by the agent per turn then becomes a primary action, with the total turn representing the concurrent action.
• Reward: For the reward we make use of the Reuse ranking, as it represents our desire that the agent be helpful and usable first and foremost. In addition, we include a small negative reward (-0.1) if the user deletes an addition made by the AI partner. We make use of a γ value of 0.1 in order to determine partial credit across the sequences of AI partner actions (a sketch of one possible credit assignment appears after this list).
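A minimal sketch of one way this partial-credit assignment could look is given below, assuming the Reuse ranking maps to a final reward of +1 (ranked first) or -1 (ranked second) and that credit is discounted backwards from the end of the interaction. It is illustrative rather than the exact scheme used for our experiments.

```python
GAMMA = 0.1            # discount used to spread credit across the AI partner's turns
DELETE_PENALTY = -0.1  # small negative reward when the user deletes an AI addition

def turn_rewards(turns, reuse_reward):
    """Assign a reward to each AI partner turn in an interaction.

    turns: list of dicts, one per AI turn, each with a 'deleted_additions' count
           (an assumed summary of the logs).
    reuse_reward: +1 if the partner was ranked first for reuse, -1 if second.
    Credit for the final ranking is discounted backwards from the last turn.
    """
    rewards = []
    for steps_from_end, turn in enumerate(reversed(turns)):
        reward = reuse_reward * (GAMMA ** steps_from_end)
        reward += DELETE_PENALTY * turn["deleted_additions"]
        rewards.append(reward)
    return list(reversed(rewards))

# Example: a partner ranked first whose final turn had one deleted addition.
print(turn_rewards([{"deleted_additions": 0}, {"deleted_additions": 1}], reuse_reward=1))
```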
Due to some network drops, some of the logs for our study were corrupted. Thus we ended up with 122 final sequences from our logs. We split this dataset into an 80-20 train-test split by participant, ensuring that our test split only included participants for whom the logs from both interactions were uncorrupted. Thus we had the logs of 11 participants held out for testing purposes.

We further divided each state-action-reward triplet such that we represent each state as a 40x15x32 matrix and each action as a 40x15x32 matrix. The state represents a screen's worth of the current level (40x15), and the action represents the additions made over that chunk of level. The 32 in this case is a one-hot encoding of sprites, based on the 32 possible sprites in the editor's sprite palette. We did this in order to further increase the amount of training data. This led to a total of 1501 training samples and 242 test samples.
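A sketch of how a level chunk could be encoded into such a matrix is given below; the placement-list format and the sprite-to-channel mapping are illustrative assumptions rather than our exact data pipeline.

```python
import numpy as np

WIDTH, HEIGHT, NUM_SPRITES = 40, 15, 32   # screen-sized chunk, one-hot sprite depth

def encode_chunk(placements, sprite_to_index):
    """Encode a list of (x, y, sprite_name) placements as a 40x15x32 one-hot matrix.

    sprite_to_index maps the 32 palette sprites to channel indices; both the
    placement format and the mapping are assumptions for illustration.
    """
    matrix = np.zeros((WIDTH, HEIGHT, NUM_SPRITES), dtype=np.float32)
    for x, y, sprite in placements:
        if 0 <= x < WIDTH and 0 <= y < HEIGHT:
            matrix[x, y, sprite_to_index[sprite]] = 1.0
    return matrix

# state  = encode_chunk(level_contents_at_end_of_user_turn, sprite_to_index)
# action = encode_chunk(ai_additions_over_that_chunk, sprite_to_index)
```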
The final layer is the participant removed or that correspond with a final -1 a fully connected layer followed by a reshape to place the reward. Further, it is possible to end up with a summed re- output in the form of the action matrix (40x15x32). Each ward of 0 if the agent takes actions that we cannot assign layer made use of leaky relu activation, meaning that each any reward. For example, if we know that a human partici- index of the final matrix could vary from -1 to 1. We made pant doesn’t want an enemy, but the agent adds a pipe. We use of mean square loss and adam as our optimizer, with the cannot estimate reward in this case. Finally, it is possible network built in Tensorflow (Abadi et al. 2016). We trained to end with a summed reward much larger than 1.0 given this model to the point of convergence in terms of training a large number of actions that encompassed a large amount set error. of the level (thus many 40x15x32 testing chunks). The final row indicates the average percentile performance our of the Pretrained Evaluation maximum possible reward for each participant, since once normalized we can average these results to present them in For our first evaluation we compared the total reward ac- aggregate. crued on the test set across our 242 withheld test samples. In The numbers in Table 2 cannot be compared between comparison we make use of four baselines, the three existing rows given how different the possible rewards and actions agents and one variation on our approach. of each participant was. However, we can compare between For the variation on our approach, we instead trained on columns. For the final row, our approach and the SMB vari- a dataset created from the existing levels of Super Mario ation are the only two approaches on average to receive pos- Bros. (SMB), represented in our SMDP format. To accom- itive reward. We note that the Markov Chain partner does plish this, we derived all 40x15x32 chunks of SMB levels. well for some individuals, but overall has a worse perfor- We then removed all sprites of each single type from that mance than the LSTM agent. The Bayes net agent may ap- chunk, which became our state, with the action being the pear to do better, but this is largely because it either predicted addition of those sprites. We made the assumption that each nothing for each action or something for which the dataset action should receive a reward of 1, given that it would lead did not have a reward. We note that participant 2 in the Table to a complete Super Mario Bros. level. received a summed reward of 0.0 for all the approaches, but This evaluation can be understood as running these five this is because that participant only interacted with their two agents (our approach, the SMB variation, and the three al- agents once and did not make any deletions. ready introduced agents) through a simulated interaction with the held out test set of eleven participants. This is not a perfect simulation, given that we cannot estimate reward Active Evaluation without user feedback. However, given the nature of our re- The prior evaluation demonstrates that by training on a ward function, actions that we cannot assign reward to will dataset or approximated dataset of co-creative interactions receive 0.0. This makes the final amount of reward each one can outperform machine learning approaches trained agent receives a reasonable estimate of how each person to autonomously produce levels. This suggests these ap- might respond to the agent. 
Pretrained Evaluation

For our first evaluation we compared the total reward accrued on the test set across our 242 withheld test samples. In comparison we make use of four baselines: the three existing agents and one variation on our approach.

For the variation on our approach, we instead trained on a dataset created from the existing levels of Super Mario Bros. (SMB), represented in our SMDP format. To accomplish this, we derived all 40x15x32 chunks of SMB levels. We then removed all sprites of each single type from a chunk, which became our state, with the action being the addition of those sprites. We made the assumption that each action should receive a reward of 1, given that it would lead to a complete Super Mario Bros. level.

This evaluation can be understood as running these five agents (our approach, the SMB variation, and the three already introduced agents) through a simulated interaction with the held-out test set of eleven participants. This is not a perfect simulation, given that we cannot estimate reward without user feedback. However, given the nature of our reward function, actions that we cannot assign reward to will receive 0.0. This makes the final amount of reward each agent receives a reasonable estimate of how each person might respond to the agent.

The second claim we made was that co-creative PLGML requires training on examples of co-creative PLGML or an approximation. Thus our proposed approach can be considered the former of these two, and the variation of our approach trained on the Super Mario Bros. dataset the latter. If these two approaches outperform the three baselines we will have evidence for this, and for our first claim that existing PLGML methods are insufficient for co-creation.

Pretrained Evaluation Results

We summarize the results of this evaluation in Table 2. The columns represent, in order, the results of our approach, the SMB-trained variation of our approach, the Markov Chain baseline, the Bayes net baseline, and the LSTM baseline. The rows represent the results separated by each participant in our test set. We separate the results in this way given the variance each participant displayed, and since the total possible reward depends upon the number of interactions, which differed between participants. Further, each participant must have given both a positive and a negative final reward (ranking agents first and second in terms of reuse). For this reason we present the results in terms of summed reward per participant. Thus, higher is better. It is possible for an agent to achieve a negative reward if it places items that the participant removed or that correspond with a final -1 reward. Further, it is possible to end up with a summed reward of 0 if the agent takes actions to which we cannot assign any reward; for example, if we know that a human participant does not want an enemy, but the agent adds a pipe, we cannot estimate reward in this case. Finally, it is possible to end with a summed reward much larger than 1.0 given a large number of actions that encompassed a large amount of the level (thus many 40x15x32 testing chunks). The final row indicates the average percentage of the maximum possible reward achieved for each participant, since once normalized we can average these results to present them in aggregate.

Table 2: The summed reward each agent receives on the test data.

Participant  Ours   SMB    Markov Chain  Bayes Net  LSTM
0            1.45   7.34   10.0          0.00       10.0
1            1.32   -4.63  -4.00         -1.00      -6.00
2            0.00   0.00   0.00          0.00       0.00
3            -0.53  -1.57  0.00          0.00       -3.00
4            0.01   0.31   0.00          0.00       0.00
5            5.50   1.36   0.00          0.00       1.00
6            0.29   -0.07  0.00          0.00       0.00
7            0.10   1.00   2.00          0.00       1.00
8            -0.14  -10.1  -60.1         0.00       -40.2
9            3.85   14.0   0.00          0.00       -1.10
10           -3.01  -5.89  0.00          0.00       0.00
Avg %        53.9   0.8    -0.6          -0.0       -0.5

The numbers in Table 2 cannot be compared between rows given how different the possible rewards and actions of each participant were. However, we can compare between columns. For the final row, our approach and the SMB variation are the only two approaches on average to receive positive reward. We note that the Markov Chain partner does well for some individuals, but overall has a worse performance than the LSTM agent. The Bayes net agent may appear to do better, but this is largely because it either predicted nothing for each action or something for which the dataset did not have a reward. We note that participant 2 in the table received a summed reward of 0.0 for all the approaches, but this is because that participant only interacted with their two agents once and did not make any deletions.
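The per-participant summed rewards above can be thought of as being computed by a loop of roughly the following shape; the data structures (a reward lookup per test sample and a thresholded action matrix) are simplifications for illustration, not our exact evaluation code.

```python
import numpy as np

def summed_reward(predict_action, test_samples):
    """Sum the reward an agent would accrue over one participant's test samples.

    predict_action maps a 40x15x32 state matrix to a 40x15x32 action matrix.
    Each test sample pairs a state with a reward lookup from (x, y, sprite)
    additions to logged rewards; additions with no logged reward count as 0.0.
    Both structures are assumptions made for this sketch.
    """
    total = 0.0
    for state, reward_lookup in test_samples:
        action = predict_action(state)
        # Treat sufficiently activated cells as predicted additions (threshold assumed).
        for x, y, sprite in zip(*np.nonzero(action > 0.5)):
            total += reward_lookup.get((int(x), int(y), int(sprite)), 0.0)
    return total
```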
Active Evaluation

The prior evaluation demonstrates that by training on a dataset or an approximated dataset of co-creative interactions one can outperform machine learning approaches trained to autonomously produce levels. This suggests these approaches do a reasonable job of generalizing across the variety of interactions in our training dataset. However, if designers vary extremely from one another, generalizing too much between designers will actively harm a co-creative agent's potential performance. This second comparative evaluation tests whether this is the case.

For this evaluation we create two active learning variations of our approach. For both, after making a prediction and receiving reward for each test sample we then train on that sample for one epoch. In the first, we reset the weights of our network to the final weights after training on our training set after every participant (we call this variation "Episodic"). In the second, we never reset the weights, allowing the agent to learn and generalize more from each participant it interacts with (we call this variation "Continuous"). If it is the case that user designs vary too extremely for an approach to generalize between them, then we would anticipate "Continuous" to do worse, especially as it gets to the end of the sequence of participants.
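A sketch of the two update schedules is given below, reusing the hypothetical tf.keras model from the Architecture sketch; the `score_fn` argument standing in for the reward estimate is an assumed helper, not part of our system.

```python
def run_active_evaluation(model, participants, score_fn, continuous=False):
    """Simulate the Episodic and Continuous active learning variations.

    participants: list of per-participant sample lists; each sample is a
    (state, target_action) pair of 40x15x32 numpy arrays (assumed format).
    In the Episodic variation (continuous=False) the weights are reset to the
    pretrained weights after every participant.
    """
    pretrained_weights = model.get_weights()
    per_participant_rewards = []
    for samples in participants:
        if not continuous:
            model.set_weights(pretrained_weights)   # Episodic: start fresh each time
        total = 0.0
        for state, target in samples:
            prediction = model.predict(state[None], verbose=0)[0]
            total += score_fn(prediction, target)
            model.fit(state[None], target[None], epochs=1, verbose=0)  # one-epoch update
        per_participant_rewards.append(total)
    return per_participant_rewards
```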
Table 3: A comparison of two variations on an active learning version of our agent against the non-active version.

Participant  Ours   Episodic  Continuous
0            1.45   1.47      1.47
1            1.32   -11.7     -10.1
2            0.00   0.00      0.00
3            -0.53  0.94      1.08
4            0.01   -0.05     -0.25
5            5.50   5.50      -7.55
6            0.29   0.29      0.04
7            0.10   0.10      -0.04
8            -0.14  5.22      0.42
9            3.85   42.7      41.0
10           -3.01  -3.76     -4.62
Avg %        53.9   56.6      53.1

Active Evaluation Results

We summarize the results of this evaluation in Table 3. We replicate the results of the non-active learning version of our approach from Table 2. Overall, these results support our hypothesis. The average percentage of the maximum possible reward increased by roughly three percentage points from the non-active version to the episodic active learner, and decreased by roughly a percentage point for the continuous active learner. The continuous active learner did worse than either the episodic active learner or our non-active learner for six of the eleven participants. This indicates that participants do tend to vary too much to generalize between, at least for our current representation.

Overall, it appears that some participants were easier to learn from than others. For example, participants 1, 4, and 10 all did worse with agents attempting to adapt to them during the simulated interaction. However, participants 8 and 9 both seemed well-suited to adaptation, given that their scores increased more than tenfold from the non-active learner. This follows from the fact that these two participants had the second most and the most interactions, respectively, across the test participants. This suggests that these agents can adapt to a human designer given sufficient interaction.

Discussion and Limitations

In this paper we presented results towards an argument for co-creative level design via machine learning. We presented evidence from a user study and two comparative experiments that (1) current approaches to procedural level generation via machine learning are insufficient for co-creative level design and (2) co-creative level design requires training on a dataset or an approximated dataset of co-creative level design. In support, we demonstrate that no current approach significantly outperforms the remaining approaches, and in fact that users are too varied for any one model to meet an arbitrary user's needs. Instead, we anticipate the need to apply active learning to adapt a general model to particular individuals.

We present a variety of evidence towards our stated claims. However, we note that we only present evidence in the domain of Super Mario Bros. Further, while our comparative evaluations had strong results, these can only be considered simulations of user interaction. In particular, our simulated test interactions essentially assume users will create the same final level no matter what the AI partner does. To fully validate these results we will need to run a new user study. We anticipate running a follow-up study in order to verify these results.

Beyond a follow-up user study, we also hope to investigate ways of speeding up the process of creating co-creative level design partners. Under the process described in this paper, one would have to run a 60+ participant user study with three different naive AI partners every time one wanted a co-creative level design partner for a new game. We plan to investigate transfer learning and other ways to approximate co-creative datasets from existing corpora. Further, we anticipate a need for explainable AI in co-creative level design to help the human partner give appropriate feedback to the AI partner.

Conclusions

We introduce the problem of co-creative level design via machine learning. This represents a new domain of research for Procedural Level Generation via Machine Learning (PLGML). In a user study and two comparative evaluations we demonstrate evidence towards the claims that existing PLGML methods are insufficient to address co-creation, and that co-creative AI level designers must train on datasets or approximated datasets of co-creative level design.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1525967. This work was also supported in part by a 2018 Unity Graduate Fellowship.

References

Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. 2016. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, 265–283.

Baldwin, A.; Dahlskog, S.; Font, J. M.; and Holmberg, J. 2017. Mixed-initiative procedural generation of dungeons using game design patterns. In Computational Intelligence and Games (CIG), 2017 IEEE Conference on, 25–32. IEEE.

Deterding, C. S.; Hook, J. D.; Fiebrink, R.; Gow, J.; Akten, M.; Smith, G.; Liapis, A.; and Compton, K. 2017. Mixed-initiative creative interfaces. In CHI EA'17: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM.

Guzdial, M., and Riedl, M. 2016. Game level generation from gameplay videos. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference.

Guzdial, M.; Chen, J.; Chen, S.-Y.; and Riedl, M. O. 2017. A general level design editor for co-creative level design. In Fourth Experimental AI in Games Workshop.

Liapis, A.; Yannakakis, G. N.; and Togelius, J. 2013. Sentient sketchbook: Computer-aided game level authoring. In Proceedings of the ACM Conference on Foundations of Digital Games, 213–220. FDG.

Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; and Riedmiller, M. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

Rohanimanesh, K., and Mahadevan, S. 2003. Learning to take concurrent actions. In Advances in Neural Information Processing Systems, 1651–1658.

Shaker, N.; Shaker, M.; and Togelius, J. 2013. Ropossum: An authoring tool for designing, optimizing and solving cut the rope levels. In Proceedings of the Ninth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.

Smith, G.; Whitehead, J.; and Mateas, M. 2010. Tanagra: A mixed-initiative level design tool. In Proceedings of the Fifth International Conference on the Foundations of Digital Games, 209–216. ACM.

Snodgrass, S., and Ontañón, S. 2014. Experiments in map generation using markov chains. In FDG.

Snodgrass, S., and Ontañón, S. 2017. Learning to generate video game maps using markov models. IEEE Transactions on Computational Intelligence and AI in Games 9(4):410–422.

Summerville, A., and Mateas, M. 2016. Super mario as a string: Platformer level generation via lstms. In The 1st International Conference of DiGRA and FDG.

Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Hoover, A. K.; Isaksen, A.; Nealen, A.; and Togelius, J. 2017. Procedural content generation via machine learning (pcgml). arXiv preprint arXiv:1702.00539.

Summerville, A. 2018. Learning from Games for Generative Purposes. Ph.D. Dissertation, UC Santa Cruz.

Yannakakis, G. N.; Liapis, A.; and Alexopoulos, C. 2014. Mixed-initiative co-creativity. In Proceedings of the 9th Conference on the Foundations of Digital Games. FDG.

Zhu, J.; Liapis, A.; Risi, S.; Bidarra, R.; and Youngblood, G. M. 2018. Explainable ai for designers: A human-centered perspective on mixed-initiative co-creation. Computational Intelligence in Games.