Towards Automatic Extraction of Tile Types from Level Images Sam Snodgrass Northeastern University Boston, MA USA s.snodgrass@northeastern.edu Abstract tile types from video game level images, which can then be used to represent levels from the given game. Our unsuper- In recent years, the use of machine learning for procedu- ral content generation (PCGML) has been growing. These vised approach attempts to find groups of functionally simi- PCGML approaches require a training corpus of levels, of- lar objects using only positional and structural level informa- ten annotated or represented in some abstracted way. Due to tion. Our goal is to further increase the usability of PCGML the manual effort required to annotate or translate a sufficient techniques and broaden the applicability of such techniques training corpus, most PCGML techniques have only been ex- to new domains by reducing the amount of domain knowl- plored in a handful of domains. In this paper we take a step edge required to explore a new domain. towards addressing this core issue of PCGML by presenting The remainder of this paper is organized as follows: first, an unsupervised method for automatically extracting a rep- we discuss the relevant related work; we then present our resentation for a level domain, given only images of the lev- approach for extracting tile sets; next, we present our ex- els. This approach is a move towards making PCGML more perimental set-up, including the domain in which we test broadly applicable by reducing the effort required to create a training corpus. We evaluate our approach by comparing our approach and how we evaluate our approach; then we the automatically extracted tile representation against exist- present and discuss our results; finally, we close by drawing ing PCGML training level corpus representations. our conclusions, and suggesting avenues of future work. Related Work Introduction Most of the current PCGML techniques for level gener- Procedural content generation via machine learning ation techniques require annotated or abstracted training (PCGML) (Summerville et al. 2018) is a growing field of levels, often derived from level images (Summerville and research that automatically extracts models from existing Mateas 2016; Snodgrass and Ontañón 2016b; Summerville game content, and uses those learned models to generate and Mateas 2015). A notable exception is Guzdial and new content. These approaches rely on a corpus of training Reidl’s (Guzdial and Riedl 2016) approach which leverages data from which to estimate their models. However, the a spritesheet and gameplay videos in order to automatically creation of such training data (often through manual anno- identify structures and construct its own internal graphical tation or domain-specific scripts) can require a large time representation of the levels. This is an interesting approach commitment as well as expert domain knowledge in order that is able to leverage its representation to create remark- to reason about the representation of the data for a given able results. Additionally, the use of gameplay videos can domain. This requirement of annotating training data is in be reduced to using a static level image where each frame is direct opposition to one of the core benefits of PCGML: either considered separately as a level, or concatenated to- reducing the amount of domain knowledge that must be gether to form a full level. Therefore, methods that rely on encoded by users. Subsequently, most PCGML approaches gameplay videos can also benefit from an unsupervised rep- have only been tested in a handful of domain where training resentation learning approach. data is readily available (e.g., Super Mario Bros. (Guz- We are not the first to recognize the tension of needing an- dial and Riedl 2016; Snodgrass and Ontañón 2016b; notated training data in PCGML. Summerville et al. (Sum- Summerville and Mateas 2016), The Legend of Zelda (Sum- merville et al. 2016) created and maintain the Video Game merville and Mateas 2015), and Lode Runner and Kid Level Corpus, a repository of video game levels represented Icarus (Snodgrass and Ontañón 2016b)). in a variety of formats, including graphical and tile-based In this paper we begin research into relieving PCGML for the purpose of video game research. Regardless of these techniques’ reliance on manual annotations and domain spe- efforts, there are a vast number of video games, and it is cific knowledge from users. We present a proof of concept infeasible to convert many of their levels using the current unsupervised approach for extracting a representative set of manual or domain specific methods. Others have attempted to sidestep the need for annotated training data in new do- mains by combining models for various domains (Guzdial Input Levels Unique Sprites Re-represented Levels Sprite Represent Extractor Levels PCGML Final Converted Levels Identified Clusters/Extracted Tile Types Trained Tile Distributions A: B: Represent C: Levels D: Clustering E: F: G: Figure 1: This figure shows the flow of our approach. We start with a set of level images; extract a set of unique sprites from those images; re-represent the levels using a unique identifier associated with each unique sprite; train a model on those represented levels; perform clustering on the sprites using the distance between the trained distributions as the metric, yielding a set of clusters corresponding to tile types; and finally we represent the input levels using the extracted tile types. and Riedl 2018) or by transferring a learned model from one Figure 1 shows the flow of our approach. We discuss each of domain which has training data to another domain with more the above stages in more detail below. limited training data (Snodgrass and Ontanon 2016a). How- ever, they still require training data for some (ideally func- Labeling tionally similar) domain, and thus do not address the root of In this stage we first parse the input level images in order to the problem. extract a set of unique x × y pixel sprites. We then treat each Some have explored the role of different structures and of those unique sprites as a temporary tile type, and use them tile types in different game domains through interaction. In to re-represent the input levels in an intermediate tile-based the General Video Game AI competition (Pérez-Liébana et representation, resulting in a set of tile-based levels that can al. 2016) the various agents needed to analyze and determine be passed to the next stage of the pipeline. what the different elements in the given levels were without prior knowledge. The agents in this case were able to inter- Training act with the provided level and build up a world model this way. The most closely related work is that of Summerville et In this stage we train a statistical model on the newly created al (Summerville et al. 2017) which tries to determine what tile levels. Many machine learning approaches can be used the elements in a Super Mario Bros. level do by analyzing here in order to extract a generalized representation of the gameplay traces and in-game events, and clustering player input levels, but in our experiments we leverage a Markov and object interactions modeled as probabilistic events. No- random field approach (Cross and Jain 1983). tice, each of these approaches relies on the use of agent inter- We train a Markov random field using a neighborhood of actions in order to extract the function of the tiles, while we the four surrounding sprites in the level, as shown in Fig- are first interested in seeing how far we can go only analyz- ure 2. Using the Markov random field, we estimate P (c|t) ing the structural elements of the level in order to determine from the set of levels represented in the tile format described the functional groupings. above, where c is a configuration of surrounding tile types at a given position in the level, and t is the tile type at the cen- Approach ter of that configuration. A similar modeling approach using Markov random fields has previously been used by Snod- In this section we propose our approach for automatically grass and Ontañón 2016b to model and generate game lev- determining a representative set of tile types for a domain els. The key difference here is that Snodgrass and Ontañón given a set of level images. At a high level our approach estimated P (t|c) in order to capture the proper placement works in three phases: first, we automatically label the im- of tile types within a level and replicate it during genera- ages using the set of unique sprites found in the level images; tion, whereas we estimate P (c|t) so that we can more eas- next, we train a Markov random field (Cross and Jain 1983) ily compare which configurations and patterns occur around model on the levels treating each of those unique sprites as specific tile types, thus allowing us to reason more directly a temporary tile type; finally, we cluster the sprites using about how different tile types occur with different patterns the distances between their learned probability distributions. in the input levels. …" …" …" …" S - - - - - - - - - - - - S1,3 S2,3 S3,3 S4,3 …" S - - - - - - - - - - - - S - - - - - - - - - - - - S - - - - ? - - - - - - - S - - - - - - - - - - - - S - - - - - - - - - - - - S1,2 S2,2 S3,2 S4,2 …" S - - - - - - - - - - - - S - - B M B ? B - - - - - S - - - - - - - - - - - - S - - - - - - - - - - [ ] S1,1 S2,1 S3,1 S4,1 …" S S - # - # - g - # # # - # - # - # - # - # p # P # S S S S S S S S S S S S S P(Sx,y | Sx-1,y , Sx+1,y , Sx,y-1 , Sx,y+1) Figure 2: This figure shows the network structure used when Figure 3: This figure shows a section of a Super Mario Bros. training our Markov random field approach. The red cell in- level represented using the complex manual tile set. dicates the current tile and the blue cells indicated the sur- rounding configuration or neighborhood. our experiments in order to explore the feasibility of our ap- proach by using a limited number of unique sprites. Many Clustering current PCGML approaches that have been tested in this do- In this stage we cluster the sprites based on the learned main have leveraged a tile-based representation of the lev- probability distributions corresponding to each sprite. To els, and there have been several different tile representations achieve the clustering, we leverage a hierarchical clustering used with varying degrees of fidelity. In our experiments, approach implemented in R (Maechler et al. 2013). We use we compare our automatically extracted tile sets against two a hierarchical clustering approach so that we can easily in- manually-defined tile sets: spect the clusters at varying levels of granularity. For our distance metric, we compute the total variation • Simple Manual: This is the tile set used by the VGLC distance (Verdú 2014) between the probability distributions to represent levels in this domain. It consists of 11 tile learned for the given sprites. The total variation distance can types, and abstracts different enemy types to a single tile be thought of as the maximum distance between two proba- type, and represents above ground levels without treetops, bility distributions for any one event. Specifically, we com- bridges, or moving platforms. For the level we use in our pute: experiments 7 of these tile types used. max (|P (c|ti ) − P (c|tj )|) ∀c ∈ C, • Complex Manual: This is the tile set used by Snodgrass where the P are the probability distributions trained by the and Ontañón in their more recent work (Snodgrass and MRF, and ti and tj are the tiles for which the distributions Ontañón 2016b). This tile set consists of 45 tile types. It are being compared, and C is the set of all possible sur- distinguishes between the different enemy types, it dis- rounding tile configurations. tinguishes blocks based on their contents, and is able to This hierarchical clustering approach results in a dendro- represent all above ground levels, castle levels, and un- gram where each leaf corresponds to a unique sprite. Thus, derground levels. Figure 3 shows a section of a level rep- once the clustering is complete we can experiment with cut- resented in this format. For the level we use in our exper- ting the tree at different heights to investigate the clusters iments 15 of these tile types are needed. resulting from that cut. It is beneficial to be able to explore The mappings of the set of unique extracted sprites to these different granularities of clusters for our analysis, but other sets of tile types can be seen in Tables 1 (left) and 2 (left), common methods for estimating the ideal number of clusters respectively. can be employed in place of manual inspection (e.g., the av- erage silhouette method (Kaufman and Rousseeuw 2009)). Evaluation Methods Additionally, other common clustering techniques can be Using the approach outlined previously, we extract a set of employed here, such as k-medoids or DBSCAN. unique sprites from the input level image, and then cluster them in order to automatically determine different sets of Experimental Evaluation tile types. We evaluate the results of our clustering by in- In this section we discuss our experimental design including vestigating the cluster statistics and through manual inspec- the domain explored, the evaluations metrics used, and the tion of the clusters themselves. For the cluster statistics, we results of our experiments. compute the average silhouette of the clusters created by cutting the dendrogram created by the hierarchical cluster- Domain ing method at several levels. This measures how well, on We test our approach with the first level of Super Mario average, each sprite fits within its own cluster versus other Bros., a platforming game that has commonly been used clusters. We also note the largest and smallest cluster sizes as a testbed in PCG (Marino, Reis, and Lelis 2015; for each clustering. This metric shows us the spread of the Shaker et al. 2011; Mawhorter and Mateas 2010) and clusters and can help inform users about what a desirable PCGML (Dahlskog, Togelius, and Nelson 2014; Guzdial number of clusters may be. For the manual inspection, we and Riedl 2016; Snodgrass and Ontañón 2016b; Sum- explore the differences between the found clusters and the merville and Mateas 2016). We use only the first level in manually defined tile types. For our evaluation we cut the Simple Manual Simple Clustering Tile Type Sprites Cluster Sprites Empty Cluster 1 Enemy Cluster 2 ?-Block Cluster 3 Brick Cluster 4 Solid Cluster 5 Left Pipe Cluster 6 Right Pipe Cluster 7 Table 1: This table shows the Simple Manual set tile types and their corresponding sprites from the training level (left), and the Simple Clustering set of automatically extracted tile types determined via clustering (right). The arrangement of the clustering results does not indicate a relation to the manual tile type. Complex Manual Complex Clustering Tile Type Sprites Cluster Sprites Empty Cluster 1 Flagpole Cluster 2 Goomba Cluster 3 ?-Block Cluster 4 Brick Cluster 5 Powerup Cluster 6 Solid Cluster 7 Extra Life Cluster 8 Top-Left Pipe Cluster 9 Top-Right Pipe Cluster 10 Coin Brick Cluster 11 Star Block Cluster 12 Left Pipe Cluster 13 Right Pipe Cluster 14 Koopa Cluster 15 Table 2: This table shows the Complex Manual set tile types and their corresponding sprites from the training level (left), and the Complex Clustering set of automatically extracted tile types determined via clustering (right). The arrangement of the clustering results does not indicate a relation to the manual tile type. dendrogram to produce 7 clusters and 15 clusters, so that we Results can compare these clustering results to the manually defined 7 and 15 tile types that have been used previously to repre- Tables 1 and 2 show the manually defined tile sets and the sent the chosen level. sets of sprites represented by each tile type (left) and the clusters found by our approach (right). In many cases our approach clusters tiles that are functionally similar. For ex- k Avg. Silhouette Max. Size Min. Size lar sprites together (e.g., bricks, pipes) it struggled with the 2 0.3778221 46 7 grouping of others (most notably the background sprites: 7 0.1788397 21 1 sky, cloud, hill, and bush). This is partially a shortcoming 11 0.1791865 21 1 of the distance metric and clustering employed, as several 15 0.1825074 20 1 of the background sprites appear in similar configurations 20 0.09250035 16 1 as other sprites (e.g., the hills and the pipes). We aim to 26 0.08664249 10 1 first explore more robust modeling and distance metric op- tions. However, as Guzdial et al. discuss (Guzdial, Sturte- Table 3: Cluster statistics for the clusters found by cutting vant, and Li 2016), we likely need both static and dynamic the dendrogram for various numbers of clusters. 7 and 15 analysis (i.e., analysis of the structural and gameplay ele- are used in the rest of our evaluation because the manually ments, respectively) to get a complete understanding of the defined tile sets have represent the chosen training level with domain. Therefore, once we more fully explore our static 7 (simple) and 15 (complex) tile types. analysis options (i.e., distance metrics, clustering options, modeling approaches), we will incorporate dynamic anal- ysis. Dynamic analysis will help our model reason more ample, in both settings, a cluster is found containing many clearly about the functionality of the sprites, and cluster of the brick and question mark tiles. Additionally, the pipe- them more cohesively. Incorporating dynamic analysis can top tiles are accurately grouped together in the simple setting easily undermine the goal of our work (i.e., reducing re- (albeit with a background tile), and are accurately split in the liance on domain knowledge and domain specific scripts) complex setting. This is encouraging as it shows that these by requiring a specific agent for each domain. To avoid this distinctions can be made automatically with only structural potential tension, we will explore the use of reinforcement information (to some extent), but there are clear issues. Clus- learning agents, specifically those used in the General Video ter 1 (in the simple setting) contains a mix of background Game AI competition, that may be able to learn how to play sprites and enemies, and in general background sprites are the game without requiring much domain knowledge. interspersed with many of the clusters. This may be because The second limitation of our work is that we have thus while the background sprites all perform similar functions, far only applied it to one domain, and only a fraction of that the individual sprites (e.g., cloud sections, bush sections, domain. To address this, we will first apply our approach to etc.) often only appear in specific configurations, and thus a larger subset of Super Mario Bros. levels, including those have a very sparse (and very distinct) probability distribu- with different visual sprite sets, such as castle and under- tion, which makes them more likely to get grouped in with ground levels. We are also interested in exploring our ap- other sprites. For example, in the simple setting the bush and proach in domains unexplored by PCGML techniques thus hill background sprites get clustered with the bottom pipe far, such as Metroid, which do not have a defined set of tiles sections. Notably, all of these sprites typically appear just used by researchers that can be treated as the “ground truth” above the ground sprites. This behavior is reflected in the as we did in the Super Mario Bros. domain. This will help complex setting as well. This suggests a shortcoming of the us explore the robustness of our refinements and force us to distance metric used for clustering and potentially an insuffi- devise and explore more general methods of evaluation. ciency in using only the positional information of the sprites. In the future, more informative distance metrics could be Conclusions considered which may encompass frequencies of the tiles In this paper we presented a proof of concept approach for appearances or perhaps the shapes on contiguous tiles simi- automatically extracting a set of representative tile types lar to Guzdial and Riedl’s clustering approach (Guzdial and from a set of input level images without requiring domain Riedl 2016). or design knowledge from the user. This approach is meant Tables 3 shows the average silhouette of the clusters, as as a step towards alleviating the reliance of PCGML users well as the maximum and minimum cluster sizes. Recall on domain specific scripts and manual annotation when cre- that the average silhouette of a clustering approximates the ating training data. Using our approach, we automatically spread of the clusters, and a smaller silhouette means that extracted 2 sets of tile types and found some similarities be- the elements within a cluster are more closely related. As tween the extracted tile sets and manually defined tile sets of expected, with more clusters the average silhouette typically the same size. The found clusters also contained quite a bit decreases. An interesting exception here is that the silhou- of noise and did not perfectly delineate the sprites by func- ette when k = 15 (for the complex setting) is larger than tionality. In the future we will explore methods for defining when k = 7 (for the simple setting). This indicates that cleaner clusters and further exploring how this automatic ap- when k = 7 we get tighter, more similar clusters. This is proach performs more broadly, both in other domains and somewhat reflected in our manual inspection of the cluster- paired with a variety of PCGML techniques. ing results discussed above. Limitations and Future Work As this was an initial step, there are two major limitations that we hope to address in the future. First, while our clus- tering approach was able to group some functionally simi- References In Seventh Workshop on Procedural Content Generation at First Joint International Conference of DiGRA and FDG. Cross, G. R., and Jain, A. K. 1983. Markov random field texture models. Pattern Analysis and Machine Intelligence, Summerville, A.; Behrooz, M.; Mateas, M.; and Jhala, A. IEEE Transactions on (1):25–39. 2017. What does that?-block do? learning latent causal af- fordances from mario play traces. In Proceedings of the Dahlskog, S.; Togelius, J.; and Nelson, M. J. 2014. Linear first Whats Next for AI in Games Workshop at AAAI, volume levels through n-grams. Proceedings of the 18th Interna- 2017. tional Academic MindTrek. Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Guzdial, M., and Riedl, M. 2016. Game level generation Hoover, A. K.; Isaksen, A.; Nealen, A.; and Togelius, J. from gameplay videos. In Twelfth Artificial Intelligence and 2018. Procedural content generation via machine learning Interactive Digital Entertainment Conference. (PCGML). IEEE Transactions on Games. Guzdial, M., and Riedl, M. O. 2018. Combinatorial creativ- Verdú, S. 2014. Total variation distance and the distribution ity for procedural content generation via machine learning. of relative information. In ITA, 1–3. Citeseer. ”The First Knowledge Extraction from Games Workshop”. Guzdial, M.; Sturtevant, N.; and Li, B. 2016. Deep static and dynamic level analysis: A study on infinite mario. In Experimental AI in Games Workshop, volume 3. Kaufman, L., and Rousseeuw, P. J. 2009. Finding groups in data: an introduction to cluster analysis, volume 344. John Wiley & Sons. Maechler, M.; Rousseeuw, P.; Struyf, A.; Hubert, M.; and Hornik, K. 2013. cluster: Cluster Analysis Basics and Ex- tensions. Marino, J. R.; Reis, W. M.; and Lelis, L. H. 2015. An em- pirical evaluation of evaluation metrics of procedurally gen- erated Mario levels. In Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference. Mawhorter, P., and Mateas, M. 2010. Procedural level gen- eration using occupancy-regulated extension. In Computa- tional Intelligence and Games (CIG), 2010 IEEE Sympo- sium on, 351–358. IEEE. Pérez-Liébana, D.; Samothrakis, S.; Togelius, J.; Schaul, T.; and Lucas, S. M. 2016. Analyzing the robustness of general video game playing agents. In Computational Intelligence and Games (CIG), 2016 IEEE Conference on, 1–8. IEEE. Shaker, N.; Togelius, J.; Yannakakis, G. N.; Weber, B.; Shimizu, T.; Hashiyama, T.; Sorenson, N.; Pasquier, P.; Mawhorter, P.; Takahashi, G.; et al. 2011. The 2010 mario AI championship: Level generation track. TCIAIG, IEEE Transactions on 3(4):332–347. Snodgrass, S., and Ontanon, S. 2016a. An approach to domain transfer in procedural content generation of two- dimensional videogame levels. In Twelfth Artificial Intel- ligence and Interactive Digital Entertainment Conference. Snodgrass, S., and Ontañón, S. 2016b. Learning to generate video game maps using Markov models. IEEE Transactions on Computational Intelligence and AI in Games. Summerville, A., and Mateas, M. 2015. Sampling hyrule: Sampling probabilistic machine learning for level genera- tion. Summerville, A., and Mateas, M. 2016. Super Mario as a string: Platformer level generation via LSTMs. Proceedings of 1st International Joint Conference of DiGRA and FDG. Summerville, A. J.; Snodgrass, S.; Mateas, M.; and Ontañón, S. 2016. The VGLC: The video game level corpus.