Towards Automatic Extraction of Tile Types from Level Images

                                                             Sam Snodgrass
                                                          Northeastern University
                                                             Boston, MA USA
                                                       s.snodgrass@northeastern.edu


                            Abstract                                    tile types from video game level images, which can then be
                                                                        used to represent levels from the given game. Our unsuper-
  In recent years, the use of machine learning for procedu-
  ral content generation (PCGML) has been growing. These
                                                                        vised approach attempts to find groups of functionally simi-
  PCGML approaches require a training corpus of levels, of-             lar objects using only positional and structural level informa-
  ten annotated or represented in some abstracted way. Due to           tion. Our goal is to further increase the usability of PCGML
  the manual effort required to annotate or translate a sufficient      techniques and broaden the applicability of such techniques
  training corpus, most PCGML techniques have only been ex-             to new domains by reducing the amount of domain knowl-
  plored in a handful of domains. In this paper we take a step          edge required to explore a new domain.
  towards addressing this core issue of PCGML by presenting                The remainder of this paper is organized as follows: first,
  an unsupervised method for automatically extracting a rep-            we discuss the relevant related work; we then present our
  resentation for a level domain, given only images of the lev-         approach for extracting tile sets; next, we present our ex-
  els. This approach is a move towards making PCGML more                perimental set-up, including the domain in which we test
  broadly applicable by reducing the effort required to create
  a training corpus. We evaluate our approach by comparing
                                                                        our approach and how we evaluate our approach; then we
  the automatically extracted tile representation against exist-        present and discuss our results; finally, we close by drawing
  ing PCGML training level corpus representations.                      our conclusions, and suggesting avenues of future work.

                                                                                              Related Work
                        Introduction
                                                                        Most of the current PCGML techniques for level gener-
Procedural content generation via machine learning                      ation techniques require annotated or abstracted training
(PCGML) (Summerville et al. 2018) is a growing field of                 levels, often derived from level images (Summerville and
research that automatically extracts models from existing               Mateas 2016; Snodgrass and Ontañón 2016b; Summerville
game content, and uses those learned models to generate                 and Mateas 2015). A notable exception is Guzdial and
new content. These approaches rely on a corpus of training              Reidl’s (Guzdial and Riedl 2016) approach which leverages
data from which to estimate their models. However, the                  a spritesheet and gameplay videos in order to automatically
creation of such training data (often through manual anno-              identify structures and construct its own internal graphical
tation or domain-specific scripts) can require a large time             representation of the levels. This is an interesting approach
commitment as well as expert domain knowledge in order                  that is able to leverage its representation to create remark-
to reason about the representation of the data for a given              able results. Additionally, the use of gameplay videos can
domain. This requirement of annotating training data is in              be reduced to using a static level image where each frame is
direct opposition to one of the core benefits of PCGML:                 either considered separately as a level, or concatenated to-
reducing the amount of domain knowledge that must be                    gether to form a full level. Therefore, methods that rely on
encoded by users. Subsequently, most PCGML approaches                   gameplay videos can also benefit from an unsupervised rep-
have only been tested in a handful of domain where training             resentation learning approach.
data is readily available (e.g., Super Mario Bros. (Guz-                   We are not the first to recognize the tension of needing an-
dial and Riedl 2016; Snodgrass and Ontañón 2016b;                     notated training data in PCGML. Summerville et al. (Sum-
Summerville and Mateas 2016), The Legend of Zelda (Sum-                 merville et al. 2016) created and maintain the Video Game
merville and Mateas 2015), and Lode Runner and Kid                      Level Corpus, a repository of video game levels represented
Icarus (Snodgrass and Ontañón 2016b)).                                in a variety of formats, including graphical and tile-based
   In this paper we begin research into relieving PCGML                 for the purpose of video game research. Regardless of these
techniques’ reliance on manual annotations and domain spe-              efforts, there are a vast number of video games, and it is
cific knowledge from users. We present a proof of concept               infeasible to convert many of their levels using the current
unsupervised approach for extracting a representative set of            manual or domain specific methods. Others have attempted
                                                                        to sidestep the need for annotated training data in new do-
                                                                        mains by combining models for various domains (Guzdial
                Input Levels                                       Unique Sprites                                   Re-represented Levels


                                            Sprite                                         Represent
                                           Extractor                                        Levels


                                                                                                                               PCGML


      Final Converted Levels                           Identified Clusters/Extracted Tile Types                      Trained Tile Distributions
                                                A:
                                                B:
                                   Represent    C:
                                    Levels      D:                                                     Clustering
                                                E:
                                                F:
                                                G:


Figure 1: This figure shows the flow of our approach. We start with a set of level images; extract a set of unique sprites
from those images; re-represent the levels using a unique identifier associated with each unique sprite; train a model on those
represented levels; perform clustering on the sprites using the distance between the trained distributions as the metric, yielding
a set of clusters corresponding to tile types; and finally we represent the input levels using the extracted tile types.


and Riedl 2018) or by transferring a learned model from one                   Figure 1 shows the flow of our approach. We discuss each of
domain which has training data to another domain with more                    the above stages in more detail below.
limited training data (Snodgrass and Ontanon 2016a). How-
ever, they still require training data for some (ideally func-                Labeling
tionally similar) domain, and thus do not address the root of
                                                                              In this stage we first parse the input level images in order to
the problem.
                                                                              extract a set of unique x × y pixel sprites. We then treat each
   Some have explored the role of different structures and                    of those unique sprites as a temporary tile type, and use them
tile types in different game domains through interaction. In                  to re-represent the input levels in an intermediate tile-based
the General Video Game AI competition (Pérez-Liébana et                     representation, resulting in a set of tile-based levels that can
al. 2016) the various agents needed to analyze and determine                  be passed to the next stage of the pipeline.
what the different elements in the given levels were without
prior knowledge. The agents in this case were able to inter-
                                                                              Training
act with the provided level and build up a world model this
way. The most closely related work is that of Summerville et                  In this stage we train a statistical model on the newly created
al (Summerville et al. 2017) which tries to determine what                    tile levels. Many machine learning approaches can be used
the elements in a Super Mario Bros. level do by analyzing                     here in order to extract a generalized representation of the
gameplay traces and in-game events, and clustering player                     input levels, but in our experiments we leverage a Markov
and object interactions modeled as probabilistic events. No-                  random field approach (Cross and Jain 1983).
tice, each of these approaches relies on the use of agent inter-                 We train a Markov random field using a neighborhood of
actions in order to extract the function of the tiles, while we               the four surrounding sprites in the level, as shown in Fig-
are first interested in seeing how far we can go only analyz-                 ure 2. Using the Markov random field, we estimate P (c|t)
ing the structural elements of the level in order to determine                from the set of levels represented in the tile format described
the functional groupings.                                                     above, where c is a configuration of surrounding tile types at
                                                                              a given position in the level, and t is the tile type at the cen-
                           Approach                                           ter of that configuration. A similar modeling approach using
                                                                              Markov random fields has previously been used by Snod-
In this section we propose our approach for automatically                     grass and Ontañón 2016b to model and generate game lev-
determining a representative set of tile types for a domain                   els. The key difference here is that Snodgrass and Ontañón
given a set of level images. At a high level our approach                     estimated P (t|c) in order to capture the proper placement
works in three phases: first, we automatically label the im-                  of tile types within a level and replicate it during genera-
ages using the set of unique sprites found in the level images;               tion, whereas we estimate P (c|t) so that we can more eas-
next, we train a Markov random field (Cross and Jain 1983)                    ily compare which configurations and patterns occur around
model on the levels treating each of those unique sprites as                  specific tile types, thus allowing us to reason more directly
a temporary tile type; finally, we cluster the sprites using                  about how different tile types occur with different patterns
the distances between their learned probability distributions.                in the input levels.
            …"          …"          …"          …"
                                                                                                   S   -   -   - - -   -   -   -   -   -   -   -
           S1,3        S2,3        S3,3        S4,3   …"                                           S   -   -   - - -   -   -   -   -   -   -   -
                                                                                                   S   -   -   - - -   -   -   -   -   -   -   -
                                                                                                   S   -   -   - - ?   -   -   -   -   -   -   -
                                                                                                   S   -   -   - - -   -   -   -   -   -   -   -
                                                                                                   S   -   -   - - -   -   -   -   -   -   -   -
           S1,2        S2,2        S3,2        S4,2   …"                                           S   -   -   - - -   -   -   -   -   -   -   -
                                                                                                   S   -   -   B M B   ?   B   -   -   -   -   -
                                                                                                   S   -   -   - - -   -   -   -   -   -   -   -
                                                                                                   S   -   -   - - -   -   -   -   -   -   [   ]

           S1,1        S2,1        S3,1        S4,1   …"                                           S
                                                                                                   S
                                                                                                       -
                                                                                                       #
                                                                                                           -
                                                                                                           #
                                                                                                               - g -
                                                                                                               # # #
                                                                                                                       -
                                                                                                                       #
                                                                                                                           -
                                                                                                                           #
                                                                                                                               -
                                                                                                                               #
                                                                                                                                   -
                                                                                                                                   #
                                                                                                                                       -
                                                                                                                                       #
                                                                                                                                           p
                                                                                                                                           #
                                                                                                                                               P
                                                                                                                                               #
                                                                                                   S   S   S   S S S   S   S   S   S   S   S   S

              P(Sx,y | Sx-1,y , Sx+1,y , Sx,y-1 , Sx,y+1)
Figure 2: This figure shows the network structure used when       Figure 3: This figure shows a section of a Super Mario Bros.
training our Markov random field approach. The red cell in-       level represented using the complex manual tile set.
dicates the current tile and the blue cells indicated the sur-
rounding configuration or neighborhood.
                                                                  our experiments in order to explore the feasibility of our ap-
                                                                  proach by using a limited number of unique sprites. Many
Clustering                                                        current PCGML approaches that have been tested in this do-
In this stage we cluster the sprites based on the learned         main have leveraged a tile-based representation of the lev-
probability distributions corresponding to each sprite. To        els, and there have been several different tile representations
achieve the clustering, we leverage a hierarchical clustering     used with varying degrees of fidelity. In our experiments,
approach implemented in R (Maechler et al. 2013). We use          we compare our automatically extracted tile sets against two
a hierarchical clustering approach so that we can easily in-      manually-defined tile sets:
spect the clusters at varying levels of granularity.
   For our distance metric, we compute the total variation        • Simple Manual: This is the tile set used by the VGLC
distance (Verdú 2014) between the probability distributions        to represent levels in this domain. It consists of 11 tile
learned for the given sprites. The total variation distance can     types, and abstracts different enemy types to a single tile
be thought of as the maximum distance between two proba-            type, and represents above ground levels without treetops,
bility distributions for any one event. Specifically, we com-       bridges, or moving platforms. For the level we use in our
pute:                                                               experiments 7 of these tile types used.
              max (|P (c|ti ) − P (c|tj )|) ∀c ∈ C,               • Complex Manual: This is the tile set used by Snodgrass
where the P are the probability distributions trained by the        and Ontañón in their more recent work (Snodgrass and
MRF, and ti and tj are the tiles for which the distributions        Ontañón 2016b). This tile set consists of 45 tile types. It
are being compared, and C is the set of all possible sur-           distinguishes between the different enemy types, it dis-
rounding tile configurations.                                       tinguishes blocks based on their contents, and is able to
   This hierarchical clustering approach results in a dendro-       represent all above ground levels, castle levels, and un-
gram where each leaf corresponds to a unique sprite. Thus,          derground levels. Figure 3 shows a section of a level rep-
once the clustering is complete we can experiment with cut-         resented in this format. For the level we use in our exper-
ting the tree at different heights to investigate the clusters      iments 15 of these tile types are needed.
resulting from that cut. It is beneficial to be able to explore   The mappings of the set of unique extracted sprites to these
different granularities of clusters for our analysis, but other   sets of tile types can be seen in Tables 1 (left) and 2 (left),
common methods for estimating the ideal number of clusters        respectively.
can be employed in place of manual inspection (e.g., the av-
erage silhouette method (Kaufman and Rousseeuw 2009)).            Evaluation Methods
Additionally, other common clustering techniques can be
                                                                  Using the approach outlined previously, we extract a set of
employed here, such as k-medoids or DBSCAN.
                                                                  unique sprites from the input level image, and then cluster
                                                                  them in order to automatically determine different sets of
              Experimental Evaluation                             tile types. We evaluate the results of our clustering by in-
In this section we discuss our experimental design including      vestigating the cluster statistics and through manual inspec-
the domain explored, the evaluations metrics used, and the        tion of the clusters themselves. For the cluster statistics, we
results of our experiments.                                       compute the average silhouette of the clusters created by
                                                                  cutting the dendrogram created by the hierarchical cluster-
Domain                                                            ing method at several levels. This measures how well, on
We test our approach with the first level of Super Mario          average, each sprite fits within its own cluster versus other
Bros., a platforming game that has commonly been used             clusters. We also note the largest and smallest cluster sizes
as a testbed in PCG (Marino, Reis, and Lelis 2015;                for each clustering. This metric shows us the spread of the
Shaker et al. 2011; Mawhorter and Mateas 2010) and                clusters and can help inform users about what a desirable
PCGML (Dahlskog, Togelius, and Nelson 2014; Guzdial               number of clusters may be. For the manual inspection, we
and Riedl 2016; Snodgrass and Ontañón 2016b; Sum-               explore the differences between the found clusters and the
merville and Mateas 2016). We use only the first level in         manually defined tile types. For our evaluation we cut the
                              Simple Manual                                        Simple Clustering
           Tile Type          Sprites                              Cluster           Sprites


           Empty                                                   Cluster 1


           Enemy                                                   Cluster 2

           ?-Block                                                 Cluster 3
           Brick                                                   Cluster 4
           Solid                                                   Cluster 5
           Left Pipe                                               Cluster 6
           Right Pipe                                              Cluster 7

Table 1: This table shows the Simple Manual set tile types and their corresponding sprites from the training level (left), and the
Simple Clustering set of automatically extracted tile types determined via clustering (right). The arrangement of the clustering
results does not indicate a relation to the manual tile type.

                             Complex Manual                                       Complex Clustering
           Tile Type          Sprites                              Cluster          Sprites

           Empty                                                   Cluster 1

           Flagpole                                                Cluster 2

           Goomba                                                  Cluster 3

           ?-Block                                                 Cluster 4
           Brick                                                   Cluster 5
           Powerup                                                 Cluster 6
           Solid                                                   Cluster 7
           Extra Life                                              Cluster 8
           Top-Left Pipe                                           Cluster 9
           Top-Right Pipe                                          Cluster 10
           Coin Brick                                              Cluster 11
           Star Block                                              Cluster 12
           Left Pipe                                               Cluster 13
           Right Pipe                                              Cluster 14
           Koopa                                                   Cluster 15

Table 2: This table shows the Complex Manual set tile types and their corresponding sprites from the training level (left), and the
Complex Clustering set of automatically extracted tile types determined via clustering (right). The arrangement of the clustering
results does not indicate a relation to the manual tile type.


dendrogram to produce 7 clusters and 15 clusters, so that we         Results
can compare these clustering results to the manually defined
7 and 15 tile types that have been used previously to repre-         Tables 1 and 2 show the manually defined tile sets and the
sent the chosen level.                                               sets of sprites represented by each tile type (left) and the
                                                                     clusters found by our approach (right). In many cases our
                                                                     approach clusters tiles that are functionally similar. For ex-
      k      Avg. Silhouette    Max. Size      Min. Size           lar sprites together (e.g., bricks, pipes) it struggled with the
      2        0.3778221          46              7                grouping of others (most notably the background sprites:
      7        0.1788397          21              1                sky, cloud, hill, and bush). This is partially a shortcoming
      11       0.1791865          21              1                of the distance metric and clustering employed, as several
      15       0.1825074          20              1                of the background sprites appear in similar configurations
      20      0.09250035          16              1                as other sprites (e.g., the hills and the pipes). We aim to
      26      0.08664249          10              1                first explore more robust modeling and distance metric op-
                                                                   tions. However, as Guzdial et al. discuss (Guzdial, Sturte-
Table 3: Cluster statistics for the clusters found by cutting      vant, and Li 2016), we likely need both static and dynamic
the dendrogram for various numbers of clusters. 7 and 15           analysis (i.e., analysis of the structural and gameplay ele-
are used in the rest of our evaluation because the manually        ments, respectively) to get a complete understanding of the
defined tile sets have represent the chosen training level with    domain. Therefore, once we more fully explore our static
7 (simple) and 15 (complex) tile types.                            analysis options (i.e., distance metrics, clustering options,
                                                                   modeling approaches), we will incorporate dynamic anal-
                                                                   ysis. Dynamic analysis will help our model reason more
ample, in both settings, a cluster is found containing many        clearly about the functionality of the sprites, and cluster
of the brick and question mark tiles. Additionally, the pipe-      them more cohesively. Incorporating dynamic analysis can
top tiles are accurately grouped together in the simple setting    easily undermine the goal of our work (i.e., reducing re-
(albeit with a background tile), and are accurately split in the   liance on domain knowledge and domain specific scripts)
complex setting. This is encouraging as it shows that these        by requiring a specific agent for each domain. To avoid this
distinctions can be made automatically with only structural        potential tension, we will explore the use of reinforcement
information (to some extent), but there are clear issues. Clus-    learning agents, specifically those used in the General Video
ter 1 (in the simple setting) contains a mix of background         Game AI competition, that may be able to learn how to play
sprites and enemies, and in general background sprites are         the game without requiring much domain knowledge.
interspersed with many of the clusters. This may be because           The second limitation of our work is that we have thus
while the background sprites all perform similar functions,        far only applied it to one domain, and only a fraction of that
the individual sprites (e.g., cloud sections, bush sections,       domain. To address this, we will first apply our approach to
etc.) often only appear in specific configurations, and thus       a larger subset of Super Mario Bros. levels, including those
have a very sparse (and very distinct) probability distribu-       with different visual sprite sets, such as castle and under-
tion, which makes them more likely to get grouped in with          ground levels. We are also interested in exploring our ap-
other sprites. For example, in the simple setting the bush and     proach in domains unexplored by PCGML techniques thus
hill background sprites get clustered with the bottom pipe         far, such as Metroid, which do not have a defined set of tiles
sections. Notably, all of these sprites typically appear just      used by researchers that can be treated as the “ground truth”
above the ground sprites. This behavior is reflected in the        as we did in the Super Mario Bros. domain. This will help
complex setting as well. This suggests a shortcoming of the        us explore the robustness of our refinements and force us to
distance metric used for clustering and potentially an insuffi-    devise and explore more general methods of evaluation.
ciency in using only the positional information of the sprites.
In the future, more informative distance metrics could be                                 Conclusions
considered which may encompass frequencies of the tiles
                                                                   In this paper we presented a proof of concept approach for
appearances or perhaps the shapes on contiguous tiles simi-
                                                                   automatically extracting a set of representative tile types
lar to Guzdial and Riedl’s clustering approach (Guzdial and
                                                                   from a set of input level images without requiring domain
Riedl 2016).
                                                                   or design knowledge from the user. This approach is meant
   Tables 3 shows the average silhouette of the clusters, as
                                                                   as a step towards alleviating the reliance of PCGML users
well as the maximum and minimum cluster sizes. Recall
                                                                   on domain specific scripts and manual annotation when cre-
that the average silhouette of a clustering approximates the
                                                                   ating training data. Using our approach, we automatically
spread of the clusters, and a smaller silhouette means that
                                                                   extracted 2 sets of tile types and found some similarities be-
the elements within a cluster are more closely related. As
                                                                   tween the extracted tile sets and manually defined tile sets of
expected, with more clusters the average silhouette typically
                                                                   the same size. The found clusters also contained quite a bit
decreases. An interesting exception here is that the silhou-
                                                                   of noise and did not perfectly delineate the sprites by func-
ette when k = 15 (for the complex setting) is larger than
                                                                   tionality. In the future we will explore methods for defining
when k = 7 (for the simple setting). This indicates that
                                                                   cleaner clusters and further exploring how this automatic ap-
when k = 7 we get tighter, more similar clusters. This is
                                                                   proach performs more broadly, both in other domains and
somewhat reflected in our manual inspection of the cluster-
                                                                   paired with a variety of PCGML techniques.
ing results discussed above.

           Limitations and Future Work
As this was an initial step, there are two major limitations
that we hope to address in the future. First, while our clus-
tering approach was able to group some functionally simi-
                        References                                In Seventh Workshop on Procedural Content Generation at
                                                                  First Joint International Conference of DiGRA and FDG.
Cross, G. R., and Jain, A. K. 1983. Markov random field
texture models. Pattern Analysis and Machine Intelligence,        Summerville, A.; Behrooz, M.; Mateas, M.; and Jhala, A.
IEEE Transactions on (1):25–39.                                   2017. What does that?-block do? learning latent causal af-
                                                                  fordances from mario play traces. In Proceedings of the
Dahlskog, S.; Togelius, J.; and Nelson, M. J. 2014. Linear        first Whats Next for AI in Games Workshop at AAAI, volume
levels through n-grams. Proceedings of the 18th Interna-          2017.
tional Academic MindTrek.
                                                                  Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.;
Guzdial, M., and Riedl, M. 2016. Game level generation            Hoover, A. K.; Isaksen, A.; Nealen, A.; and Togelius, J.
from gameplay videos. In Twelfth Artificial Intelligence and      2018. Procedural content generation via machine learning
Interactive Digital Entertainment Conference.                     (PCGML). IEEE Transactions on Games.
Guzdial, M., and Riedl, M. O. 2018. Combinatorial creativ-        Verdú, S. 2014. Total variation distance and the distribution
ity for procedural content generation via machine learning.       of relative information. In ITA, 1–3. Citeseer.
”The First Knowledge Extraction from Games Workshop”.
Guzdial, M.; Sturtevant, N.; and Li, B. 2016. Deep static
and dynamic level analysis: A study on infinite mario. In
Experimental AI in Games Workshop, volume 3.
Kaufman, L., and Rousseeuw, P. J. 2009. Finding groups in
data: an introduction to cluster analysis, volume 344. John
Wiley & Sons.
Maechler, M.; Rousseeuw, P.; Struyf, A.; Hubert, M.; and
Hornik, K. 2013. cluster: Cluster Analysis Basics and Ex-
tensions.
Marino, J. R.; Reis, W. M.; and Lelis, L. H. 2015. An em-
pirical evaluation of evaluation metrics of procedurally gen-
erated Mario levels. In Eleventh Artificial Intelligence and
Interactive Digital Entertainment Conference.
Mawhorter, P., and Mateas, M. 2010. Procedural level gen-
eration using occupancy-regulated extension. In Computa-
tional Intelligence and Games (CIG), 2010 IEEE Sympo-
sium on, 351–358. IEEE.
Pérez-Liébana, D.; Samothrakis, S.; Togelius, J.; Schaul, T.;
and Lucas, S. M. 2016. Analyzing the robustness of general
video game playing agents. In Computational Intelligence
and Games (CIG), 2016 IEEE Conference on, 1–8. IEEE.
Shaker, N.; Togelius, J.; Yannakakis, G. N.; Weber, B.;
Shimizu, T.; Hashiyama, T.; Sorenson, N.; Pasquier, P.;
Mawhorter, P.; Takahashi, G.; et al. 2011. The 2010 mario
AI championship: Level generation track. TCIAIG, IEEE
Transactions on 3(4):332–347.
Snodgrass, S., and Ontanon, S. 2016a. An approach to
domain transfer in procedural content generation of two-
dimensional videogame levels. In Twelfth Artificial Intel-
ligence and Interactive Digital Entertainment Conference.
Snodgrass, S., and Ontañón, S. 2016b. Learning to generate
video game maps using Markov models. IEEE Transactions
on Computational Intelligence and AI in Games.
Summerville, A., and Mateas, M. 2015. Sampling hyrule:
Sampling probabilistic machine learning for level genera-
tion.
Summerville, A., and Mateas, M. 2016. Super Mario as a
string: Platformer level generation via LSTMs. Proceedings
of 1st International Joint Conference of DiGRA and FDG.
Summerville, A. J.; Snodgrass, S.; Mateas, M.; and
Ontañón, S. 2016. The VGLC: The video game level corpus.