
Jean-Baptiste Hervé (University of Hertfordshire, Hatfield, UK)

Oliver Withington (Queen Mary University of London, London, UK), o.withington@qmul.ac.uk

Marion Hervé

Laurissa Tokarchuk (Queen Mary University of London, London, UK), laurissa.tokarchuk@qmul.ac.uk

Christoph Salge (University of Hertfordshire, Hatfield, UK), christophsalge@gmail.com

Venue: AIIDE Workshop on Experimental Artificial Intelligence in Games, CEUR Workshop Proceedings (ISSN 1613-0073)

Keywords: PCG Evaluation, Generative AI, Minecraft, GDMC

With growing interest in Procedural Content Generation (PCG) it becomes increasingly important to develop methods and tools for evaluating and comparing alternative systems. There is a particular lack regarding the evaluation of generative pipelines, where a set of generative systems work in series to make iterative changes to an artifact. We introduce a novel method called Generative Shift for evaluating the impact of individual stages in a PCG pipeline by quantifying the impact that a generative process has when it is applied to a pre-existing artifact. We explore this technique by applying it to a very rich dataset of Minecraft game maps produced by a set of alternative settlement generators developed as part of the Generative Design in Minecraft Competition (GDMC), all of which are designed to produce appropriate settlements for a pre-existing map. While this is an early exploration of this technique we find it to be a promising lens to apply to PCG evaluation, and we are optimistic about the potential of Generative Shift to be a domain-agnostic method for evaluating generative pipelines.

1. Introduction

Procedural content generation refers to the automatic creation of pieces of content, usually called ‘artifacts’. Common artifacts include game levels, equipment, or even 3D models. It is used to provide many different assets, at any scale, without requiring them to be fully authored by hand. Academic game research has studied PCG from different perspectives. Recent publications include new generation techniques [1, 2], different kinds of artifacts [3, 4, 5, 6, 7, 8], or even design questions [9]. One area, in particular, is the automated evaluation of generated content [10, 11, 12, 13, 14]. However, the commonly used techniques, such as Expressive Range Analysis (ERA) [10], are not well suited for intermediary steps of the generation. They also focus on the entirety of an artifact, even though its sub-components and their interactions are also good indicators of the player’s perception [15].

These techniques also rely on the use of user-defined metrics, usually based on the user’s intuition or previous work. But most metrics don’t align with the perceived quality of generated content [13, 16, 17], nor do they have any guarantee of capturing meaningful variation. Furthermore, many metrics are hard for end-users, such as game designers, to interpret [12]. These issues, combined with the availability of many different metrics, also make the selection and visualization of metrics difficult.

O. Withington: owithington.co.uk, ORCID 0000-0002-7007-5193

1.1. Key Contributions and Overview

In this paper, we make three conceptual contributions to the area of PCG system evaluation, which are as follows:

1. Evaluation of intermediary PCG artifacts and the impact of generative steps: allowing designers to interrogate the impact of individual steps in a generative process.

2. The application of dimensionality reduction to artifact metrics: freeing designers from having to narrowly define the metrics that interest them when visualizing generative spaces.

3. ERA of locations within a single generated level, rather than of levels in totality: allowing for the analysis of 3D environments that more closely aligns with how they are experienced by players, i.e. from a specific location in the game space.

While each of these contributions has potential utility in isolation, they can also be combined into a single technique which we call Generative Shift Analysis (GSA). This technique allows for the qualitative and quantitative evaluation of the effect that a generator has on a base artefact in aggregate, while also allowing a user to highlight the most extremely affected individual locations within the generated environment. After introducing this technique we show how it can be usefully applied to gain insight from real world generative systems, in our case a set of alternative Minecraft settlement generators. We argue that GSA can be used as the basis for techniques which are better suited for evaluating PCG systems and the virtual environments they produce in the form that these systems actually exist in contemporary game development.

© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

2. Related work

2.1. Expressive Range Analysis

Within the field of PCG, a truly qualitative evaluation of artifacts is challenging [18, 14]. The range of possibilities of a given generator is complex to represent, hence Expressive Range Analysis (ERA) [10] is commonly employed. The Expressive Range is bounded by selected metrics, which form its dimensions. All the possible artifacts (or at least a large sample of them) are plotted according to the metrics, leading to an analysis of their distribution.

The analysis of the Expressive Range of a generator can be useful in order to understand its behavior and how the artifacts are spread among the dimensions. Therefore, the usefulness of an Expressive Range depends mostly on the relevance of the dimensions by which it is defined.

Usually, the dimensions used are automatically computed metrics applied to the whole artifact. They do not necessarily need to capture something associated with quality, but there is often an underlying assumption that higher values in certain dimensions are preferred. More importantly, the metrics should capture meaningful differences, so that artifacts lying in different areas of the expressive range appear different to the relevant players. Evaluation metrics can also be used by designers to tune the generators and optimize certain aspects of the generated artifacts.

However, ERA still has limitations. Firstly, a computed difference between two artifacts does not necessarily lead to a perceived difference for the player [14]. The chosen metrics themselves lack embodiment, and several experiments have already been conducted to critically examine commonly used metrics and their relevance [13, 16, 17]. In most cases, the Expressive Range is defined as a 2D space, mostly for ease of interpretation, but this ultimately prevents in-depth analysis of the interactions between several metrics [11]. As a consequence, picking metric pairs that impart meaning can be time consuming and adds additional complexity [19]. The use of the metrics is also not well defined, and depending on their nature, a ”good” artifact might need to maximise a metric or target a specific value depending on purpose [16]. Another limitation is the reliance on evaluating artifacts independently and as atomic objects, while other results suggest generated artifacts are in reality experienced as composite [15]. Some of ERA's limitations have been addressed by further study, offering to expand it through better representations of ER and easier metric selection [11, 19, 20]. But these techniques have yet to be adopted.

2.2. Dimensionality Reduction

One of the ideas explored in this paper is using a dimensionality reduction (DR) algorithm, principal component analysis (PCA), to compress and combine a set of quantitative metrics. DR algorithms are a family of approaches for compressing high dimensional data into a lower dimensional space while maintaining as much as possible of the information present in the original data. They are commonly used for exploratory data analysis as well as a pre-processing step in deep learning implementations.

DR algorithms have found use in many different contexts in prior PCG research, most commonly as a visualisation tool. They can be applied directly to the encoded representations of game content, as in the work of Justesen et al. [21], which used PCA to produce visualisations of the distribution of their generated levels. Withington and Tokarchuk applied PCA and other DR algorithms to generated levels to explore whether this could be used as a useful alternative to ERA without the need for metric calculation [19]. Chang and Smith applied a different DR algorithm, t-SNE, to images from playthroughs of game levels rather than the encoded levels themselves to visualise reachable play states as part of their Differentia tool [22]. They can also be used as a generative step, as in the work of Summerville and Mateas, who used PCA as an intermediary step in their approach for procedurally generating Zelda levels [23].

While this is the first application of unaugmented DR algorithms to combine sets of metrics that we are aware of, it does bear significant resemblance to another work from Withington and Tokarchuk which used Convolutional Neural Networks (CNNs) to produce compressed representations of sets of metrics for evaluating game levels [24]. Our approach of using PCA on its own lacks the sophistication of CNNs but has the advantage of being significantly simpler to calculate and implement.

2.3. GDMC Competition

The GDMC is a yearly competition in which teams submit a settlement generator [25] for Minecraft, which is a computer program that can add or remove blocks from a given Minecraft map without human intervention. All the generated settlements are then sent to the jury. The jury includes experts in various fields, such as AI, Game Design, or urbanism. Each judge scores the settlements in each of the following categories: Adaptability, Functionality, Narrative, and Aesthetic. Adaptability is how well the settlement is suited for its location, that is, how well it adapts to the terrain, both on a large and small scale. Functionality is about what affordances the settlement provides, both to the Minecraft player and the simulated villagers. It covers various aspects, such as food, production, navigability, security, etc. Narrative reflects how well the settlement itself tells an evocative story about its own history, and who its inhabitants are. Finally, Aesthetic is a rating of the overall look of the settlements. In the competition, the rating of each category is computed for each generator by averaging (mean) across all judges’ scores.

The GDMC has been used as a test bed for PCG evaluation studies [26, 15, 16]. Minecraft is an interesting test subject in that regard. It is mostly known for its open-ended nature and has been compared to LEGO on a computer. Even though the game offers a main objective, it is mostly used as a sandbox game. Many players use the block mechanic to terraform the game world, create structures such as houses, castles, or cities, and play the game according to self-imposed goals and challenges. Since the art style and setting of Minecraft are very generic, the game affords free creation of almost any kind of artifact, with only the player’s imagination setting the limits. Automatic evaluation of all the components of a Minecraft map is both challenging and relevant in the larger context of PCG evaluation [15].

2.4. Isovists and Game Spaces

Isovists have been developed with the intent of capturing perception of space [27]. Given a bounded environment, for each point x, its isovist [27] is the set of all the points visible from x. As a result, we can compute the isovist of every position in the space. Each isovist is characterized by a range of scalar values such as its size, shape, intervisibility, etc. [27, 28, 29, 30]. These properties have been shown to correlate to some degree with how a given environment is experienced and appreciated [31, 32].

Space in general can be analyzed to understand and even predict human and social behavior [33, 34, 35]. Similar analyses have already been made for video games [36, 37], connecting the 3D space to different layers of experience, such as narrative, gameplay, social interactions, etc.

Hervé and Salge [26] already applied isovist theory to Minecraft, with results suggesting that isovists could be used as a new way to evaluate PCG artifacts. Rather than taking a holistic approach, isovists allow us to focus on one specific spot of the map, and extract the experience for that given location. In the rest of this paper, we will be using the data coming from the Hervé and Salge study.

3. Methodology

In this section we explore and explain our process for generating and synthesising isovist information from Minecraft maps, as well as our methods for using this data from pre and post settlement generation to produce useful insights about the properties of the generator. While the method used here is very specific to this domain, we argue that our techniques or at least a subset of them could be useful when analysing almost any content generator within games.

3.1. Data Gathering and Preparation

3.1.1. Map Data Used - GDMC Entries

To evaluate the novel techniques and approaches that we explore and explain in this paper, we apply them to the 2022 entries to the Generative Design in Minecraft competition. This dataset is highly useful for several reasons. Firstly the map data is publicly available (found at gendesignmc.engineering.nyu.edu/results), meaning it can be used both by us and by other researchers looking to modify our methods with the same data. The code used to produce them is also publicly available for most generators, meaning we can in future both explore the relationship between the architecture of the generator’s code and its output using this paper’s techniques, as well as exploring the extent to which our findings about individual generators extend to their use on alternative base maps.

One of the most important and unique aspects of this dataset though is the presence of the base maps. As discussed in Section 2.3, the way the GDMC is organised is that entry generators have to be designed to produce settlements on any pre-existing terrain, then a set of actual maps to be used as test beds is decided by the GDMC organisers and used for all entries. Firstly this means that we can directly compare the alternative generators as they all use the same base map to generate on. More excitingly however, it lets us do pre and post generation analysis on the settlements, meaning we can drill into what specifically a given generator changed about the map. As discussed earlier this more closely mirrors the way that PCG systems are implemented in the game industry, i.e. as a generative pipeline of processes that iterate on a single artefact rather than the one shot generation of complete artefacts.

3.1.2. Isovist Calculation
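To make the isovist definition from Section 2.4 concrete, the following is a minimal sketch on a toy 2D grid rather than a 3D Minecraft map. It is not the method of [26]: the grid encoding, the helper names `bresenham` and `isovist`, and the use of Bresenham line tracing for line-of-sight are all illustrative assumptions.

```python
def bresenham(x0, y0, x1, y1):
    """Return the grid cells on a straight line from (x0, y0) to (x1, y1)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    cells = []
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return cells
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def isovist(grid, origin):
    """Set of open cells visible from `origin` (x, y).

    Assumption: `grid` is a list of equal-length strings where '#' is an
    opaque cell and any other character is open space.
    """
    ox, oy = origin
    visible = set()
    for y, row in enumerate(grid):
        for x, cell in enumerate(row):
            if cell == "#":
                continue
            # A cell is visible if no opaque cell lies on the traced line.
            if all(grid[cy][cx] != "#" for cx, cy in bresenham(ox, oy, x, y)):
                visible.add((x, y))
    return visible


# Hypothetical toy map: a single opaque block in the middle row.
GRID = [
    ".....",
    "..#..",
    ".....",
]
view = isovist(GRID, (0, 1))
isovist_size = len(view)  # one simple scalar property of the isovist
```

Scalar properties such as the size of `view` play the role of the isovist metrics discussed in the text; the metrics used in this paper are computed in 3D and are richer than this example.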

As discussed in Section 2.4, the concept of an isovist comes from the domain of real world space analysis. The concept was brought over to the domain of game level evaluation in ‘Automated Isovist Computation for Minecraft’ [26], and it is the method from that paper that we use to produce our base metrics in this work. For a more detailed explanation of how the isovist metrics are calculated please see Appendix A, or [26]. Hervé and Salge introduced 13 distinct metrics which can be calculated for a given standable location in a Minecraft map. These are all inspired by the concept of isovists and are intended to capture some aspect of the player’s experience of the space. These isovist metrics can then be calculated for either all or a subset of the standable locations within a specific area of a Minecraft map, giving us a set of locations and the associated isovist metrics for them.

3.1.3. Isovist Metric Compression

To facilitate visual inspection and comparison between isovist sets we wanted to reduce the dimensionality of the 13 metric dataset down to two dimensions, similar to more conventional ERA except that each point represents a location within an artefact rather than a different artefact. There are several options for producing or selecting a 2D version of the data (see Section 2.1). In this work we compress the assembled set of metrics using PCA. PCA calculates the linear combinations of underlying features which explain the maximal variance in the data, meaning we can take the two most explanatory components and visualise them in a 2D plot while retaining a large share of the variance in the underlying isovist metric data.

Before applying PCA we pre-process the metric data by standardising it: removing the mean of each metric and scaling to unit variance. This is standard practice when applying PCA to ensure no underlying variable has an outsized impact due to having a large range of values. In our results we report the actual contributions of each underlying metric to the principal components themselves to explore the extent to which all metrics are actually represented. These principal components were calculated for the assembled set of all generated maps.

3.2. Generator Insight Methods

In this subsection we describe the methods that we use to gather insights about individual maps and map generators, as well as methods highlighting noteworthy isovists within the generated settlements. We aim to describe the technical process for arriving at these insights, as well as what useful information we gather from them about the underlying generators.

3.2.1. Principal Component Expressive Range Analysis

The first method we explore for gaining insight into our roster of Minecraft settlement generators is to visualise the derived isovist principal components in a scatterplot, with the position for each location in the settlement or map determined by the top two PC values for the isovist metrics at the location. This is very similar to conventional ERA in the form popularised by Smith and Whitehead [10] except with two distinctions. Firstly, all of the data points are derived from locations within a single map rather than each point representing a complete artefact. Secondly, we are effectively visualising diversity in terms of all 13 of the isovist metrics due to the PCA preprocessing, rather than being limited to a choice of two metrics as traditional ERA is.

As in conventional ERA these plots give us an idea of the relative diversity present in alternative content generators by how widely distributed the isovist data from a map is in this space. It can also give us insight into the extent to which two generators are producing similar output by examining the similarity between distribution shapes in these heatmaps. Heatmap similarity is a strong indicator that the effect two generators are having on a base map is broadly alike, at least in terms of their isovist metric values. While we lose the benefit of conventional ERA of highlighting exactly what combinations of metric values are likely and unlikely to be produced, we argue this is offset by the gain in being able to qualitatively understand diversity in terms of multiple metrics in the form of principal components.

3.2.2. Principal Components as Map Overlays

The second technique we use for gaining insights into the generators is overlaying values derived from applying PCA onto the actual Minecraft maps themselves. By taking an overhead view of the map and then colour coding locations within that map based on the principal component values found at that location we can further qualitatively evaluate a generated space, and focus our investigation of it. If a designer wanted a generated space which provided a consistent experience while traversing it as a player, they might look for smoothly changing values in their maps, whereas if they wanted a map with a high number of notable locations which stood out from their surroundings, then they would hope to see many locations with radically different values to their surroundings. Most importantly however for our use case, the overlays support useful and direct comparison between generated settlements and the base map. Using this approach we can see both the extent to which a generator changed the base map, and also where these changes were located.

In this paper we produce these overlays by first taking an overhead view of the Minecraft map from in game. We then take the values for only the first principal component as this is the one that explains the most variance in the underlying metric data, and then localise these values on the map based on the location of the isovist that produced them. These can then be colour coded on a continuous scale based on their value. This is a direct evolution of one of the approaches used by Hervé and Salge in the paper that introduced the isovist metrics to Minecraft [26], except they looked at the values for individual metrics. Note that because we are viewing the map from vertically above and only taking the highest isovist for each location, we averaged values for every coordinate pair, starting from a vertical threshold corresponding to the common ground level. This means that any influence a generator is having on the map below the top layer of blocks, such as generating cave networks beneath the surface, would likely not be visible.

3.2.3. Calculating Generative Shift

The final and most innovative technique we explore is what we term Generative Shift Analysis (GSA), by which we mean the change in characteristics of a location before and after a stage in a generative pipeline, in our case the generation of a settlement. More specifically we are interested in highlighting both the general trend of the shift caused by a given generative step, as well as highlighting the locations which experienced the largest shift in their isovist metrics pre and post generation. This gives designers a view of locations that experienced the biggest change in their characteristics as a result of a generative step.

To calculate this we take the top two principal component data points for locations in a generated settlement, and pair them up with the data from the matching locations in the base map. This gives us a set of pairs of 2D vectors, with one representing the combined isovist metric information for that location before a settlement was generated, and the second representing the same location afterwards. We can then calculate the 2D vector that takes us between these two points, with its magnitude acting as a heuristic for how much a location was changed by the settlement being generated in terms of its isovist metrics. This distance is what we refer to as ‘generative shift’. We then rank locations by those which experienced the largest shift, allowing us to identify the top X most transformed locations by a given generator. These locations can then be manually inspected in game.

There is a significant amount of qualitative information that we hope to gain from applying GSA to our data set. As discussed in Section 2.1, an issue with the majority of techniques for understanding and visualising generative spaces is that they only afford a high level abstracted view. Here instead we provide a justification for examining specific locations within a generated settlement, with a view to gaining insight into the generator as a whole. Seeing the extremes of what a generative step can do to the base artefact gives a designer a concrete idea of how transformative a generative step can be overall. Depending on a designer’s goals this could either increase their confidence that the generator was working well, for example by not having too large an influence on an already desirable base map, or it could highlight that the generator is having too extreme an impact.

4. Experimental Details

In this section we describe the set up of the pilot experiments that we ran to explore our concept of generative shift in the context of Minecraft settlement generation.

4.1. Isovists Calculated

Our experiment runs on the 20 entries for one (of two) maps from the 2021 GDMC competition. The map we have focused on is the ’volcano map’ used in the 2021 GDMC as well as in the following year’s competition.

We compute isovists on surfaces where a player can stand, at a height matching the camera position for the given location and a radius of 256 blocks (the default view distance in Minecraft). For each visible coordinate, we check the type of block, whether it is transparent, and whether it is a location where a player can stand [26].

The default map, without any structure, has 92560 possible surfaces, and a single isovist takes several seconds to compute, depending on its size. We therefore subsampled our data as follows: for each height value (Y coordinate in Minecraft), we compute 1 in 10 of the possible isovists, picked randomly. This subsampling significantly reduces the computation time, and previous tests showed that it did not affect the global results [26].

4.2. Highlighted Maps

While we have access to 20 different settlement generators, calculating the generative shift for all 20 would deliver a potentially overwhelming amount of information to dig through without further clarifying the strengths and weaknesses of the technique. As a result we opted to focus on only two generators. Generators 6 and 15 were the two highest performing generators: Generator 6 had the highest score for the Aesthetic criterion and Generator 15 was the overall highest scoring entry across all evaluation criteria. To get an idea of the character of the two generated settlements as well as the base map see Figure 1.

We felt that if we were only going to evaluate a subsample of generators then it made sense to select high performing generators as any insights into how they function might be more generally useful than insights gained into less high performing systems. However this decision is somewhat arbitrary, and exploring the extent to which our generative shift approach works for more diverse generators is a natural extension of this work.
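The generative shift calculation described in Section 3.2.3 can be sketched in a few lines. This is an illustrative sketch, not the authors' released code: the function name `generative_shift`, the tuple layout of the results, and the use of the mean shift vector as a summary of the "general trend" are assumptions made here.

```python
import math


def generative_shift(before, after, top_x=3):
    """Rank paired locations by how far they moved in 2D principal-component space.

    `before` and `after` are equal-length sequences of (pc1, pc2) pairs for
    the same locations in the base map and in the generated settlement.
    Returns the `top_x` most shifted locations as (index, vector, magnitude)
    tuples, plus the mean shift vector (an assumed summary of the trend).
    """
    if len(before) != len(after):
        raise ValueError("before and after must describe the same paired locations")

    shifts = []
    for index, ((bx, by), (ax, ay)) in enumerate(zip(before, after)):
        vector = (ax - bx, ay - by)          # vector from "before" to "after"
        magnitude = math.hypot(*vector)      # the generative shift of this location
        shifts.append((index, vector, magnitude))

    ranked = sorted(shifts, key=lambda item: item[2], reverse=True)
    count = len(shifts)
    if count:
        mean_vector = (
            sum(item[1][0] for item in shifts) / count,
            sum(item[1][1] for item in shifts) / count,
        )
    else:
        mean_vector = (0.0, 0.0)
    return ranked[:top_x], mean_vector
```

The indices of the top-ranked entries identify the locations that would be inspected manually in game.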

4.3. Location Selection

To reduce the compute and processing time required to pair up and calculate the generative shift between isovists from the same location in the base map and generated settlements, we opted to only find pairings for 2% of map locations and then calculate the generative shift for these only. This 2% was selected stochastically during the pairing process. This sub-sampling was only required in this case due to the specific process used for gathering isovist data, which was inherited from [26]. That process did not produce data that was easy to pair between the base maps and generated settlements, and therefore required large numbers of Euclidean distance calculations to find matched pairs. A revised data gathering process which simultaneously gathered isovist data from the same location in both maps could avoid this limitation.

While the 2% threshold was largely arbitrary and based on the resources available for our pilot experiment, the intention for GSA is that it will still give useful insights even when only sampling a fraction of possible locations. In fact for most 3D game domains which are structured as continuous spaces rather than the large cube voxels of Minecraft, some form of sub-sampling would always be required.
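The sampling and pairing step above can be illustrated with a brute-force nearest-neighbour search. This is a sketch under stated assumptions rather than the pipeline actually used: the function name `pair_sampled_locations`, the `max_dist` tolerance for accepting a match, and the fixed random seed are all hypothetical choices.

```python
import math
import random


def pair_sampled_locations(base_locs, gen_locs, fraction=0.02, max_dist=0.5, seed=0):
    """Pair a random fraction of generated-map locations with base-map locations.

    Each location is an (x, y, z) tuple. For every sampled generated-map
    location we find the nearest base-map location by Euclidean distance and
    keep the pair only if it lies within `max_dist` (an assumed tolerance).
    Returns a list of (base_index, generated_index) pairs.
    """
    if not base_locs or not gen_locs:
        return []
    rng = random.Random(seed)
    n_sample = max(1, round(fraction * len(gen_locs)))
    sampled = rng.sample(range(len(gen_locs)), n_sample)

    pairs = []
    for g in sampled:
        target = gen_locs[g]
        # Brute force: one distance calculation per base-map location.
        b = min(range(len(base_locs)), key=lambda i: math.dist(base_locs[i], target))
        if math.dist(base_locs[b], target) <= max_dist:
            pairs.append((b, g))
    return pairs
```

The cost grows with the product of the sample size and the number of base-map locations, which is consistent with sub-sampling being needed here; a spatial index or a shared sampling grid would be the usual way to avoid that cost.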

4.4. Software

The code for extracting the isovist data from the Minecraft maps was written in Python, and relies on the GDMC’s framework (https://github.com/avdstaaij/gdpc) to collect the game’s data, and CUDA for computational acceleration (https://developer.nvidia.com/cuda-python).

The code for pairing up isovists, compressing their metrics to a 2D vector using PCA, calculating their generative shift and visualising it was written in Python and is available at https://github.com/KrellFace/gen_shift_analysis. To apply PCA to the isovist metrics we used the scikit-learn Python package (https://scikit-learn.org).

5. Results and Discussion

This section discusses the strengths, weaknesses and interesting aspects of the approach we have introduced and explored, as well as highlighting valuable directions for future inquiry that we intend to pursue ourselves or that we feel would be useful for the field.

On the application of PCA to sets of metrics, we found this to be intuitive and easy to apply, and we believe it could be useful in other contexts in which synthesising multiple metrics is desirable, not just when applying our approach of Generative Shift. The metrics themselves proved relatively compressible despite all being relevant, with the top 2 PCs explaining over 65% of the total variance in the data (PC 1 explained 52% of the variance and PC 2 13.3%). Being able to combine multiple phenotypic metrics let us understand the changes caused by a generator in a more holistic way and it directed us to a more diverse set of significantly altered locations (Figure 2) than we believe would have been discoverable with a smaller set. It also had the unexpected benefit of confirming to an extent that all 13 of the isovist metrics were capturing different aspects of location diversity, as we can see in the relatively even contribution between different metrics in …

A direction for future study is exploring whether we can quantify the amount of generative shift that a designer wants to see produced by a system, and whether this could be used to guide a generative process. In future work we aim to explore whether such a concept could be operationalised in a generative system by using generative shift as a pseudo fitness function to produce optimal amounts of change to a pre-existing artefact.
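For readers who want to reproduce the kind of figures reported in this section (explained variance per component and per-metric contributions), the following NumPy-only sketch standardises a metric table and projects it onto its first two principal components. It is an assumption-laden illustration: the paper used scikit-learn, whereas this sketch uses a singular value decomposition directly, and the function name `standardise_and_project` and the synthetic data are hypothetical.

```python
import numpy as np


def standardise_and_project(metrics):
    """Standardise each metric, then project onto the top two principal components.

    `metrics` is an (n_locations, n_metrics) array. Returns:
      scores   - (n_locations, 2) principal component values per location
      ratio    - fraction of total variance explained by PC 1 and PC 2
      loadings - (2, n_metrics) contribution of each metric to each component
    """
    X = np.asarray(metrics, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # leave constant metrics unscaled
    Z = (X - X.mean(axis=0)) / std           # zero mean, unit variance

    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)

    loadings = Vt[:2]
    scores = Z @ loadings.T
    return scores, explained[:2], loadings


# Synthetic stand-in for a table of 13 isovist metrics at 200 locations.
rng = np.random.default_rng(0)
demo_metrics = rng.normal(size=(200, 13))
demo_scores, demo_ratio, demo_loadings = standardise_and_project(demo_metrics)
```

Inspecting the absolute values in each row of `demo_loadings` is one way to check how evenly the underlying metrics contribute to a component.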

Another future direction we are excited about is one which explores the relationship between player experience and smoothness of change in isovist metric values. This builds on the map overlay heatmaps in Figure 2 and how they allow us to easily visualise how the compressed metric values shift geographically within the map. Our hypothesis is that maps with isovist metric values that only change gradually as a player navigates the space are likely to be more relaxing, or perhaps even boring, than maps which present the player with more sudden changes in isovist metrics when moving. This is something we aim to explore in future, possibly with user studies using this same data set.

6. Conclusion

In this paper we introduced Generative Shift Analysis, a novel approach for analysing generated 3D spaces through the use of combined isovist metrics, and the changes produced in them by a generative process that uses a pre-existing virtual environment as its base. While this is only an early exploration of these concepts we still conclude that this is a potentially valuable avenue for PCG research to explore and one that we are excited to expand upon.

Acknowledgements

We thank the reviewers for providing insightful and detailed feedback. This work was supported in part by the EPSRC Centre for Doctoral Training in Intelligent Games & Games Intelligence (IGGI) [EP/S022325/1].


A. Set-based definitions of isovist metrics

Given a point and its isovist, the point is called the centroid of the isovist. The lines connecting the centroid and the boundary of the isovist are referred to as radials. The isovist is also characterised by its visible area and its perimeter. It is worth noting that the isovist is defined by ”real-surfaces”, which Benedikt defines as “opaque, material, visible surface, …

[Equations (7)–(9): definitions of Mean Radials, Drift and Vista Length.]

The visible head spaces are all the positions an avatar could occupy such that its head would be visible from the current position. The same positions offset by −2 are the standable blocks supporting those head spaces. The set of visible blocks provides a perimeter of blocks that limit our view, and contains all blocks visible from the current position. The set of walkable blocks is a list of all walkable blocks, obtained by flood-filling from the standable block supporting the current position, within a limited number of steps. We use the usual Minecraft movement rules, which allow moving up by one block per lateral step and dropping down to lower levels. Note that avatar height, movement rules, and vision sensors could all affect those basic sets. The following properties can now be computed by operating on those sets alone, without having to recompute them.

The function (.) counts how many different types of blocks are in a set.
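The flood fill for walkable blocks and the block-type count can be sketched as follows. This is a minimal illustration, not the implementation used for the results: it assumes a simplified world with a single standable block per column (a heightmap), ignores fall damage, and uses hypothetical names (`walkable_blocks`, `count_block_types`, `heights`, `block_type`).

```python
from collections import deque


def walkable_blocks(heights, start, max_steps):
    """Breadth-first flood fill over columns reachable within max_steps moves.

    heights: dict mapping (x, z) -> y of the standable block in that column
    (assumption: one standable block per column).
    """
    if start not in heights:
        return set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (x, z), steps = queue.popleft()
        if steps == max_steps:
            continue
        for dx, dz in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, z + dz)
            if nxt in heights and nxt not in seen:
                # Climb at most one block per lateral move; any drop is allowed.
                if heights[nxt] - heights[(x, z)] <= 1:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return seen


def count_block_types(blocks, block_type):
    """Number of distinct block types among the given positions."""
    return len({block_type[b] for b in blocks})


# Toy world: (2, 0) is two blocks above its only reachable neighbour.
heights = {(0, 0): 64, (1, 0): 65, (2, 0): 67, (0, 1): 60, (1, 1): 61}
block_type = {(0, 0): "grass_block", (1, 0): "grass_block",
              (2, 0): "stone", (0, 1): "sand", (1, 1): "stone"}

reachable = walkable_blocks(heights, (0, 0), max_steps=3)
print(sorted(reachable), count_block_types(reachable, block_type))
```

Because dropping down is allowed but climbing is limited, reachability is not symmetric; a breadth-first search still visits each block at its minimum step count, so the step limit is applied consistently.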
