                         Axes of Characterizing Generative Systems: A Taxonomy of
                         Approaches to Expressive Range Analysis
                         Josiah Boucher1,* , Gillian Smith1,† and Yunus Doğan Telliel1,†
                         1
    Worcester Polytechnic Institute (WPI), 100 Institute Road, Worcester, MA, U.S.A.


                                            Abstract
                                            Seeking to leverage Expressive Range Analysis (ERA), a method of characterizing generative systems, for analysis of generative AI
                                            (GAI) systems and their outputs, this paper categorizes the approaches that have been used for ERA. We present a taxonomy with five
                                            axes of characterization that may be applied to ERA methodologies: content agnostic vs semantic; quantitative vs qualitative; product
                                            vs process; objective vs subjective; and automated vs manual. While ERA has traditionally been limited to the domain of Procedural
                                            Content Generation (PCG) in video game development, we recognize parallels between PCG and GAI and hope to see an expansion of
                                            the application of ERA into the domain of GAI. Serving this goal, these axes provide metrics through which to categorize, compare, and
                                            explore approaches.

                                            Keywords
                                            Expressive Range Analysis, Procedural Content Generation, Generative AI, Evaluating Generative Systems, Taxonomy



                         1. Introduction                                                                                               e.g. some decision points for input selection could include
                                                                                                                                       whether to handcraft inputs or sample the latent space, and
                         Expressive Range Analysis (ERA) is a method of character-                                                     whether the inputs should be restricted in a way that targets
                         izing the nature and shape of a generative model in terms                                                     a specific domain of outputs (i.e. a prompt to output a
                         of its outputs. What sorts of outputs is a generator capable                                                  haiku would be very different from one that might produce a
                         of? What impact does changing inputs of a generator have                                                      cooking recipe). Furthermore, ERA requires the application
                         on its outputs? How do biases manifest in the generative                                                      of metrics to large quantities of content outputted from the
                         outputs, and how do the inputs influence these biases? ERA                                                    system in question. What metrics are used and how they
                         is well suited for answering these questions [1]. ERA comes                                                   are determined is a major component of ERA, often directly
                         from the domain of Procedural Content Generation (PCG)                                                        determining the value provided by the analysis. Because
                         in video game development: the practice of leveraging algo-                                                   GAI outputs tend to occupy a broad domain of applications
                         rithmic methods to design and produce artifacts for use in                                                    and mediums, unique challenges arise when considering
                         games across a variety of contexts, such as levels or maps                                                    these metrics, further complicating the application of ERA
                         [2].                                                                                                          in this context.
                            We recognize common threads between PCG and genera-                                                           Responding to 1) these parallels between PCG and GAI,
                         tive AI (GAI) systems, defined here as the process of using                                                   and 2) the massive increase of scope presented by GAI, this
                         generative technologies such as large language models to                                                      paper presents a taxonomy for categorizing approaches to
                         produce content, often using natural language prompts as                                                      ERA. This taxonomy includes vocabulary to better distin-
                         human-provided input, including high-level functionality,                                                     guish examples from existing work in PCG, as well as a
                         motivations, use cases, and shortcomings of both domains                                                      compass to guide further exploration of using this method
                         (for details, see Section 2). Because of these similarities,                                                  in the rapidly expanding domain of GAI. The goal of this
                         we seek to expand and leverage ERA—which has not only                                                         taxonomy is to push the boundaries of what ERA may be
                         proven useful for recognizing bias within, and categorizing                                                   used for and how it may be applied.
                         generative outputs of PCG systems, but also resembles some
                         existing approaches for GAI analysis [3]—for analysis of GAI
                         system outputs, especially text-to-X, large language model                                                    2. Related Work
                         (LLM)-based applications (such as the text-to-text ChatGPT
                         [4], text-to-image Stable Diffusion [5], and text-to-speech                                                   This work operates in the intersection of procedural content
                         ElevenLabs [6]).                                                                                              generation and generative AI. Guzdial provides a valuable
                            The application of ERA in the domain of GAI is not a one-                                                  bridge between these domains with the lens of human-AI
                         to-one translation from how the method is used in PCG. Use                                                    interaction generation [8]. While Guzdial uses this lens to
                         of GAI tools commonly requires more complicated, varied,                                                      frame PCG and GAI as essentially the same process, we view
                         and linguistic inputs than PCG, which tends to operate from                                                   PCG as a broader term describing the process of content
                         numerical randomization as a starting point for much of its                                                   generation at its highest level, and GAI as a more descriptive
                         generation. Because these inputs have a major impact on                                                       term identifying a subset of generative practices that use
                         the outputs of GAI systems [7], selection of these inputs                                                     specific technologies. Framing GAI as a subset of PCG—or
                         is an important consideration for applying ERA to GAI—                                                        using Guzdial’s lens of human-AI interaction generation
                                                                                                                                       [8]—broadens the scope of understanding for both domains
                         11th Experimental Artificial Intelligence in Games Workshop, November                                         and allows the application of evaluation methods of PCG
                         19, 2024, Lexington, Kentucky, USA.                                                                           for analysis of GAI systems.
                         *
                           Corresponding author.                                                                                          Particularly, we are interested in leveraging expressive
                         †
                           These authors contributed equally.                                                                          range analysis for evaluation of GAI systems [1]. Withing-
                         $ jdboucher@wpi.edu (J. Boucher); gmsmith@wpi.edu (G. Smith);                                                 ton et al. list expressive range analysis as one of 12 "features"
                         ydtelliel@wpi.edu (Y. D. Telliel)
                          0009-0002-6865-9031 (J. Boucher); 0000-0002-2765-7702 (G. Smith);
                                                                                                                                       used for comparison of PCG systems [9]. We consider ERA
                         0000-0001-5651-7349 (Y. D. Telliel)                                                                           a promising method of GAI analysis compared to alternative
                                        © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License
                                        Attribution 4.0 International (CC BY 4.0).
                                                                                                                                       PCG-evaluation practices due to its flexibility in applying

CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
a broad range of analysis metrics to its evaluated systems,              • 2) Determine useful metrics for the goal of analysis
particularly highlighting its strength of identifying bias and             (i.e. linearity or leniency of a level) and apply them
system tendencies, including as influenced by user input                   to the data.
[1]. Withington et al. identify weaknesses with current                • 3) Produce a visualization of the metered data space.
evaluation methods for PCG systems, suggesting they could
be mitigated by diverse research frameworks and promoting               Two challenges arise when considering this method for
the reuse of methodology where possible [9]. This paper             GAI-generators as opposed to PCG-generators. First, pro-
seeks to answer this call by broadening the applicability of        ducing a set of data to analyze tends to be more complicated.
ERA.                                                                GAI text-to-X applications require linguistic input, leading
   We seek to do so by presenting a taxonomic framework             to larger variance of the input space in terms of both quan-
for categorizing approaches used for ERA, drawing from              tity and meaning. Further complicating the issue, PCG tools
similar work in PCG such as Togelius et al. [10] and Smith’s       are typically custom-made for specific use-cases [18], where
[2] taxonomies of PCG, as well as Withington et al.’s modern        GAI tools (such as ChatGPT [4] or Google’s Gemini [24])
taxonomy of available evaluation approaches [9].                    are commonly presented as general-purpose—while random
   Further outlining the connection between GAI and PCG,            inputs could be provided in place of handcrafted ones or the
we consider the motivations and use-cases for such sys-             latent space could be randomly sampled, these approaches
tems. PCG and GAI are both used to automatically generate           would provide undesirable outputs, lacking the targeted fo-
large amounts of content, and to increase variety of content        cus of a particular use-case. The general-purpose nature of
[11, 12]. PCG is also used for assistive tools, helping with        GAI systems adds noise to the output space when looking
tasks such as level creation [13], and support tools have           at these tools for specific use-cases, since outputs that don’t
emerged to make these systems easier to understand [14].            fit the specific use-case become irrelevant. Guzdial summa-
LLM applications like ChatGPT [4] and Stable Diffusion              rizes this aspect as GAI tool developers seeking to expand
[5] are promoted for their potential to reduce labor costs,         the possible valid outputs for a particular tool to include all
enable new business models, and increase access to content          possible valid outputs for every tool [8]. This noise presents
creation; Cook claims all of these as common motivations            additional challenges: how do you narrow the scope of the
for studying AI in games [15]. We also acknowledge the              output to only match the relevant use-case? What prompts
use of machine learning in PCG [16, 17] as a point that             do you provide as an input to produce a suitable set of data
highlights the connection between PCG and GAI.                      to categorize the generator as a whole? Guzdial’s human-
   We also recognize parallel claims of GAI and PCG                 centered input alignment [8] may prove a useful attribute
capabilities—and shortcomings thereof—that may provide              of consideration to respond to this challenge.
useful points of interest for continuing this investigation.            The second challenge that arises from considering ERA
Developing a useful PCG generator often takes as much            for use in analyzing GAI systems comes from the metrics of
effort as, or more than, hand-crafted alternatives [18], despite        analysis. ERA can be used alongside any metrics, depending
the allure of rapid content production. Furthermore, the            on what a generative system is being analyzed for. These
variety content generators are able to produce can be lim-          metrics must be applied to large quantities of data, in some
ited, as players are often capable of identifying patterns          cases encompassing the entire generative output of a system
between generated content [19]. While Generative AI of-             [1]. Furthermore, many examples of ERA employ automatic
fers potential to expand the type of content that can be            processing of data, but not all data and research goals bene-
generated—such as more complex generative audio and mu-             fit from automatable metrics- which tend to be quantitative
sic used to increase variety and interactivity of game music        in nature. The challenge here is twofold: how do you de-
compared to non-generative methods [20], as well as the             termine semantically meaningful metrics for increasingly
range of users that can make use of generators—we also              variable data? And, accounting for practical workload and
view the drawbacks of PCG as rich sites of investigation in         feasibility, how do you apply these metrics to increasingly
GAI. ERA has proven a useful method for identifying such            multitudinous data?
weaknesses in the domain of PCG, and we hope to see its                 We treat producing an intended result as the motivation
effectiveness applied to GAI as well.                               behind creating and using generators. While some genera-
   GAI technologies have prompted significant concerns, e.g.          tors offer promises of increased speed and efficiency, those
regarding their social and environmental impact, underlying           claims are not always accurate [18]—despite this, genera-
politics of AI design, implementation, and advocacy [21].           tors are valued by many as useful tools. We recognize the
Additional concerns include embedded systemic biases, risk          purpose of ERA as evaluating the success of that motivation.
of plagiarism and misinformation, and human and environ-            Using ERA is akin to asking: has this generator suc-
mental costs [22]. Jiang et al. find that artists, in particular,   cessfully captured what it was designed to capture? With
identify harms to professional reputation, intellectual prop-       this framing in mind, our motivation for this paper is to
erty, and financial risk [23]. Because of GAI’s recognized          facilitate the answering of that question in a greater variety
potential for harm, we value tools and methods for under-           of contexts. The challenge in the case of GAI is, in short:
standing and analyzing these systems and their outputs.             how?
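Concretely, the three-step ERA loop outlined in Section 3 (collect generated artifacts, apply metrics, bin the results into a visualizable metric space) can be sketched in a few lines. This is a minimal toy sketch, not an implementation from the cited work: the level representation, the `generate_level` stand-in, and the exact metric definitions are assumptions for illustration, loosely inspired by Smith and Whitehead's linearity and leniency [1].

```python
import random

def generate_level(seed, width=50):
    """Toy stand-in for a level generator: a list of platform heights.
    (Assumption for illustration -- not a real PCG system.)"""
    rng = random.Random(seed)
    heights, h = [], 5
    for _ in range(width):
        h = max(0, min(10, h + rng.choice([-1, 0, 0, 1])))
        heights.append(h)
    return heights

def linearity(heights):
    """R^2 of a least-squares line fit; 1.0 means a perfectly linear profile."""
    n = len(heights)
    mx, my = (n - 1) / 2, sum(heights) / n
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(heights))
    sxx = sum((x - mx) ** 2 for x in range(n))
    syy = sum((y - my) ** 2 for y in heights)
    return 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)

def leniency(heights):
    """Fraction of steps that are not drops -- a crude difficulty proxy."""
    drops = sum(1 for a, b in zip(heights, heights[1:]) if b < a)
    return 1 - drops / (len(heights) - 1)

# Step 1: start with a set of artifacts produced by the generator.
levels = [generate_level(seed) for seed in range(500)]
# Step 2: apply the chosen metrics to every artifact.
scores = [(linearity(lv), leniency(lv)) for lv in levels]
# Step 3: bin the scores into a 2D grid -- the data behind an
# expressive-range heat-map (a plotting library would render it).
BINS = 10
grid = [[0] * BINS for _ in range(BINS)]
for lin, leni in scores:
    grid[min(int(lin * BINS), BINS - 1)][min(int(leni * BINS), BINS - 1)] += 1
```

A real analysis would render `grid` as a heat-map; even unrendered, the binned counts already expose which region of the metric space the generator tends to occupy.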


3. The Problem Space                                                4. Framework
Summarizing the process of Expressive Range Analysis, this          Here, we present a taxonomy of ERA methods, seeking to
form of inquiry operates in three steps [1]:                        provide language and framing to better address the chal-
                                                                    lenges of applying ERA to a greater variety of contexts—
     • 1) Start with a set of images or similar data (i.e. lev-     especially GAI systems. We have defined five axes for char-
       els, maps, etc.) produced by the generator being             acterizing ERA methodologies: quantitative vs qualitative,
       analyzed.
product vs process, objective vs subjective, content agnostic     produce such an output. Kreminski et al. [25] arguably
vs semantic, and automated vs manual.                             present a process-focused approach, because their motiva-
   We considered three factors as a basis for defining these      tion for analysis seeks to evaluate the usability experience
axes: the origin and definition of ERA, the adopted practice      of a generative system. However, the metrics used are pri-
of this method, and its potential for further expansion and       marily focused on the product of the generator. A more
refinement. We considered the origin of ERA [1], identifying      process-centric example would include metrics that evalu-
decision points in how the method may be applied. We              ate procedural aspects such as the time it takes to generate
also considered how ERA has been adopted in research              an output, or the number of generative attempts before a
endeavors and sought to challenge assumptions that have           user finds a suitable option.
become commonplace. Finally, this taxonomy is also a result          Shaker et al. [11] highlight generator reliability as one
of our efforts to address challenges we faced in applying         important piece of PCG evaluation. Looking at this aspect
ERA to GAI. We hope to further refine these axes and their        provides a mixed-method approach that tends to use product
definitions in future work.                                       to reveal something about the process.

4.1. Quantitative vs Qualitative                                  4.3. Objective vs Subjective
The metrics that are applied to the generator-produced            This axis is concerned with the nature of the analysis
data may be quantitative, qualitative, or varying degrees of      metric—is a given metric verifiable based on factual evi-
mixed-method. Many traditional ERA approaches utilize             dence, or does it vary with perspective, based in emotion or
quantitative metrics, as this approach is often more suitable     opinion? Smith and Whitehead’s two metrics for Launchpad
for automatically applying metrics to data and producing          again provide a useful example: Linearity, as described in
a descriptive visualization. Smith and Whitehead use two          their paper, is an objective metric that describes the factual
metrics for applying ERA to Launchpad: linearity and le-          "profile" of platforming levels (how well the geometry of
niency [1]. Both of these metrics are quantitative, because       the level fits a straight line) in the evaluated system [1].
they are described, measured, and depicted numerically.           Leniency, however, is identified as a subjective score based
Interestingly, Kreminski et al. [25] identify determining         on an intuitive sense of how lenient components of a level
quantitative metrics as an essential step of ERA, which is        are towards a player [1]. Subjective metrics are relatively
consistent with the examples of the method’s initial pre-         underrepresented in PCG, but may prove useful for evaluat-
sentation [1]; it is therefore unsurprising that qualitative      ing GAI systems, especially as they are applied in creative
metrics are relatively under-utilized in ERA. This taxon-         domains. We consider user experience analysis a useful
omy challenges this assumption—which is present across            point of reference for how subjective metrics may be used
many ERA-focused research endeavors—instead suggesting            in data analysis and visualization [27, 28].
an expansion of valid data collection efforts.
   While there is not a strong foundation of qualitative met-     4.4. Content Agnostic vs Semantic
rics applied to ERA in PCG, some hypothetical qualitative
metrics could come from methods like user surveys or inter-       This axis is concerned with the data itself and the value
views, such as those found in play-testing efforts to evaluate    sought from analysis. Semantic approaches seek to find
elements of a game like challenge or engagement. These            meaning within the context of the inputs and outputs of the
metrics could be visually represented with tools such as          generator. Smith and Whitehead [1] present a semantic ap-
word clouds, affinity charts, or heat-maps that highlight the     proach, as their metrics are intended to allow comparison of
frequency of thematic elements within the data, for example.      the generated content. Lucas and Volz [29] provide another
                                                                  example of a semantic approach, as theirs is also intended
                                                                  to compare generated content.
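As a hedged sketch of how part of such a qualitative pipeline might be operationalized (the survey responses and theme keywords below are invented for illustration; real thematic coding would rely on a researcher-built codebook rather than a keyword list), thematic frequencies could be tallied as input for a word cloud or heat-map:

```python
from collections import Counter
import re

# Hypothetical free-text play-test responses (invented for illustration).
responses = [
    "The level felt too hard near the end, but the pacing was fun.",
    "Fun jumps, though repetitive platforms made it feel samey.",
    "Too hard! The spikes were frustrating and repetitive.",
]

# Themes a researcher might code for; the keyword sets are assumptions.
themes = {
    "difficulty": {"hard", "frustrating", "spikes"},
    "enjoyment": {"fun", "pacing"},
    "variety": {"repetitive", "samey"},
}

counts = Counter()
for response in responses:
    words = set(re.findall(r"[a-z]+", response.lower()))
    for theme, keywords in themes.items():
        counts[theme] += len(words & keywords)

# `counts` now holds thematic frequencies suitable for a word cloud
# or heat-map style visualization of the qualitative data.
```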
4.2. Product vs Process
                                                                     Content agnostic approaches seek to find meaning in
Because our goal is to evaluate whether a generator is successful      the generator, regardless of its inputs or outputs, though
in capturing what it was designed to do, it is interesting to     the inputs and outputs may be useful for analysis. While
consider both the product of a generator and the process          Kreminski et al. [25] evaluate the product of the genera-
of producing that output. The product of a generator is its       tor, their interests lie in the users of the generative system,
output: an image, the answer to a question, or a video game       answering questions such as how thoroughly they explore
level are all examples of this. Analyzing the product is useful   the generative range, rather than finding meaning from
for identifying what a generator is capable of, in terms of       the generative output itself, making their approach content-
quality, variety, etc., and what sorts of biases are present in    agnostic. Withington’s exploration of quality-diversity al-
the outputs. Smith and Whitehead [1] present a product-           gorithms [26] presents another content-agnostic example
focused approach, as their linearity and leniency metrics         because the focus is not on the details of specific outputs,
consider only the levels produced by the generative system.       but rather measuring the differences between them.
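The distinction can be made concrete with a small sketch: a content-agnostic diversity score looks only at how much outputs differ from one another, never at what any individual output means. The tile-string encoding below is an assumption for illustration, not a method taken from any of the cited works.

```python
from itertools import combinations

def hamming(a, b):
    """Positionwise difference between two equal-length outputs."""
    return sum(x != y for x, y in zip(a, b))

def mean_pairwise_distance(outputs):
    """Average Hamming distance over all output pairs: a content-agnostic
    diversity score that ignores what any individual output means."""
    pairs = list(combinations(outputs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)

# Toy generated artifacts encoded as fixed-length tile strings (assumed).
outputs = ["....##..", "..##....", "....##.#"]
score = mean_pairwise_distance(outputs)
```

The same score could be computed for images or text under any distance function; the point is that the metric compares outputs without interpreting their content.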
Withington’s [26] approach is also product-focused, since
this work considers the differences between outputs of a          4.5. Automated vs Manual
generator rather than considering the process of producing
those outputs.                                                    This axis is concerned with the process of applying metrics
   The process entails the experience of producing that out-      to the data. Automated processes are typically conducted
put. Common narratives of generative systems sell them as         computationally, making them especially suitable for quan-
faster and more efficient than manual alternatives. Look-         titative analysis metrics and often desirable for the promise
ing at the process allows us to evaluate those claims, using      of reduced processing time or effort. Smith and Whitehead
metrics such as the time it takes to get a desirable output       [1], Kreminski et al. [25], and Kybartas et al. [30], among
or how many iterations of prompts/generations it takes to
others, all present automated approaches for applying ERA        outputs. We present a taxonomic framework for categoriz-
metrics to their data.                                           ing existing and imagined ERA inquiries, hoping to allow
   Manual processes require human labor for applying anal-       more effective navigation of this space and leverage PCG
ysis metrics to each piece of data—while this tends to be        tools for study of GAI technologies and systems.
more time-consuming, it also allows closer scrutiny of el-          This taxonomy opens interesting possibilities for future
ements that are difficult to capture without direct human        applications of ERA. What do qualitative applications of
intervention. Manual processes are often unsuitable for          ERA look like? How would manually applying metrics to
the large quantities of data that ERA typically processes,       data compare to more commonly applied automated pro-
but methods such as crowdsourced photogrammetry—e.g.            cesses? These are relatively underexplored areas of ERA,
as used for identifying information about wildlife popula-       and the possibility of applying this method to generative
tions [31]—may hypothetically be leveraged to manually           systems makes these questions more compelling.
process such data. Human subject experiments, such as               The scope of this paper is limited to theory. This thread
those commonly used in narrative generation projects [32],       of research would benefit from a more complete, systematic
are another example of a manual approach.                        review of ERA projects that places existing work on the
                                                                 axes of this taxonomy and further informs the chosen axes.
                                                                 As an extension of this research, we also see value in per-
5. Discussion                                                    forming ERA on GAI tools using different combinations of
                                                                 axis placement—especially including qualitative and manual
Framing GAI under the lens of PCG, or both as the same
                                                                 approaches.
process, e.g. through the lens of human-AI interaction gen-
eration [8], incorporates this new technology into an estab-
lished domain of research that has ERA as a design-focused       Acknowledgments
method for evaluation. In connecting GAI and PCG, though,
we have identified a need for both an improved vocabulary        This material is based upon work partially supported by the
for describing ERA and an expanded potential scope for ERA       National Science Foundation (NSF) under Grant No DGE-
that challenges existing methods. Thus this taxonomy fur-        1922761. Any opinions, findings, and conclusions or rec-
ther expands the potential range of analytical applications      ommendations expressed in this material are those of the
for ERA, providing language to better describe and imagine       authors and do not necessarily reflect the views of the NSF.
its use-cases and identify its historical gaps. This expansion
responds to existing weak-points in PCG evaluation—such
as those identified by Withington et al. [9]—by increasing       References
the diversity and re-usability of ERA as a framework for
                                                                  [1] G. Smith, J. Whitehead, Analyzing the expressive
generative analysis.
                                                                      range of a level generator, in: Proceedings of the 2010
   Further, this taxonomy allows for better description of the
                                                                      workshop on procedural content generation in games,
flexibility and space occupied by broadly applicable aspects
                                                                      2010, pp. 1–7.
of evaluation, such as those presented in Guzdial’s human-
                                                                  [2] G. Smith, Understanding procedural content gener-
AI interaction generation [8]. For example, Guzdial’s call for
                                                                      ation: a design-centric analysis of the role of pcg in
human-centered input alignment considers the relationship
                                                                      games, in: Proceedings of the SIGCHI Conference
between valid system inputs and user-preferences regard-
                                                                      on Human Factors in Computing Systems, 2014, pp.
ing those inputs. Using the vocabulary from our taxonomy,
                                                                      917–926.
this is a process-focused, content-agnostic, subjective ap-
                                                                  [3] N. Deckers, J. Peters, M. Potthast, Manipulating em-
proach, because meaning is found according to individual
                                                                      beddings of stable diffusion prompts, arXiv preprint
perceptions without concern for the generative output. Such
                                                                      arXiv:2308.12059 (2023).
a metric could be applied to data quantitatively or qual-
                                                                  [4] OpenAI, ChatGPT, https://chat.openai.com/, 2024.
itatively, using either an automated or manual approach.
                                                                  [5] Stability AI, Stable Diffusion, https://stability.ai/stable-image,
Guzdial’s adaptability similarly considers the process of
                                                                      2024.
generation and user perceptions [8], and has similar axis
                                                                  [6] ElevenLabs, Elevenlabs, https://elevenlabs.io/, 2024.
placement to human-centered input alignment– though it
                                                                  [7] P. Korzynski, G. Mazurek, P. Krzypkowska, A. Kurasin-
is objective rather than subjective, since adaptability was
                                                                      ski, Artificial intelligence prompt engineering as a new
a predetermined aspect of the generative process, rather
                                                                      digital competence: Analysis of generative ai technolo-
than a variable expressed by human interpretation. Novelty,
                                                                      gies such as chatgpt, Entrepreneurial Business and
however, is an objective, product-focused metric because it
                                                                      Economics Review 11 (2023) 25–37.
is based in the observed possible generative outputs.
                                                                  [8] M. Guzdial, Human-AI interaction generation: A con-
   It is our hope that using the vocabulary of this taxonomy
                                                                      nective lens for generative AI and procedural content
can provide some clarity to the research community on
                                                                      generation, in: Proceedings of the Thirty-Third In-
how they are using ERA for evaluation, as well as identify
                                                                      ternational Joint Conference on Artificial Intelligence
potential new approaches for ERA.
                                                                      (IJCAI-24), 2024.
                                                                  [9] O. Withington, M. Cook, L. Tokarchuk, On the eval-
6. Conclusions, Limitations, &                                        uation of procedural level generation systems, in:
                                                                      Proceedings of the 19th International Conference on
   Future Work                                                        the Foundations of Digital Games, 2024, pp. 1–10.
                                                                 [10] J. Togelius, G. N. Yannakakis, K. O. Stanley, C. Browne,
This paper explores an avenue for exploring the intersec-
                                                                      Search-based procedural content generation: A taxon-
tion of PCG and GAI, using expressive range analysis as a
                                                                      omy and survey, IEEE Transactions on Computational
common method for analyzing generative systems and their
                                                                      Intelligence and AI in Games 3 (2011) 172–186.
[11] N. Shaker, J. Togelius, M. J. Nelson, Procedural content generation in games (2016).
[12] M. Hendrikx, S. Meijer, J. Van Der Velden, A. Iosup, Procedural content generation for games: A survey, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 9 (2013) 1–22.
[13] A. Liapis, G. N. Yannakakis, J. Togelius, Sentient sketchbook: computer-assisted game level authoring (2013).
[14] M. Cook, J. Gow, G. Smith, S. Colton, Danesh: Interactive tools for understanding procedural content generators, IEEE Transactions on Games 14 (2021) 329–338.
[15] M. Cook, Optimists at heart: Why do we research game ai?, in: 2022 IEEE Conference on Games (CoG), IEEE, 2022, pp. 560–567.
[16] A. Summerville, Expanding expressive range: Evaluation methodologies for procedural content generation, in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 14, 2018, pp. 116–122.
[17] A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, J. Togelius, Procedural content generation via machine learning (pcgml), IEEE Transactions on Games 10 (2018) 257–270.
[18] T. Short, T. Adams, Procedural generation in game design, CRC Press, 2017.
[19] E. Short, Bowls of Oatmeal and Text Generation, https://emshort.blog/2016/09/21/bowls-of-oatmeal-and-text-generation/, 2016.
[20] C. Plut, P. Pasquier, Generative music in video games: State of the art, challenges, and prospects, Entertainment Computing 33 (2020) 100337.
[21] E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, New York, NY, USA, 2021, pp. 610–623.
[22] J. E. Fischer, Generative ai considered harmful, in: Proceedings of the 5th International Conference on Conversational User Interfaces, Association for Computing Machinery, New York, NY, USA, 2023.
[23] H. H. Jiang, L. Brown, J. Cheng, M. Khan, A. Gupta, D. Workman, A. Hanna, J. Flowers, T. Gebru, Ai art and its impact on artists, in: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, Association for Computing Machinery, New York, NY, USA, 2023, pp. 363–374.
[24] G. AI, Gemini, https://gemini.google.com/, 2024.
[25] M. Kreminski, I. Karth, M. Mateas, N. Wardrip-Fruin, Evaluating mixed-initiative creative interfaces via expressive range coverage analysis, in: IUI Workshops, 2022, pp. 34–45.
[26] O. Withington, Illuminating super mario bros: quality-diversity within platformer level generation, in: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, 2020, pp. 223–224.
[27] A. Bangor, P. T. Kortum, J. T. Miller, An empirical evaluation of the system usability scale, Intl. Journal of Human–Computer Interaction 24 (2008) 574–594.
[28] S. Djamasbi, Eye tracking and web experience, AIS Transactions on Human-Computer Interaction 6 (2014) 37–54.
[29] S. M. Lucas, V. Volz, Tile pattern kl-divergence for analysing and evolving game levels, in: Proceedings of the Genetic and Evolutionary Computation Conference, 2019, pp. 170–178.
[30] B. A. Kybartas, C. Verbrugge, J. Lessard, Tension space analysis for emergent narrative, IEEE Transactions on Games 13 (2020) 146–159.
[31] S. A. Wood, P. W. Robinson, D. P. Costa, R. S. Beltran, Accuracy and precision of citizen scientist animal counts from drone imagery, PloS one 16 (2021) e0244040.
[32] R. Sanghrajka, E. Lang, R. M. Young, Generating quest representations for narrative plans consisting of failed actions, in: Proceedings of the 16th International Conference on the Foundations of Digital Games, 2021, pp. 1–10.