<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating the impact of MDP-based level assembly on player experience</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Colan Biemer</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Seth Cooper</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Dynamic difficulty adjustment adjusts a game to be harder for skilled players and easier for less-skilled players. One way to do this is by changing the level that the player plays, which can be achieved by using a Markov Decision Process (MDP) to assemble smaller segments of levels. However, (1) such an approach has not been tested on real players and (2) there are open questions regarding how the MDP should be configured. In this paper, we evaluate MDP-based level assembly with two games (a platformer and a roguelike) in two player studies, and found that while an automatically generated MDP resulted in a similar player experience, it did not outperform a handcrafted MDP. This shows that MDP-based level assembly can be effective, but more work needs to be done on how best to generate MDPs for level assembly. Additionally, MDP-based level assembly was compared to a static level progression (i.e., the player plays a level until they beat it, and then they can play the next level). We found that the static level progression was too easy for one game and too hard for the other, helping show that a dynamic approach results in a more consistent player experience across players.</p>
      </abstract>
      <kwd-group>
        <kwd>level assembly</kwd>
        <kwd>player study</kwd>
        <kwd>Markov decision process</kwd>
        <kwd>platformer</kwd>
        <kwd>roguelike</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The majority of games that do not feature an open world use a static level progression (SLP): the
player plays a level until they beat it and then moves on to the next, continuing until they beat the
game or quit. For designers, making an SLP is a challenging task because some players are skilled and
others are not. Making an SLP that works for both is nearly impossible, and, as a result, some players
will quit because the game is too easy and others because it is too hard [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        One way to solve this problem is dynamic difficulty adjustment (DDA) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which adjusts parts of a
game to make it easier or harder based on the player. If they are struggling, a DDA system should make
the game easier, and, conversely, it should make the game harder if the game is not challenging enough.
Typically, this is achieved via stat adjustments, such as increasing the player’s health, increasing damage,
and so on [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>
        DDA can also be achieved by serving the player different levels [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ]. These levels can be selected
from a dataset, but they can also be procedurally generated [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or stitched together from level segments
generated offline [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, little work has used the latter approach of assembling levels from level
segments in the context of DDA.
      </p>
      <p>
        To approach this, we built on previous work from Biemer and Cooper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which automatically built a
Markov Decision Process (MDP) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to assemble levels from level segments using a tool called Ponos.
We extend their work by (1) testing if the approach works with real players by running two player
studies with two games (a platformer and a roguelike) with 160 participants for each study, and (2)
comparing the output of Ponos to an MDP built by hand. As a baseline, we included an SLP condition.
This was done to answer two research questions:
• RQ1: Does a reward based on expected difficulty and enjoyment for the automatically built MDP
result in a better player experience than using a simple reward based on depth?
• RQ2: Does an automatically built MDP, a handcrafted MDP, or a static level progression result in
a better player experience?
      </p>
      <p>
        For RQ1, we found that crafting a complex reward based on expected difficulty and enjoyment was
not necessary; instead, the structure of the MDP could be used to build a reward, and this structure is
described in Section 3. For RQ2, we found that the SLP for the roguelike was too easy to beat, and the
SLP for the platformer was too difficult. In comparison, the MDP-based conditions had better results in
terms of how often players won and lost for both games. However, there was little to no impact on the
player experience, as defined by the mini Player Experience Inventory (mPXI) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We were able to
determine, though, that the MDP-based conditions performed better than the SLP condition for both
games.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Work from Shu et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] showed how PCG can be guided with a player experience model [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. While
they did not test with players, their work is relevant because it raises a question: Instead of using
rewards like difficulty or enjoyment to guide level generation, why not use a player experience model?
The main reason we didn’t use a player experience model was accessibility. The hope is that someone,
someday, will use MDP-based level assembly in a commercial game. If the final answer to RQ2 is that
developers also need to build a player experience model, then MDP-based level assembly will likely be
out of scope because implementing a player experience model is a significant undertaking. Making it
work well for a specific game is even more work. Therefore, a player experience model is not accessible
for the majority of developers when the primary engineering focus is on gameplay, performance, and
bug fixes.
      </p>
      <p>
        One valid criticism of using a reward based on difficulty and enjoyment (see RQ1) is that neither is
concrete; they are concepts, and operationalizing them for a reward table has many pitfalls. Shouldn’t
the MDP try to optimize the number of levels played by a player in a session [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or the likelihood that a
player will succeed when playing a level segment [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]? The former provides a real-world example with
a Match-3 game in production. They used a progression graph where the reward was based on how
many levels it would take to reach a given node from the current node, while also keeping track of the
number of trials; a trial is an attempt to beat a level. The reward table was indexed by pairs (l, t), where
l represented the level and t the number of trials. The more trials failed, the more likely the player
was to enter the churn state, which represented the player giving up. An approach like this could be
used to create a more dynamic reward table: the MDP could be augmented with a new reward table
that also considered the number of trials. However, this wouldn’t be the correct approach because
players are not expected to replay the same level repeatedly until they win. Instead, the per-level trial
counts would have to be converted to counts over assembled levels, and it is not clear how this could
be accomplished.
      </p>
      <p>
        The reward of difficulty per level segment is subjective. What one player finds difficult, another may
find easy. Static player rankings of difficulty are, therefore, inherently flawed, but the range of player
disagreement is not clear. For example, a flat level in Mario [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] where the player only has to move to
the right is “easier” than a flat level where the player has to jump once over a gap. One way to view
this is that difficulty is the result of in-game challenges [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In this formulation, a challenge either
impedes progress through a level or ends a playthrough. One way to adjust a level’s difficulty is to
move challenges into or out of the player’s path [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This, though, takes us back
to a static ranking of difficulty. This same line of reasoning applies to enjoyment, but enjoyment is far
more subjective. The hope, therefore, is that the player rankings of difficulty and enjoyment gathered
in previous work [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] capture the more objective aspects of difficulty and enjoyment.
      </p>
      <p>
        This notion of in-game challenges can be linked to required mechanics; for example, the player must
know how to jump to get past a gap obstacle. In theory, a game can be broken down into a list of
mechanics [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. A level may have multiple potential paths to beat it, but those paths can also be broken
down into a list of required mechanics. A player who has demonstrated knowledge of all the required
mechanics can be expected to beat a level built from those mechanics. This notion is similar to the work
of Butler et al. [19], which presented a system for automatically building a game progression. The
system’s goal was to introduce new mechanics while maintaining an appropriate level of difficulty. The
goal of their work is similar to ours, only we do not break down mechanics and instead rely on the
behavioral characteristics (BC) [20] of levels while building a progression based on an MDP.
      </p>
      <p>The work from Butler et al. [19] was not focused on DDA; it was work on intelligent tutoring
systems (ITS) [21, 22, 23, 24]. The overlap between DDA and ITS has not, to our knowledge, been
directly analyzed, but the overlap is considerable. The main difference appears to be the goal: DDA
aims to keep the player in a state of challenge, while an ITS has the goal of player learning (e.g.,
learning English grammar [25], how to multiply and divide [26], how to program [27], etc.).</p>
      <p>For DDA to work in the medium to long term, players have to learn new mechanics and ways to play
the game. One alternative reward, not explored here but inspired by ITSs, is one based on the game’s
mechanics [28]. Levels with mechanics that the player had not experienced would have higher rewards
than levels with mechanics that the player had already demonstrated. Competence could be measured
as the number of times the player successfully exhibited a given mechanic. This competence score
could be used to penalize levels where the player was unlikely to fail, similar to the work of Butler
et al. [19], and a dynamic reward table could be built.</p>
      <p>The reason that this mechanics-based reward was not used was, again, accessibility. Proving that
a mechanic is required to beat a level is a difficult and computationally expensive problem [29]. One
recent solution is to use a constraint solver, which shows that a level cannot be beaten without the
given mechanic [30]. Every layer added to the system, however, is a layer of complexity that developers
must maintain. A simpler alternative would be to build an approximation of the required mechanics,
which looks at level features (e.g., gaps) and elements (e.g., a Goomba in Mario). The goal for this work,
however, is simplicity; such an approach would be an interesting area for future work.</p>
    </sec>
    <sec id="sec-3">
      <title>3. MDP-Based Level Assembly</title>
      <p>
        This paper used the open-source tool Ponos to generate MDPs for level assembly. It works by using
Gram-Elites [31], an extension of MAP-Elites [32], to generate a set of level segments with n-grams
[33, 34] organized by their BCs [20]. A BC is an aspect of a level (e.g., density). The output of Gram-Elites
is an n-dimensional grid, where n is the number of BCs, and each cell in the grid can contain level segments.
This grid is then turned into a digraph based on neighbors, and the edges are validated with a linking
algorithm [35] to guarantee completability. The digraph is turned into an MDP [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], where states are
level segments and actions are edges from the digraph. The MDP assembles levels by concatenating
level segments together that are guaranteed to be completable, as assured by the linking algorithm.
      </p>
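      <p>As a rough illustration of the grid-to-digraph step, the sketch below connects segments in neighboring cells of the BC grid. The grid mapping and function name are hypothetical, not Ponos’s actual API, and the real pipeline additionally validates each edge with the linking algorithm [35]:</p>
      <preformat>
from itertools import product

def grid_to_digraph(grid):
    """Connect level segments in neighboring BC cells (a sketch).

    grid maps an n-dimensional cell index (a tuple of ints) to a list
    of level segments stored in that cell.
    """
    edges = set()
    for cell, segments in grid.items():
        # Neighbors differ by at most 1 in every BC dimension.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            if all(o == 0 for o in offset):
                continue  # skip the cell itself
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for src in segments:
                for dst in grid.get(neighbor, []):
                    edges.add((src, dst))
    # Each edge would still need validation by the linking algorithm.
    return edges
      </preformat>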
      <p>
        The resulting MDP can be used during a gameplay session. After beating or failing a level, the MDP’s
reward table is updated based on the player’s performance, and a new policy is built for level assembly.
For this to work, there are two special states in the MDP: start and end. start is where level assembly
begins. Edges from the start node are added when the player beats levels and removed when they
lose—see work from Biemer and Cooper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for more details, but in this work, we allow the player to
lose twice in a row before edges are removed. The end state is reached when the player beats the level.
For this to work, the end state needs a positive reward, and the other states should have a negative
reward, because otherwise the MDP will never move towards the end state, since the reward horizon
would be infinitely positive.
      </p>
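      <p>A small sketch of one reading of the start-edge bookkeeping described above; the container names and function are ours, introduced only for illustration:</p>
      <preformat>
def update_start_edges(start_edges, first_segment, won, loss_streak):
    """Maintain edges out of the special start state (one reading).

    Beating a level keeps its opening segment reachable from start; two
    losses in a row remove the edge, easing the player back toward
    easier openings.
    """
    if won:
        start_edges.add(first_segment)
        loss_streak[first_segment] = 0
    else:
        loss_streak[first_segment] = loss_streak.get(first_segment, 0) + 1
        if loss_streak[first_segment] == 2:
            start_edges.discard(first_segment)
            loss_streak[first_segment] = 0
      </preformat>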
      <p>
        Of special note is the addition to the MDP of a reward table R_D, the designer reward table. This
table is how the designer influences the direction of level generation beyond the end state and the
structure of the grid, as defined by BCs. A main focus of this work is on how to best define R_D. The
reward table is initialized to R_D (∀s ∈ S : R(s) ← R_D(s)), and future reward updates are influenced
by R_D (R(s) ← R_D(s) · v(s), where v(s) is the number of times the player has visited state s;
Biemer and Cooper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] utilized a surrogate player model in their work, but a player model was not
required for the approach to work, and we forgo it in the name of simplicity).
      </p>
      <p>[Figure 1: (A) DungeonGrams; (B) Recformer]</p>
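      <p>A minimal sketch of one reading of this reward bookkeeping; the container names are ours, not Ponos’s API:</p>
      <preformat>
def initialize_rewards(states, designer_reward):
    """Initialize the reward table from the designer table: R(s) = R_D(s)."""
    return {s: designer_reward[s] for s in states}


def update_reward(rewards, designer_reward, visits, state):
    """Scale a state's reward by its visit count: R(s) = R_D(s) * v(s).

    Because non-end rewards are negative (see above), states the player
    has already seen become increasingly unattractive, nudging the
    policy toward unvisited segments.
    """
    visits[state] = visits.get(state, 0) + 1
    rewards[state] = designer_reward[state] * visits[state]
      </preformat>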
      <p>
        One place where this work differs from the approach offered by Biemer and Cooper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] was how the
transition table T(s′|s, a) was updated. They updated it in a binary fashion based on the number of
times a level was beaten divided by the total number of times it was visited. In this work, we changed
the numerator to the sum of the percent completed. Meaning, if a player played state s once and
completed 90% of that level (i.e., they lost), then the probability of success would be
(1 + 0.9)/(1 + 1) = 0.95 rather than (1 + 0)/(1 + 1) = 0.5. The initialization of success was 99% for all edges based on
exploratory playthroughs.
      </p>
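      <p>A small sketch of this estimate, assuming the pseudo-observation in the worked example above acts as the prior; the function name is ours:</p>
      <preformat>
def success_probability(completions):
    """Estimate the chance a player completes a segment.

    completions holds per-attempt completion fractions in [0, 1], with
    1.0 meaning the segment was beaten. One pseudo-observation of a full
    completion serves as the prior, so a single attempt reaching 90%
    gives (1 + 0.9) / (1 + 1) = 0.95. Edges start at 0.99 before any
    attempts, per the exploratory playthroughs.
    """
    if not completions:
        return 0.99
    return (1.0 + sum(completions)) / (1 + len(completions))
      </preformat>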
    </sec>
    <sec id="sec-4">
      <title>4. Player Study Details</title>
      <p>Two games were used in this work. The first was DungeonGrams [36] (see Figure 1A). DungeonGrams
is a roguelike where the player traverses levels to reach a portal, always at the bottom right of each
level. The portal does not open until the player has activated every switch in the level. When the player
moves, they lose one point of stamina. The player can gain back stamina by collecting food blocks.
There can also be enemies in a level. Enemies chase the player when the player is within a certain
range of the enemy’s starting point. The second game was Recformer [37] (see Figure 1B), a platformer
where the player has to collect every coin in the level to beat the level. Meaning, unless there is a coin
at the far right of the level, the player does not have to go all the way to the right to win. The game
includes several enemies that move deterministically (vertically, horizontally, or in an oval). Finally,
there are turrets that fire a bullet at the player’s location every 2.5 seconds when the player is in range,
and lasers that fire upward every 2 seconds when the player is in range.</p>
      <p>To gather data on the impact of different MDP level progressions, we ran user studies with both
games. Besides the game the participant played, each study was exactly the same. Participants played
the game, filled out a survey (demographics, mPXI, and then three custom questions: “The game was
too hard,” “The game was too easy,” and “I felt bored while playing this game”) on Qualtrics, and then
they were shown a code that they could use to get their payment. 160 players were recruited for each
game via Prolific. Study methods had approval from our university’s Institutional Review Board.</p>
      <p>A server was built to serve static content (the game in the case of this work), store logs, and assign
conditions [38] with block randomization [39] (server code available on GitHub:
https://github.com/bi3mer/go-log-study-server). The four conditions were r-mean, r-depth, static, and
hand (conditions are described below). Each condition was assigned 40 participants.</p>
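      <p>For concreteness, block randomization along the lines of [39] can be sketched as follows; the function and its parameters are illustrative, not the study server’s actual implementation:</p>
      <preformat>
import random

def blocked_assignments(conditions, n_blocks, block_multiples=(1, 2)):
    """Assign conditions via block randomization (a sketch, after [39]).

    Each block holds every condition an equal number of times in a
    shuffled order, keeping group sizes balanced as participants arrive;
    block sizes are drawn at random from multiples of the condition count.
    """
    order = []
    for _ in range(n_blocks):
        block = list(conditions) * random.choice(block_multiples)
        random.shuffle(block)
        order.extend(block)
    return order

# e.g., blocked_assignments(["r-mean", "r-depth", "static", "hand"], 20)
      </preformat>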
      <p>Each participant was paid three dollars for an estimated time of fifteen minutes of participation
(estimated pay of $12 an hour). The breakdown of time was a required ten minutes of playing the game,
and a maximum expected time of five minutes to fill out the survey. Required time playing the target
game was enforced by a timer at the top of the screen, which counted down to zero. Once the timer hit
zero, the link to the survey was presented with instructions. After participants completed the survey,
they were shown the code to complete their Prolific study. The only way to reach the survey before the
ten minutes were up was to beat the game.</p>
      <p>
        Analysis of all Likert data (−3 strongly disagree to 3 strongly agree) was performed with the
Kruskal-Wallis test [40]. This was done because ANOVA [41] assumes interval data, and Likert data is ordinal.
For any variables that had a p-value &lt; 0.05, a post-hoc analysis was run with Dunn’s test [42] with a
correction via the Holm method [
        <xref ref-type="bibr" rid="ref19">43</xref>
        ] to find any statistically significant differences between groups.
      </p>
      <p>
        Analysis of behavioral data during gameplay was initially conducted by testing whether ANOVA
[41] could be applied with Levene’s test [
        <xref ref-type="bibr" rid="ref20">44</xref>
        ] to see if the distribution was homoscedastic and the
Shapiro-Wilk test [
        <xref ref-type="bibr" rid="ref21">45</xref>
        ] to see if the data was approximately normally distributed. For every variable examined in
both studies, at least one of the two tests failed. The Kruskal-Wallis test [40] was therefore run in
place of ANOVA, and the same post-hoc methods from before were applied if the p-value was &lt; 0.05.
      </p>
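      <p>This analysis pipeline can be sketched in Python with scipy and the scikit-posthocs package (assumed here for Dunn’s test); the function is ours, and df is assumed to be a pandas DataFrame with one row per participant:</p>
      <preformat>
from scipy import stats
import scikit_posthocs as sp  # Dunn's test with Holm correction

def analyze(df, value_col, group_col="condition", alpha=0.05):
    groups = [g[value_col].values for _, g in df.groupby(group_col)]
    # Check ANOVA's assumptions first.
    print("Levene p =", stats.levene(*groups).pvalue)
    for g in groups:
        print("Shapiro-Wilk p =", stats.shapiro(g).pvalue)
    # At least one assumption failed for every variable in both studies,
    # so fall back to Kruskal-Wallis with Dunn's test as the post-hoc.
    _, p = stats.kruskal(*groups)
    print("Kruskal-Wallis p =", p)
    if p &lt; alpha:
        return sp.posthoc_dunn(df, val_col=value_col,
                               group_col=group_col, p_adjust="holm")
      </preformat>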
      <p>For DungeonGrams, levels were built with 2 level segments because using additional level segments
would not guarantee the completability of assembled levels [35]. For Recformer, levels were built
with 3 level segments, and this was based on experimental playthroughs. Levels made from 2 level
segments felt too short to play and 4 too long; note that completability was not an issue for a game
like Recformer, where there weren’t long-term dependencies like stamina in DungeonGrams [35]. Even
if 3 level segments had felt too long to play while testing, though, that was the minimum that would
have been selected due to a desire to study the player’s experience when the MDP director had more
influence over the levels assembled.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Player Study Conditions</title>
      <sec id="sec-5-1">
        <title>5.1. hand Condition</title>
        <p>A critical part of any procedural content generation system is controllability. Using Ponos affords the
designer less control over player experience when compared to an SLP. The hand condition addresses
this by using an MDP built entirely by hand—see Appendix A for details on the process of making an
MDP by hand. Figures 2 and 3 show the fully built progressions for DungeonGrams and Recformer. The
MDP built for Recformer had 138 nodes and 1161 edges. The path from the start node to the end node
was 26 level segments long. The MDP built for DungeonGrams was composed of 104 nodes and 342
edges. The path from the start node to the end node was 27 level segments long.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. r-mean Condition</title>
        <p>
          The r-mean condition used an MDP automatically generated by Ponos, with a reward that was the
average of a level’s expected difficulty and enjoyment, based on player data from a previous study by
Biemer and Cooper [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Generation with Ponos starts with Gram-Elites [31]. The configurations for
both games are in Appendix B. One change made to Gram-Elites was that the configuration no longer
required a resolution for each BC. This allowed for binary BCs (e.g., an enemy is in the level) as well as
count-based BCs (e.g., number of enemies in a level).
        </p>
        <p>After Gram-Elites comes linking [35]. This requires an additional configuration detail: the maximum
link length. The maximum link length for DungeonGrams was 2 and for Recformer was 1. Neither game
requires structure completion [35], and so that part of the linking algorithm was not used. Once linking
was complete, the graph was pruned so every node was reachable from the start node and could reach
the end node.</p>
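        <p>The pruning step can be sketched as the intersection of forward and backward reachability; the graph representation (a dict of successor lists) is an assumption:</p>
        <preformat>
def prune(graph, start, end):
    """Keep only nodes reachable from start that can also reach end."""
    def reachable(adjacency, root):
        seen, stack = {root}, [root]
        while stack:
            for nxt in adjacency.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    keep = reachable(graph, start).intersection(reachable(reverse, end))
    return {n: [d for d in dsts if d in keep]
            for n, dsts in graph.items() if n in keep}
        </preformat>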
        <p>
          With the graph complete, the next step for Ponos was to build the designer reward table R_D. For
DungeonGrams, the reward table was built using the data collected from a previous study [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ],
which analyzed different heuristics to approximate difficulty and enjoyment of level segments, using a
linear regression model to predict player rankings of difficulty and enjoyment. We used the output of
both models and calculated the mean for the reward. We then transformed it to be between 0 and 1 and
subtracted one so that the reward was negative.
        </p>
        <p>
          However, we did not use the exact same model that Biemer and Cooper found to be best performing
for enjoyment. The best model for calculating enjoyment used path-nothing (see [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] for a description),
but it could not be used for Recformer, and thus was not used for either game, to better align the two
studies. Further, the correlation coefficient for path-nothing was −0.017746 and had little impact on
the output. The difficulty model for DungeonGrams used the following computational metrics:
jaccard-nothing, proximity-to-enemies, stamina-percent-enemies, and density (see [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] for descriptions of
these) as input. Some of these, like jaccard-nothing, which utilized Jaccard similarity [
          <xref ref-type="bibr" rid="ref22">46</xref>
          ] on the paths
between two different versions of a level, were grid-based, but were not removed due to being more
impactful on calculating difficulty.
        </p>
        <p>The models built for DungeonGrams were not used for Recformer because the models were built for a
roguelike, and Recformer is a platformer. Enjoyment for Recformer was the output score of
proximity-to-enemies—this heuristic looks at the solution path and calculates how close enemies are to the player as
they navigate the level—using every frame in the simulation. path-nothing was not used for Recformer
because removing coins would make the level solved. The difficulty score was the mean of
proximity-to-enemies and a modified version of density. The modification came about because a highly dense level
can be just as challenging as a highly sparse level in a platformer. U-curve density was calculated as:
u_density(l) = (density(l) − 0.5 · max_density)² / (0.5 · max_density)². (U-curve density was not used
as a BC for Gram-Elites because it would allow highly dense and highly sparse levels to occupy the
same Gram-Elites cell.) The mean of the two scores was then calculated, and used to fill in R_D(s).</p>
        <p>After the reward was set for every node, it was updated based on the node with the largest reward:
R_D(s) ← R_D(s)/max_R − 1, where max_R is the largest designer reward over all nodes. This puts all
the rewards in the range of [−1, 0].</p>
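        <p>A sketch of the two computations above, u-curve density and the final normalization into [−1, 0], under our reading of the formulas; the function names are ours:</p>
        <preformat>
def u_curve_density(density, max_density):
    """U-shaped difficulty: highest for empty or packed levels,
    lowest at half the maximum density."""
    half = 0.5 * max_density
    return ((density - half) ** 2) / (half ** 2)


def normalize_rewards(designer_reward):
    """Divide by the largest reward and subtract one, putting every
    designer reward in the range [-1, 0]."""
    max_r = max(designer_reward.values())
    return {s: r / max_r - 1.0 for s, r in designer_reward.items()}
        </preformat>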
        <p>The final MDP for DungeonGrams had 598 nodes and 2,953 edges. The minimum path from the start
node to the end node was 26 nodes long. The final MDP for Recformer had 3,362 nodes and 11,575
edges. The minimum path from the start node to the end node was 23 nodes long.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. r-depth Condition</title>
        <p>The r-depth condition used the MDP built by Ponos for the r-mean condition. The reward, though,
was changed to be based on the node’s depth, or distance from the start node. The update used
the max-depth node, which was the end node. Every designer reward was set with:
R_D(s) ← depth(s)/max_depth − 1. Also, note that R_D was static, so when edges were added or removed, R_D
wasn’t updated.</p>
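        <p>A sketch of the depth-based reward, computing depth by breadth-first search from the start node; the graph representation is an assumption:</p>
        <preformat>
from collections import deque

def depth_rewards(graph, start):
    """R_D(s) = depth(s) / max_depth - 1, with depth found by BFS.

    The start node receives -1 and the deepest node (the end) receives 0,
    so the policy is pulled toward the end of the progression.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in depth:
                depth[succ] = depth[node] + 1
                queue.append(succ)
    max_depth = max(depth.values())
    return {node: d / max_depth - 1.0 for node, d in depth.items()}
        </preformat>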
      </sec>
      <sec id="sec-5-4">
        <title>5.4. static Condition</title>
        <p>The static condition did not use an MDP. Instead, it used an SLP. There were three options for how the
SLP could be built: (1) build it from scratch, similar to the hand condition, (2) use the hand condition
digraph, and pick a single path through it, or (3) use the MDP built for r-mean and r-depth, and pick a
single path through it.</p>
        <p>The first option would be ideal if finding professional game designers and having them build an SLP
from scratch were a viable option, but it was not for a multitude of reasons. The alternative would be
for one of us, the authors, to create the SLP, but that opens up the results to the criticism that if the SLP
condition performed poorly, it was because the designer did a poor job building the SLP—this criticism
also applies to the hand condition. This same criticism would apply if the second option of building the
SLP from the hand condition digraph was used. That leaves us with the third option, where the SLP
was built from the output of Ponos. There are still valid criticisms, but fewer than with the two alternatives.</p>
        <p>Following the third option, one SLP for DungeonGrams and one for Recformer was built by running a
breadth-first search to find the shortest path from the start node to the end node, and the output was an
array of level segments. The SLP stored an index which marked where the player was in the SLP. If
the player beat the level, that index was incremented by the number of level segments played. If the
player lost, then the index would not be changed, and the player would play the same level again.</p>
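        <p>A minimal sketch of the SLP’s index logic (class and method names are ours); the segment path itself comes from a breadth-first search like the one sketched in Section 5.3:</p>
        <preformat>
class StaticProgression:
    """Fixed segment path; an index marks the player's place."""

    def __init__(self, segments, segments_per_level):
        self.segments = segments          # BFS shortest path, start to end
        self.per_level = segments_per_level
        self.index = 0

    def next_level(self, won_last):
        """Advance only on a win; on a loss the same level is replayed."""
        if won_last:
            self.index += self.per_level
        # An empty slice would mean the player has beaten the game.
        return self.segments[self.index:self.index + self.per_level]
        </preformat>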
        <p>The path length for Recformer was made up of 23 level segments. Meaning, the player had to beat 8
levels to win. For DungeonGrams, the path length was made up of 26 level segments, which resulted in
the player having to beat 13 levels to win.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Game Study: DungeonGrams</title>
      <p>For DungeonGrams, the median time per participant was 13 minutes and 24 seconds, including time
playing the game and filling out the survey. This yielded a median pay of $13.44 an hour. Two
participants played the game but did not complete the survey, and five participants completed the
survey without playing a level. All seven were dropped from the dataset. This left 39 participants for
the hand condition, 39 participants for the r-depth condition, 39 participants for the r-mean condition,
and 37 participants for the static condition. The median age range for hand was 25-34, and the median
age range of the other three conditions was 35-44. No participants beat the game for hand, r-depth, and
r-mean. However, 26 of the participants assigned to the static condition beat the game.</p>
      <p>
        Table 1 shows the results from the mPXI [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and the three custom survey items, with a compact letter
display denoting distinct groupings found with post-hoc analysis. Only two items had statistically significant
results, and neither was on the mPXI; we could not find a difference in player experiences across all
conditions. The two that had statistically significant differences dealt with (1) the game being too easy to
play, and (2) the game being too hard to play. The results for the first (“Too Easy”) had a p-value of less
than 0.05, but the post-hoc tests showed no major difference between the conditions.
      </p>
      <p>Participants mostly disagreed or slightly disagreed with the statement, “The game was too hard to
play.” Three groups were found in post-hoc analysis. The first contained hand and r-depth, the second
r-depth and r-mean, and the third r-depth and static. The comparisons show that players disagreed less
that hand was too hard as compared to r-depth and static, and disagreed more that static was too hard
as compared to hand and r-mean. One interpretation of this is that hand was harder than static, with
the other two falling somewhere in the middle.</p>
      <p>[Table 1: Kruskal-Wallis p-values for the survey items: Audiovisual Appeal (p = 0.821), Challenge
(p = 0.869), Ease of Control (p = 0.243), Clarity of Goals (p = 0.804), Progress Feedback (p = 0.860),
Autonomy (p = 0.759), Curiosity (p = 0.620), Immersion (p = 0.365), Mastery (p = 0.140), Meaning
(p = 0.818), Too Easy (p = 0.046), Too Hard (p &lt; 0.001), Bored (p = 0.283)]</p>
      <p>Table 2 shows the behavioral results from playing the game. For “Time per Player,” you may have
expected to see 600 seconds (10 minutes) for every condition. The ten-minute time limit began when
the webpage was opened; however, the metric does not count time spent on the main menu, the tutorial,
or the two-second transition scene (win/loss screen). In other words, “Time per Player” shows how long
the player played DungeonGrams with their assigned condition. static had about 100 seconds less time
played than the other three conditions. This was because static was the only condition where players
beat the game, and those players did not play the full ten minutes. Responses for “Challenge” and “Too
Easy” did not reflect so many static players winning. “Too Hard,” though, did reflect this result, but only
minimally, and recall that static was grouped with r-depth.</p>
      <p>“Time per Level” was different for each condition. Players, on average, spent the most time per
level in the static condition and the least in the hand condition. The result was unexpected because it
was thought that players would spend less time on average playing levels in the static condition, as
they would have to spend less time analyzing a level on a replay after losing. However, instead, the
effect appears to have been that players in the dynamic conditions kept playing while players in the
static condition took more time for each level. Alternatively, more time was spent per level because
players in the static condition lost less often.</p>
      <p>“Levels Played,” “Levels Lost,” “Lost by Enemy,” and “Lost by Stamina” all displayed a statistically
significant difference, with the dynamic conditions in one group and static in another. These
differences can be attributed to the number of levels played, with fewer levels played by the static
condition players who beat the game.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Game Study: Recformer</title>
      <p>The median time per participant was 14 minutes and 40 seconds, including time spent playing the
game and filling out the survey, yielding a median pay of $12.27 an hour. The r-depth condition was
re-run separately from the initial study due to an error in the condition’s reward assignment, and no
participants from the initial study were included in the re-run. Eleven participants played the game
but did not complete the survey, and one participant completed the survey without playing a level. All
twelve were dropped from the dataset. This left 36 participants for the hand condition, 38 participants
for the r-depth condition, 34 participants for the r-mean condition, and 38 participants for the static
condition. The median age range for static was 25-34, and the median age range of the other three
conditions was 35-44. Five of the participants beat the game for the static condition, and one participant
beat the game for the hand condition; no player beat the game for the remaining conditions.</p>
      <p>Table 3 shows the results for the Likert questions for Recformer. For the majority, the null hypothesis
could not be rejected. Only “Ease of Control” and “Too Easy” had p-values that were less than 0.05.</p>
      <p>“Ease of Control” was represented by the statement: “It was easy to know how to perform actions
in the game.” It had two groups. In the first was hand and r-mean. In the second was r-depth, r-mean,
and static. The first group had slightly higher scores for ease of control, closer to “Agree.” There was a
statistically significant difference between static and hand, as well as between r-depth and hand.</p>
      <p>There were two groups for “Too Easy.” The first group was composed of hand, r-depth, and r-mean.
The second was composed of r-depth, r-mean, and static. The difference, then, was between hand
and static, with participants assigned the hand condition between neutral and slightly agreeing that
Recformer was too easy, and participants assigned the static condition slightly disagreeing.</p>
      <p>Table 4 shows the behavioral results for Recformer. Unlike in DungeonGrams where there was a
two-second transition between levels, Recformer had a win/loss screen where the player had to press
the space button to start the next level. As a result, the time of less than 600 seconds (the ten-minute
time limit) indicates how quickly players were pressing the spacebar to play the next level and the
amount of time they spent on the main menu before playing.</p>
      <p>The first row in Table 4 where the null hypothesis could be rejected was for “Time per Level.” There
were two groupings. The first was for all the dynamic conditions, and the other was for the static
condition. The dynamic conditions had a higher time per level, at around 17 seconds played per level;
recall that this result is the opposite of what we found for DungeonGrams where players spent more
time on levels when playing the static condition.</p>
      <p>[Table 4: Kruskal-Wallis p-values for behavioral measures: Time per Player (p = 0.371)*, Time per
Level (p &lt; 0.001)*, Levels Played (p = 0.696), Levels Won (p &lt; 0.001), Levels Lost (p &lt; 0.001),
Death by Fall (p &lt; 0.001), Death by Enemy (p = 0.208)*]</p>
      <p>The next row with a statistically significant difference was “Levels Won,” which had three groups.
The first was made of r-depth and r-mean. The second contained only hand. The third and final group
only had the static condition. hand had the most levels won on average, r-depth and r-mean were in
the middle, and the static condition had an average of only 3.692 levels beaten per participant. The
majority of participants in the static condition got stuck and could not progress. (The third and fourth
levels in the static condition feature simple jumps from high platforms to low platforms, with one
complicated jump where the player has to avoid a vertical enemy. Neither level should be challenging
for an experienced player, but the static condition does not adapt to player failure, so the challenge
came too early in the game.) And recall here that “Challenge” from the mPXI showed no statistically
significant differences among the conditions, and that participants in the static condition only slightly
disagreed with the statement, “The game was too easy to play.”</p>
      <p>For “Levels Lost,” there were three groups. The first group contained r-depth and r-mean. The second
had hand and r-mean. The final group had the static condition. static had the most levels lost by a
wide margin. r-depth and r-mean had fewer levels lost on average, and hand had the fewest. Recall that
“Levels Played” had no statistically significant differences, so the difference can be better attributed to
the conditions. Interestingly, although static participants lost more than the other conditions, there was
no negative effect on the player’s experience.</p>
      <p>Finally, the last category with a p-value less than 0.05 was “Death by Fall.” static featured the most
deaths by falling (see the description of the third and fourth levels above). r-depth and r-mean were in
the second group, with fewer falls than static. The third group contained r-mean and hand, with the
fewest deaths by falling.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Discussion</title>
      <p>For RQ1, there were minor differences between r-depth and r-mean, but there was only one statistically
significant difference between the two across both user studies, and that was for “Time per Level” in
DungeonGrams. The answer, then, is that something simple like depth is enough to result in a positive
player experience, given a good structure for the digraph built by Ponos. However, that is not to say that
more work could not go into defining different reward functions that could have a larger effect on the
player experience.</p>
      <p>For RQ2, both games achieved fairly high scores across all categories of the mPXI. This could be
related to recruiting participants from a crowdsourcing platform, where participants were happy to
play any game instead of performing other tasks, such as labeling images. This could help explain why,
despite the apparent difficulty of the static condition in Recformer (where most participants only beat
3-4 levels), the median participant’s responses indicated that Recformer was neither too easy nor too
hard.</p>
      <p>For DungeonGrams, there were no statistically significant differences in the results for the mPXI. For
Recformer, only “Ease of Control” was different among the conditions, and the difference was minor,
with both groups found to be near the “Agree” response. However, there is more to a player’s experience
than what they report.</p>
      <p>One such factor was whether the player beat the game. For DungeonGrams, 26 players beat the game,
and they were all in the static condition. Based on this, the SLP for DungeonGrams was too easy. For
Recformer, five players managed to beat the static condition, but the majority could not make it past
the third and fourth levels. These results taken together show that the experience of playing the static
condition was highly variable. If something was wrong, and it was for both games, nothing could
be done while the player was playing to alleviate the problem.</p>
      <p>Of course, a weakness in the argument is that the SLP was generated automatically. Perhaps if the
static condition had been built by a designer for both games, the results would have been better balanced.
Or, maybe the SLP should have been generated from the hand condition instead. Nevertheless, recall
that r-depth and r-mean both used the same digraph as the one used to generate the SLP for static.
The levels that were too easy and too hard were also possible to serve to the player, and the objective
experience of too many wins or losses could have occurred, but that didn’t happen. For DungeonGrams,
players won almost as many levels as the static condition, but also lost just as many because they were
challenged earlier and more often. For Recformer, the participants clearly struggled with playing a
platformer, but were still able to get almost double the wins and make progress.</p>
      <p>With the static condition no longer in consideration, this leaves us with the three MDP-based
conditions. There is a difference for DungeonGrams on the survey item “Too Hard” between hand and
r-depth, with r-depth disagreeing that the game was too hard and hand slightly disagreeing. Otherwise,
the automatically generated conditions resulted in comparable player experiences to the handcrafted
progressions.</p>
      <p>
        Because the DungeonGrams study did not yield any major differences, this leaves us with the Recformer
study. From the results, it is clear that the digraph built by Ponos was flawed. One way to see that there
was a problem is via the win rates. The win rate for hand was 49.65%, whereas the win rate for r-depth
was 28.66% and for r-mean was 33.60%. While it is not clear what the ideal win rate is, the win rates
for both r-mean and r-depth are low when compared to the DungeonGrams study, where the win rate
for hand was 41.41%, for r-depth was 42.82%, and for r-mean was 41.49%. However, the ideal win rate
is something only the designer can answer. For example, the win rate for Recformer would
be high for a game like Super Meat Boy [
        <xref ref-type="bibr" rid="ref23">47</xref>
        ].
      </p>
      <p>We cannot definitively answer RQ2, since the player experiences, as defined by the mPXI, were
statistically similar across all eleven categories. However, we can say that the MDP-based approach
outperformed SLPs.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Conclusion</title>
      <p>The player experience was similar across all four conditions for both player studies. This result was a
surprising one. For example, one would expect that the static condition for DungeonGrams, which had
a win rate of 75.60%, would have a lower Challenge score than conditions with a lower win rate—the
other three had win rates around 40%—but this was not the case. Instead, the scores appeared to be
more based on the experience of playing the game rather than in terms of success and failure.</p>
      <p>Despite this, we were able to partly answer both research questions. For RQ1, we found that a simple
reward based on depth resulted in a similar experience to a more complex reward that used dificulty
and enjoyment. For RQ2, the static condition was the worst-performing: players beat most levels and
almost never lost in DungeonGrams, but made almost no progress while playing Recformer. With static
removed, that left the three MDP-based conditions, but no further conclusions could be made regarding
which was best. As part of future work, we plan to further investigate RQ2 using new games to better
understand the impact of MDP-based level assembly on player experience.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Grammarly for grammar and spelling
checking and for paraphrasing and rewording. After use, the authors reviewed and edited the content
as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-10-refs">
      <title>References (continued)</title>
      <p>in: Proceedings of the 13th International Conference on the Foundations of Digital Games, 2018, pp. 1–8.
[19] E. Butler, E. Andersen, A. M. Smith, S. Gulwani, Z. Popović, Automatic game progression design through analysis of solution features, in: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, pp. 2407–2416.
[20] G. Smith, J. Whitehead, Analyzing the expressive range of a level generator, in: Proceedings of the 2010 Workshop on Procedural Content Generation in Games, 2010, pp. 1–7.
[21] A. T. Corbett, K. R. Koedinger, J. R. Anderson, Intelligent tutoring systems, in: Handbook of Human-Computer Interaction, Elsevier, 1997, pp. 849–874.
[22] A. C. Graesser, M. W. Conley, A. Olney, Intelligent tutoring systems (2012).
[23] D. S. McNamara, G. T. Jackson, A. Graesser, Intelligent tutoring and games (ITaG), in: Gaming for Classroom-Based Learning: Digital Role Playing as a Motivator of Study, IGI Global, 2010, pp. 44–65.
[24] H. S. Nwana, Intelligent tutoring systems: an overview, Artificial Intelligence Review 4 (1990) 251–277.
[25] M. I. Alhabbash, A. O. Mahdi, S. S. A. Naser, An intelligent tutoring system for teaching grammar English tenses, European Academic Research 4 (2016) 1–15.
[26] S.-C. Shih, C.-C. Chang, B.-C. Kuo, Y.-H. Huang, Mathematics intelligent tutoring system for learning multiplication and division of fractions based on diagnostic teaching, Education and Information Technologies 28 (2023) 9189–9210.
[27] C. J. Butz, S. Hua, R. B. Maguire, A web-based intelligent tutoring system for computer programming, in: IEEE/WIC/ACM International Conference on Web Intelligence (WI’04), IEEE, 2004, pp. 159–165.
[28] M. C. Green, L. Mugrai, A. Khalifa, J. Togelius, Mario level generation from mechanics using scene stitching, in: 2020 IEEE Conference on Games (CoG), IEEE, 2020, pp. 49–56.
[29] A. M. Smith, E. Butler, Z. Popović, Quantifying over play: Constraining undesirable solutions in puzzle design, in: FDG, 2013, pp. 221–228.
[30] S. Cooper, M. Bazzaz, Literally unplayable: On constraint-based generation of uncompletable levels, in: Proceedings of the 19th International Conference on the Foundations of Digital Games, 2024, pp. 1–8.
[31] C. Biemer, A. Hervella, S. Cooper, Gram-Elites: N-gram based quality-diversity search, in: Proceedings of the 16th International Conference on the Foundations of Digital Games, FDG ’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 1–6. doi:10.1145/3472538.3472599.
[32] J.-B. Mouret, J. Clune, Illuminating search spaces by mapping elites, arXiv preprint arXiv:1504.04909 (2015).
[33] S. Dahlskog, J. Togelius, M. J. Nelson, Linear levels through n-grams, in: Proceedings of the 18th International Academic MindTrek Conference: Media Business, Management, Content &amp; Services, 2014, pp. 200–206.
[34] D. Jurafsky, Speech &amp; Language Processing, Pearson Education India, 2000.
[35] C. F. Biemer, S. Cooper, On linking level segments, in: 2022 IEEE Conference on Games (CoG), 2022, pp. 199–205.
[36] S. Cooper, C. F. Biemer, DungeonGrams, 2025. URL: https://github.com/crowdgames/dungeongrams.
[37] C. F. Biemer, Recformer, 2025. URL: https://github.com/bi3mer/recformer.
[38] M. Kang, B. G. Ragan, J.-H. Park, Issues in outcomes research: an overview of randomization techniques for clinical trials, Journal of Athletic Training 43 (2008) 215–221.
[39] J. Efird, Blocked randomization with randomly selected block sizes, International Journal of Environmental Research and Public Health 8 (2011) 15–20.
[40] P. E. McKight, J. Najab, Kruskal-Wallis test, The Corsini Encyclopedia of Psychology (2010) 1–1.
[41] L. Ståhle, S. Wold, Analysis of variance (ANOVA), Chemometrics and Intelligent Laboratory Systems 6 (1989) 259–272.
[42] A. Dinno, Nonparametric pairwise multiple comparisons in independent groups using Dunn’s test,</p>
    </sec>
    <sec id="sec-11">
      <title>A. Making a Handcrafted MDP</title>
      <p>
        The process to build an MDP by hand began by building level segments. Levels are typically built with a
level editor [
        <xref ref-type="bibr" rid="ref24 ref25">48, 49</xref>
        ]. For level segments in this work, we expected a grid-based layout in two dimensions.
This means that all that is required is a simple text file with rows of characters, where each character
represents a different entity in the game, similar to the Video Game Level Corpus [
        <xref ref-type="bibr" rid="ref26">50</xref>
        ]. A text editor
(e.g., vi, vim, or nvim) was more than enough.
      </p>
      <p>The next step was to connect the level segments into a digraph, where each level segment was a
node. Editing a digraph by hand with a file format like JSON was possible, but it became error-prone as
the size of the graph increased. To handle this problem, a graph editor was built called GDM-Editor
(https://github.com/bi3mer/GDM-Editor). GDM stands for graph-based decision making, a tool built
for making graph-based MDPs (https://github.com/bi3mer/GDM). See Figures 2 and 3 for examples of
the graph editor built for this work. The editor includes basic functionalities like adding and removing
edges, updating rewards, and reading from a directory for when the designer makes new level segments.</p>
      <p>One problem found while using the editor was that it became cumbersome to use if the designer
wanted to create multiple level segments of a similar type to connect to another group of level segments
of a similar type. For example, if the goal was to have five level segments connect to five other level
segments, you would have to make 25 edges. By allowing a node to hold multiple level segments, this
problem was solved. However, this comes with the sacrifice that each level segment in the node is
assigned the same reward. In terms of the final representation, a multi-segment node decomposes into n
individual nodes with the same reward. Further, the designer has to do extra work to make sure that all
25 possible edges are valid in terms of completability.</p>
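      <p>A sketch of how a multi-segment node could decompose into individual nodes that share a reward; the data layout is an assumption:</p>
      <preformat>
def expand_multi_segment_nodes(nodes, edges, rewards):
    """Decompose multi-segment nodes into individual nodes (a sketch).

    nodes maps a node id to its list of segments; every segment inherits
    the node's reward, and every node-to-node edge fans out to all
    segment pairs, so five segments linked to five yields 25 edges.
    """
    new_edges = set()
    for src, dst in edges:
        for s in nodes[src]:
            for d in nodes[dst]:
                new_edges.add((s, d))
    new_rewards = {s: rewards[node]
                   for node, segments in nodes.items() for s in segments}
    return new_edges, new_rewards
      </preformat>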
      <p>The process of making level segment progressions by hand for each game was iterative, and the
primary goal was to introduce a concept and then ramp up the challenge. This was repeated until
every concept had been introduced. To make sure that the MDP did not result in a linear progression,
like the SLPs, concepts (e.g., a horizontal enemy in Recformer and a movement pattern in DungeonGrams)
were introduced in pairs or trios, where the player could learn about one and then learn about another
alongside the already learned concept in the next level segment. This, though, led to some repetitive
patterns and meant that players who reached the end had whole levels made up of challenging level
segments, resulting in less than ideal pacing. Therefore, some “easy” level segments were placed in
between particularly challenging level segments; these served as a break, but also as a jumping-off
point for players because they were likely to beat the easy level segments.</p>
    </sec>
    <sec id="sec-12">
      <title>B. Gram-Elites Configurations</title>
      <p>• DungeonGrams</p>
      <p>• Recformer</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Czikszentmihalyi</surname>
          </string-name>
          ,
          <article-title>Flow: The psychology of optimal experience</article-title>
          , New York: Harper &amp; Row,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hunicke</surname>
          </string-name>
          ,
          <article-title>The case for dynamic difficulty adjustment in games</article-title>
          ,
          <source>in: Proceedings of the 2005 ACM SIGCHI International Conference on Advances in computer entertainment technology</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>429</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Thue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bulitko</surname>
          </string-name>
          ,
          <article-title>Procedural game adaptation: Framing experience management as changing an MDP</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment</source>
          <volume>8</volume>
          (
          <year>2012</year>
          )
          <fpage>44</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kolen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aghdaie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Zaman</surname>
          </string-name>
          ,
          <article-title>Dynamic difficulty adjustment for maximized engagement in digital games</article-title>
          ,
          <source>in: Proceedings of the 26th International Conference on World Wide Web Companion</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>465</fpage>
          -
          <lpage>471</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>González-Duque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Palm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Risi</surname>
          </string-name>
          ,
          <article-title>Finding game levels with the right difficulty in a few trials through intelligent trial-and-error</article-title>
          ,
          <source>in: 2020 IEEE Conference on Games (CoG)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>503</fpage>
          -
          <lpage>510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gonzalez-Duque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Palm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Risi</surname>
          </string-name>
          ,
          <article-title>Fast game content adaptation through bayesian-based player modelling</article-title>
          ,
          <source>in: 2021 IEEE Conference on Games (CoG)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>01</fpage>
          -
          <lpage>08</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jennings-Teats</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wardrip-Fruin</surname>
          </string-name>
          ,
          <article-title>Polymorph: dynamic difficulty adjustment through level generation</article-title>
          ,
          <source>in: Proceedings of the 2010 Workshop on Procedural Content Generation in Games</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mugrai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Mario level generation from mechanics using scene stitching</article-title>
          ,
          <source>arXiv preprint arXiv:2002.02992</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Biemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <article-title>Level assembly as a Markov decision process</article-title>
          ,
          <source>in: Proceedings of the Experimental AI in Games Workshop</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Norvig</surname>
          </string-name>
          ,
          <source>Artificial intelligence: a modern approach</source>
          , 3rd ed.,
          <publisher-name>Pearson</publisher-name>
          , Upper Saddle River,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Haider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Harteveld</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Birk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Mandryk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seif El-Nasr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Nacke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gerling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vanden Abeele</surname>
          </string-name>
          ,
          <article-title>miniPXI: Development and validation of an eleven-item measure of the player experience inventory</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>6</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          ,
          <article-title>Experience-driven PCG via reinforcement learning: A Super Mario Bros study</article-title>
          ,
          <source>in: 2021 IEEE Conference on Games (CoG)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Spronck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Loiacono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>André</surname>
          </string-name>
          ,
          <article-title>Player modeling</article-title>
          ,
          <source>in: Artificial and Computational Intelligence in Games</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Nintendo</surname>
          </string-name>
          , Super Mario Bros.,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.-V.</given-names>
            <surname>Aponte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Levieux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Natkin</surname>
          </string-name>
          ,
          <article-title>Measuring the level of difficulty in single player video games</article-title>
          ,
          <source>Entertainment Computing</source>
          <volume>2</volume>
          (
          <year>2011</year>
          )
          <fpage>205</fpage>
          -
          <lpage>213</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Berseth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Haworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kapadia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          ,
          <article-title>Characterizing and optimizing game level difficulty</article-title>
          ,
          <source>in: Proceedings of the 7th International Conference on Motion in Games</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <article-title>Solution path heuristics for predicting difficulty and enjoyment ratings of roguelike level segments</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on the Foundations of Digital Games</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khalifa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Barros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nealen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          ,
          <article-title>Generating levels that teach mechanics</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on the Foundations of Digital Games</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Holm</surname>
          </string-name>
          ,
          <article-title>A simple sequentially rejective multiple test procedure</article-title>
          ,
          <source>Scandinavian journal of statistics</source>
          (
          <year>1979</year>
          )
          <fpage>65</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Gastwirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. R.</given-names>
            <surname>Gel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <article-title>The impact of Levene's test of equality of variances on statistical theory and practice</article-title>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Wilk</surname>
          </string-name>
          ,
          <article-title>An analysis of variance test for normality (complete samples)</article-title>
          ,
          <source>Biometrika</source>
          <volume>52</volume>
          (
          <year>1965</year>
          )
          <fpage>591</fpage>
          -
          <lpage>611</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jaccard</surname>
          </string-name>
          ,
          <article-title>Nouvelles recherches sur la distribution florale</article-title>
          ,
          <source>Bull. Soc. Vaud. Sci. Nat.</source>
          <volume>44</volume>
          (
          <year>1908</year>
          )
          <fpage>223</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Team Meat</surname>
          </string-name>
          , Super Meat Boy,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B.</given-names>
            <surname>Cowan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kapralos</surname>
          </string-name>
          ,
          <article-title>A simplified level editor</article-title>
          ,
          <source>in: 2011 IEEE International Games Innovation Conference (IGIC)</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Guzdial</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Friend, collaborator, student, manager: How design of an AI-driven game level editor affects creators</article-title>
          ,
          <source>in: Proceedings of the 2019 CHI conference on human factors in computing systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Summerville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Snodgrass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ontañón</surname>
          </string-name>
          ,
          <article-title>The VGLC: The video game level corpus</article-title>
          ,
          <source>arXiv preprint arXiv:1606.07487</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>