Scoring Integrated Meaningful Play in Story-Driven Games with Q-Learning

Alan Bettis¹, Danielle Newberry², and Hector Munoz-Avila¹
¹Lehigh University, ²Middlebury College

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Integrated meaningful play is the idea that players' choices should have a long-term effect on the game. In this paper we present the I-score (for "integrated"), a scoring function for scoring integrated game play as a function of the game's storylines. I-scores are in the range [0, 1]. In games with I-scores close to one, the players' early choices determine the game's ending; choices made later in the game do not change the final ending. In contrast, in games with I-scores close to zero, players' choices can change the ending until the very end. Games with scores closer to 0.5 provide a more balanced player choice, whereby the game's ending can still be changed despite early decisions, but not so much that the ending could be changed at any point.

Introduction

Some games are famous for their storyline and how the players' choices play out in the final ending of the game. For instance, Bioware's Knights of the Old Republic is well-known for a variety of aspects, including how the player's choices throughout the game will result in a variety of endings (and cinematics).

Having multiple endings is not a requirement for a game to be considered to have a great storyline. Many first-person shooters such as Valve's Half-Life are linear, yet their storylines are considered memorable. Nevertheless, the long-term effect that a player's choice has on the ending of a game is one of the most intriguing aspects of game design. Researchers have introduced evaluative definitions assessing player choices. Salen, Tekinbaş, and Zimmerman (2004) define the integrated component of meaningful play to indicate whether the players' choices have long-term repercussions on the game. The idea is that by enabling long-term repercussions of the players' choices, the game empowers the players and makes their choices have more meaning.

In this paper, we propose the I-score (for "integrated") metric. It scores how much the players' choices impact the game's ending. In games with an I-score close to 0, the players can change the ending at any point in the game, including at the latter stages. On the other extreme, if the game has multiple endings but "locks" the player once the initial choice is made, then its I-score is 1. If a game has an I-score close to 0.5, it indicates that the game is balanced between these two extremes, providing more choice to the player.

A game's storylines can be represented as a directed graph, where each node is a state in the storyline (e.g., the beginning of the game, or each of the possible endings) and edges represent transitions between states as a result of the players' choices (Riedl and Young 2006). The basis for computing the I-score is the use of Reinforcement Learning (RL) techniques with the aim of finding the "true" value of every state (in RL, a metric of the suitability of a state for reaching a goal, based on the reward function). We then find the distances between those true state values to compute the overall I-score of the graph.

The premise of our work is that integrated meaningful play assesses the relationship between player choices and the game ending. As the value of being in a state is intrinsically linked to player choice, reinforcement learning finds the value of each state.

We present an empirical evaluation on both synthetic graphs, representing different topologies of player choices, and actual games, and discuss their resulting I-scores.
Preliminaries

Our work links two different concepts: Q-learning and meaningful play. We introduce both concepts here.

In reinforcement learning, the agent aims to maximize its future rewards while interacting with the environment from a starting time t: r_t + r_{t+1} + ... + r_T (Sutton and Barto 2018). For our purposes we focus on episodic tasks, where the agent terminates its execution after reaching a concluding state.

Tabular reinforcement learning algorithms maintain a two-dimensional table Q : S × A → R, where S represents the states that the agent can visit, A represents the actions that the agent can take, and R represents the real numbers. The entry Q(s, a) indicates the estimated reward the agent gains from taking action a in state s. The estimate Q(s, a) is updated after the agent takes an action and receives a reward using the following formula:

Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_{a'} Q(s', a')]

where s' is the state reached after executing action a in state s. α is the step-size parameter determining how much weight to give the previous estimate Q(s, a) and how much weight to give the update (i.e., the [r + γ max_{a'} Q(s', a')] part of the equation). γ is the discount factor indicating how much weight to give to the estimated future rewards max_{a'} Q(s', a').

Reinforcement learning algorithms can use a variety of exploration techniques such as ε-greedy: with probability 1 − ε the agent chooses the greedy action π(s), and with probability ε it picks a random action. Based on the estimates Q(s, a), the so-called greedy policy can be defined:

π(s) = argmax_{a'} Q(s, a')

To guarantee that future reward is maximized, the policy enforces that for every state s the action a' with the highest current estimate Q(s, a') is chosen.
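To make the update rule above concrete, the following is a minimal sketch of tabular Q-learning with an ε-greedy policy over a story graph given as an adjacency matrix. It is an illustrative reconstruction rather than the implementation released with this paper; the names (q_learning, adjacency, reward_goal) and the zero-initialization of the table are our own assumptions.

```python
import random

def q_learning(adjacency, start, goal, episodes=100_000,
               alpha=0.9, gamma=0.75, epsilon=0.2, reward_goal=100.0):
    """Tabular Q-learning over a story graph.

    adjacency[s][t] is nonzero if a player choice leads from story state s to
    state t (so an "action" is simply the successor chosen). Reaching the
    desired ending `goal` yields reward_goal; every other transition yields 0.
    Returns a Q-table q where q[s][t] estimates the value of choosing t in s.
    """
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]

    def successors(s):
        return [t for t in range(n) if adjacency[s][t]]

    for _ in range(episodes):
        s = start
        while successors(s):                          # episodes end at terminal nodes
            choices = successors(s)
            if random.random() < epsilon:             # explore
                t = random.choice(choices)
            else:                                     # exploit: greedy action
                t = max(choices, key=lambda c: q[s][c])
            r = reward_goal if t == goal else 0.0
            future = max((q[t][u] for u in successors(t)), default=0.0)
            # Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma * max_a' Q(s',a')]
            q[s][t] = (1 - alpha) * q[s][t] + alpha * (r + gamma * future)
            s = t
    return q
```

In a DAG of story states, after enough episodes the greedy policy assigns the highest values to the choices that lead toward the rewarded ending.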
In Rules of Play: Game Design Fundamentals, Salen et al. propose a game design framework for not only computer games, but all games, from board games to sporting games. Meaningful play is highlighted as a core concept of game design and as integral to a game's success. Salen et al. define meaningful play as a relationship within a game between action and outcome. There are two types of meaningful play: integrated and discernable. Integrated meaningful play means that a player's action has some long-term consequences, i.e., the decision or play is woven or integrated into the game as a whole. Discernable meaningful play refers to when a player's action has immediate or discernable consequences. This work posits that successful games include game actions and outcomes that are both discernable and integrated (Salen, Tekinbaş, and Zimmerman 2004).

The notion of meaningful play provides a conceptual assessment of the design of a video game. It evaluates the relationship between actions taken by a player and the outcomes of those actions. It says that in order for the interaction between players' actions and the actions' outcomes to be meaningful, they must be discernable and integrated.

Discernable means that the immediate outcome of the action is observable by the player. For example, when the player presses the "w" key, they observe their avatar moving forward. Integrated means that the players' actions have a long-term effect on the course of the game. For example, in games like Knights of the Old Republic, player actions such as killing or saving a particular NPC will assign "good" or "bad" karma points, and these in turn will determine how the storyline in the game will end.

This work solely focuses on assigning a score to the amount of integrated meaningful play in story-driven games. Our algorithm determines the integrated impact of every player choice made using two main processes: (1) we calculate the overall "meaningfulness" of a choice; (2) we weigh the obtained choice value to account for integration level. First, to calculate the impact of a choice we use Q-learning to determine how much reward is obtained by making a given choice with the goal of getting to a desired ending state. Then we use the Manhattan distance to compare all possible routes from the same starting point to a desired end point to determine the variance of paths, i.e., the importance of certain choices. Second, we weigh the user's earlier choices higher than later ones to account for the fact that earlier choices have the opportunity to have a more integrated impact than choices made near the end of a game.

Scoring the Integration Level of a Game

We present a procedure computing the I-score, a scoring function I : GAME → [0, 1], such that a score closer to 0.0 indicates a high degree of overlap between the paths from the beginning of the game to each ending of the game. Conversely, if a score is 1.0, the paths from the start of the game to each ending have more separation.

For our study we developed synthetic graphs to represent choice-driven games with varying amounts of choices for players. The synthetic topologies have a single starting node connected to four succeeding nodes. There are four levels, and a final ending level, each with four nodes. The term "branch" will refer to the direct paths made from the first decision nodes to the respective ending nodes vertical to them.

Figure 1 shows two of these topologies. The starting node represents the starting state in a video game. After a player makes an in-game decision, the game state transitions to the corresponding connected node.

We begin with both extremes. In Figure 1 (a), each node in each level is connected to every node in the subsequent level, allowing for a transition into any other branch. Most player choices made during the game have no impact on the ending of the game. Instead, only the final player decision determines the game's ending state. We refer to this topology as "non-integrated" because aside from the last choice, all other decisions made by the player have no effect on the final outcome of the game. Its I-score is I(non-integrated) = 0.1585, an unsurprisingly low score. Only the player's final decision has a direct impact on the ending, but this decision is not integrated because it is made at the end of the game.

For our opposite synthetic graph, we construct a topology such that after a player makes their first choice, they become locked into a specific branch with no possibility of changing their outcome. Thus, the very first choice the player makes determines the outcome. We refer to this topology as "rigid-integrated". It is shown in Figure 1 (b). It has a score of I(rigid-integrated) = 1.

Figure 1: (a) Non-integrated topology; regardless of the choices made, the player can always reach any of the endings; it has an I-score of 0.15. (b) Rigid-integrated topology; once the player makes a choice in the first decision, the outcomes are fixed for the remainder of the game; it has an I-score of 1.
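As an illustration of how the two extreme topologies can be encoded as adjacency matrices, here is a sketch under our reading of the layout above (a start node followed by levels of four nodes each, the last level containing the endings). The function make_topology and its parameters are hypothetical names, and the exact number of levels is illustrative.

```python
def make_topology(levels=5, width=4, crossover=True):
    """Adjacency matrix for a synthetic story graph.

    Node 0 is the start node, followed by `levels` levels of `width` nodes;
    the last level contains the endings. crossover=True connects every node
    to every node of the next level (the "non-integrated" graph);
    crossover=False keeps each branch separate (the "rigid-integrated" graph).
    """
    n = 1 + levels * width
    adj = [[0] * n for _ in range(n)]
    for b in range(width):                     # the first decision: pick a branch
        adj[0][1 + b] = 1
    for lvl in range(levels - 1):
        cur = [1 + lvl * width + i for i in range(width)]
        nxt = [1 + (lvl + 1) * width + i for i in range(width)]
        for i, u in enumerate(cur):
            for j, v in enumerate(nxt):
                if crossover or i == j:        # same-branch edges only when rigid
                    adj[u][v] = 1
    return adj

non_integrated = make_topology(crossover=True)     # any ending stays reachable
rigid_integrated = make_topology(crossover=False)  # first choice fixes the ending
```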
Underlying assumptions on game structure. As a first work computing a numeric measurement of integrated meaningful play, we make some assumptions about the graphs of the games we have chosen to test. First, we assume that the graph is a directed acyclic graph (DAG). That is, from any game state there is no sequence of player choices that will lead back to that same state. Many games do have such cycles, but we have chosen games that do not, for simplicity. Second, we assume that the graphs have multiple ending states. Games such as Knights of the Old Republic have multiple endings depending on the player's choices. Multiple endings are not a requirement for a well-received narrative-based game, but they provide an intuitive way to compare routes for scoring a game. While many successful narrative-based games have only one ending, or include cycles in their structure, this paper describes a new proof-of-concept procedure. As such, we leave these other types of game structures for future work.

Computing Integration Score

We present a procedure to compute the I-score of a game. It is a composite of four elements:

I(Game) = (M ∘ N ∘ W ∘ Q)(Game)

Computing the Q-values

For the games we are considering, a story can be seen as a DAG, where nodes are precise points in the story and edges are players' choices advancing the story. Our goal is to analyze the overall level of integration in a story. To do so, we use Q-learning to calculate the reward value of every state-action pair by defining a matrix of dimensions 1...Nodes × 1...Nodes. Our scoring system accepts this matrix as input, with all cells initialized to 1. Additional parameters include the position of the start node, representing the beginning of the story, and the positions and number of ending nodes, representing all possible story endings. We give a reward of 100 to the desired ending state in order to incentivize our Q-learning agent to reach this destination. We then run Q-learning on the resulting matrix for each ending state. After 100,000 iterations, the resulting Q-table shows the expected rewards for each choice at each given story point. For each ending, we get a separate Q-table with the updated reward values. Let this procedure be defined by the function Q(Game), which accepts an adjacency matrix representation of a game as input and outputs a series of n Q-tables, where n is the number of available endings in the game's story.
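A small sketch of the Q step as we read it: run a single-goal learner once per ending and collect one Q-table per ending. The learner argument stands for a routine such as the q_learning sketch in the Preliminaries; the function name and signature below are our own.

```python
def compute_q_tables(adjacency, start, endings, learner):
    """Q(Game): one Q-table per desired ending.

    `learner(adjacency, start, goal)` is assumed to return a Q-table in which
    only the target ending `goal` is rewarded (e.g., with the reward of 100
    described above).
    """
    return [learner(adjacency, start, goal) for goal in endings]
```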
Computing Weighting Function W

Integrated play is defined as the effect a given action has later on in a game. Therefore, it stands to reason that earlier actions have more opportunity to have a long-lasting impact on the game. To reflect this, we assign weights to actions based on how early in the game they occur, with more weight given to earlier actions.

We calculate a set of n weights such that they sum to 1, where n is the number of transitions in the longest route in the graph. We then apply these weights in descending order, with the higher weights applied closer to the graph's start node and the smaller weights closer to the graph's endings.

Algorithm 1 describes the procedure for calculating the set of weights. It first initializes an empty set of weights (Line 2). Then, it calculates the slope of a line according to graphHeight, the length of the longest route from the start to any ending in the graph, and sums the initial unprocessed weights, which are given by i * slope (Lines 3-6). Once this is done, the algorithm calculates the average difference, toAdd, between the sum of all unprocessed weights and 1 (Line 7). Because the unprocessed weights sum to less than 1, we add toAdd to each i * slope (Lines 8-10). Lastly, we reverse the weights to put them into descending order (Line 11).

Algorithm 1: Weight Calculation
1: procedure GETWEIGHTS(graphHeight)
2:   weightSet ← new List
3:   slope ← 1 / graphHeight²
4:   sum ← 0
5:   for i ← 1; i ≤ graphHeight; i++ do
6:     sum ← sum + (i * slope)
7:   toAdd ← (1 − sum) / graphHeight
8:   for i ← 1; i ≤ graphHeight; i++ do
9:     weight ← (i * slope) + toAdd
10:    weightSet.append(weight)
11:   weightSet.reverse()
12:   return weightSet
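For reference, a direct Python transcription of Algorithm 1 (a sketch; the name get_weights mirrors GETWEIGHTS):

```python
def get_weights(graph_height):
    """Algorithm 1: graph_height weights that sum to 1, in descending order,
    so that earlier choices in the story receive more weight."""
    slope = 1.0 / graph_height ** 2
    unprocessed = [i * slope for i in range(1, graph_height + 1)]  # i * slope
    to_add = (1.0 - sum(unprocessed)) / graph_height               # spread the remainder
    weights = [w + to_add for w in unprocessed]
    weights.reverse()                                              # largest weight first
    return weights

# For example, get_weights(4) returns [0.34375, 0.28125, 0.21875, 0.15625],
# which sums to 1.0.
```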
Once we have a set of linearly descending weights, we can apply them to the Q-tables to construct a set of weighted Q-tables that we will call "I-tables." This is done by conducting a depth-first search of each Q-table and multiplying each Q-value by the weight corresponding to the Q-value's height in the graph.

If two or more branching paths of differing lengths re-converge in the graph, the "height" of the re-convergence node could be the height of either prior node plus one. However, this is not consequential for the overall score, as the weights maintain their descending order before the branching and after the re-convergence. Therefore, choices are still weighed higher when found earlier, preserving integration.

Integrated play means that a decision has long-lasting consequences, and a decision made earlier on has more opportunity to have a long-lasting impact. Therefore, this function allows our scoring system to detect integration in games by placing higher value on decisions made earlier in the story. Let this procedure be defined by the function W(Q-tables), which accepts a set of n Q-tables as input and outputs a set of n I-tables.

Computing the Normalized Scores N

To ensure that our final score is normalized between 0 and 1, we run softmax on a vector containing all non-zero values in a given I-table, producing a set of normalized values. We then take our normalized set and overwrite the original values with their normalized values.

In order to normalize all values in a given weighted Q-table, all non-zero values in the table are compiled into a one-dimensional vector. As non-zero values signify connections, they are the only values of interest in this operation. We then use softmax to normalize the values in this vector, and replace the old non-zero values in the table with the corresponding new ones from softmax (Bridle 1990). Let this procedure be defined by the function N(Q-tables), which accepts a set of n weighted Q-tables as input and outputs a set of n normalized, weighted Q-tables.

Computing the Manhattan Scores M

Once this is done for each table, we can calculate the Manhattan distances in a pairwise fashion between the I-tables, getting a sense of how different each route is in the process. Once these distances are averaged, we have our I-score.

Finally, we take the Manhattan distance between each pair of the game's I-tables. If two I-tables for different endings have different paths, then the Manhattan distance will be higher, reflecting the greater amount of integrated play in the game. Conversely, if a nearly identical path can be taken to two different endings, the Manhattan distance will be lower. The final average is divided by 2 so that a differing choice between two I-tables is not counted twice. Let this Manhattan distance procedure be defined by a function M(I-tables), which accepts the n weighted, normalized I-tables as input and outputs a numeric score from 0 to 1.
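The following sketch shows the N and M steps as we understand them: softmax over the non-zero entries of each I-table, then averaged pairwise Manhattan distances divided by 2. The flattening details and the function names are our assumptions.

```python
import math
from itertools import combinations

def softmax_normalize(table):
    """N: softmax over the non-zero entries of an I-table (zeros mark non-edges)."""
    cells = [(i, j) for i, row in enumerate(table)
                    for j, v in enumerate(row) if v != 0]
    exps = [math.exp(table[i][j]) for i, j in cells]
    total = sum(exps)
    out = [[0.0] * len(row) for row in table]
    for (i, j), e in zip(cells, exps):
        out[i][j] = e / total
    return out

def manhattan_score(i_tables):
    """M: average pairwise Manhattan distance between normalized I-tables,
    divided by 2 so that a differing choice is not counted twice."""
    distances = []
    for a, b in combinations(i_tables, 2):
        d = sum(abs(x - y) for row_a, row_b in zip(a, b)
                           for x, y in zip(row_a, row_b))
        distances.append(d / 2.0)
    return sum(distances) / len(distances)
```

Because each normalized table sums to 1 over its non-zero entries, the pairwise Manhattan distance is at most 2, so the halved average stays in [0, 1].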
Empirical Evaluation

Synthetic Graphs

In addition to the non-integrated and rigid-integrated graphs, we created other synthetic graphs, shown in Figure 2. They combine various levels of inter-connectivity.

Figure 2: (a) The two-on-two graph allows for crossover in every other level, alternating between the two endings; it has an I-score of 0.9708. (b) The rigid-top topology; the top half is identical to the rigid-integrated graph and the bottom half is identical to the non-integrated graph; it has an I-score of 0.1590. (c) The rigid-bottom topology inverts the connectivity of the rigid-top; it has a score of 0.9995.

Game Graphs

We chose four story-driven video games on which to test our scoring system because we were able to formulate their topologies from online sources. For each game, we construct a simplified graph where each choice is a transition from one node to another. Choices related solely to non-story elements of the game, such as achievements, are ignored for simplicity. From these graphs, we construct adjacency matrices to pass as input to our scoring algorithm.

The first and most complex game that we inspect is Steins;Gate, a game of the visual novel genre (https://en.wikipedia.org/wiki/Steins;Gate). Game play primarily takes the form of dialogue trees where the player makes decisions from time to time when prompted. The player can check the protagonist's cell phone at any time, and respond to messages from other characters using one of a finite number of options. For simplicity, dialogue routes that only lead to achievements are ignored.

The graph constructed from Steins;Gate is shown in Figure 3. There are many branching points in the story, with 6 endings of varying difficulty to achieve, the true ending being the most difficult. Each time a critical decision is made, the player is denied access to one ending, but can still access all the remaining endings from the new branch. Steins;Gate received an I-score of 0.6829.

Figure 3: Graph of the Steins;Gate game.

Next, we investigate a browser-based puzzle game called No One Has to Die (https://levelskip.com/puzzle/No-one-has-to-die-Walkthrough). In each level of this game, the player must manipulate elements on the board to save characters from a fire, but the puzzles are always designed such that one character must be sacrificed. The ending of the game is dependent upon which characters remain alive. Once all endings have been seen, the "true route" is unlocked; it is ignored here as it is linear and thus not of interest for this study.

No One Has to Die's graph is much simpler than Steins;Gate's, with 5 major endings based on who lives or dies. There are 3 choices the player will make in a given play-through of the game. The first choice forces the player down one of two paths, each of which can either access one of a different pair of endings, or rejoin on a single ending. This game was assigned an I-score of 0.6782.

In the game Shadow the Hedgehog (https://en.wikipedia.org/wiki/Shadow_the_Hedgehog_(video_game)), players move through levels collecting rings and engaging in combat with enemies much as they would in the Sonic the Hedgehog series. The key difference from typical Sonic the Hedgehog games is that each level in Shadow the Hedgehog contains multiple objectives, and which objective the player completes leads them to a different next stage. Based on which route the player completed, they will receive one of 10 different endings. Similarly to No One Has to Die, there is a "true ending" route unlocked once all other endings have been seen, but this is once again linear and ignored for the purposes of this paper. The graph of Shadow the Hedgehog is constructed with each node representing a level, and each edge representing one of the possible objectives to complete. This game is notably non-committal in comparison to the other games we have discussed so far, allowing the player to move between several routes almost freely, and only denying access to certain endings after the third or fourth choice. Shadow the Hedgehog received an I-score of 0.8354. This score, closer to 1, is indicative of a higher difference between routes and more "strictness." While there is less impact from the player's early choices, the player's decisions begin to matter more and more as they reach the end.

The last game we investigate is Zero Escape: Virtue's Last Reward (https://en.wikipedia.org/wiki/Zero_Escape:_Virtue's_Last_Reward), the second installment in the "Zero Escape" series of games. Zero Escape is a series of visual novel games in which the player solves puzzles to advance the story, and occasionally makes decisions that change the route. Among the games we cover in this paper, this game is unique in that it prohibits access to certain endings at first through various means. Information required to solve certain puzzles is found on different routes, making progression essentially impossible without having been down the correct routes. Other endings are fully locked, the player being met with a screen reading "To be Continued..." if they reach the ending prematurely. This is meant to enforce a degree of linearity on the storytelling.

For simplicity, we construct two different graphs: a "complete" graph with the prologue and all endings "unlocked", and a "locked" graph with no prologue segment and only the initially unlocked endings. Both graphs have a very similar structure to that of No One Has to Die, albeit with more endings and greater length.

The complete graph has 20 accessible endings, with the first decision the player makes denying access to over half of the endings. With this large amount of denial of endings, the complete graph receives an I-score of 0.9763. The similarity to the synthetic rigid-integrated graph is expected, as there is no overlap between routes in this game.

Meanwhile, the locked graph has a nearly identical structure, but only allows access to 9 endings. This proves to have very little impact on the I-score, as this graph receives a score of 0.9983. The increase in score can be attributed to the reduced number of endings, meaning that each choice denies the player access to a higher percentage of the total number of endings.

Experimental Setup

Our Q-learning parameters are as follows: α = 0.90, γ = 0.75, and ε = 0.20. For each game, we run Q-learning on the game's weighted adjacency matrix 100,000 times. We repeat this process 10 times and average the outputs.
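A sketch of the driver implied by this setup (our own code; score_fn stands for the full M ∘ N ∘ W ∘ Q pipeline and is passed in as a parameter):

```python
def run_experiment(game, score_fn, repeats=10):
    """Average the I-score over several runs, since epsilon-greedy Q-learning is
    stochastic (alpha=0.90, gamma=0.75, epsilon=0.20, 100,000 iterations)."""
    return sum(score_fn(game) for _ in range(repeats)) / repeats
```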
Experimental Results

Table 1 shows the I-scores.

Table 1: Resulting I-scores
  Steins;Gate                                    0.6829
  No One Has to Die                              0.6782
  Shadow the Hedgehog                            0.8354
  Zero Escape: Virtue's Last Reward (Unlocked)   0.9763
  Zero Escape: Virtue's Last Reward (Locked)     0.9983
  Non-Integrated                                 0.1585
  Rigid-Integrated                               1.0000
  Rigid-Top                                      0.1590
  Rigid-Bottom                                   0.9995
  Two-on-Two                                     0.9708

The synthetic graphs' I-scores are very near to either 0 or 1, the ends of the spectrum. Both the non-integrated and the rigid-top graphs receive very low scores, demonstrating that in graphs with lots of crossover between branches the choices have little impact. In fact, the non-integrated graph allows for complete crossover: a player can always move from any state to any state in the following level of the topology, yielding an I-score near 0. On the other hand, the rigid-integrated, rigid-bottom, and two-on-two graphs have very high scores. This illustrates the limited amount of crossover between branches in their topologies. The rigid-integrated graph receives a perfect score of 1, indicating that there is absolutely no crossover between branches.

Both the unlocked and locked versions of Zero Escape: Virtue's Last Reward yield I-scores in the range of 0.9, indicating a close-to-rigid topology. Shadow the Hedgehog has a slightly lower score of 0.8354, suggesting that this game has slightly more divergence between the branches of its topology. In a preview of Shadow the Hedgehog, Gooseberry commented: "gamers will decide in a race against time by choosing between any number of routes by which to unveil the secrets of his past. Multiple endings will reflect critical decisions at every stage of the game. Gamers have ultimate control over Shadow's past, with the potential to rewrite his history time and time again." (https://cheatcc.com/xb/rev/shadowhedgehogreview.html) The game's high score reflects the gamer's "ultimate control": early game decisions do not immediately lock players out of endings, and late decisions made in the game still have the power to swap branches.

Steins;Gate and No One Has to Die have similar scores in the range of 0.6. Both of these games have complex topologies that contain a mixture of paths. We hypothesize that games between these two extremes, with an I-score close to 0.5, have more interesting story lines. Daily Star reviewers of Steins;Gate remarked: "As you proceed, once trivial choices you make reveal themselves to have a direct effect on the game's finale. Multiple endings both good and bad are up for grabs hindering even seemingly simple decisions with a heavy weight, even more so as you debate the consequences of altering time and the potential butterfly effect it'll have on the characters you've fabricated deep ties with." (https://www.dailystar.co.uk/tech/reviews/steins-gate-elite-ps4-review-16905592) This quote exhibits how meaningful play is highly impactful on player experience. Even trivial choices end up having an effect on the ending. When a player must put thought into their game decisions, it can make for a more engaging, thought-provoking, interactive game experience. A review of No One Has to Die posits: "The main highlight is the plot and the music. Both had me intrigued, seriously absorbing. I always love it when alternate paths somehow link together. It is definitely worth finding all the endings!" (http://gamescheat22.blogspot.com/2013/04/no-one-has-to-die-game-review.html) This quote emphasizes the converging game paths. The reviewer suggests that players' choices create a more enjoyable game experience.
Related Work

Player enjoyment is a difficult aspect of a game to quantify, thus there is a need for a validated and generalized model that can be used to design and evaluate game enjoyment (Bostan and Öğüt 2009; Sweetser and Wyeth 2005).

Fabricatore, Nussbaum, and Rosas (2002) used single-player games to construct a qualitative design model aiming at understanding what players want in a video game. Participants had a playing session with one of 39 selected games (picked by their popularity). During the play session, participants' activities and opinions were logged. This study resulted in the formation of a hierarchical design model using entities, scenario, and goals as three aspects of the game that were found to be determinant of the player experience.

In Desurvire, Caplan, and Toth (2004) a study was conducted to assess the usefulness and validity of various heuristics by tracking participants' navigation and feedback in a game shell. The user study provided evidence for using heuristics to evaluate games for playability. They determined game heuristic categories such as game story (e.g., character development) and game play (e.g., challenges users must overcome to move forward throughout a game).

Sweetser and Wyeth (2005) report on a model to evaluate player enjoyment. Various design heuristics proposed in previous research are integrated into a model to evaluate game enjoyment using flow. Flow occurs when the challenge level matches the skill level of the user. The proposed model, GameFlow, adapts the idea of flow experiences to games. The model consists of components such as concentration, challenge, and skills. Bostan and Öğüt (2009) further develop a GameFlow model specifically for RPGs to aid in maximizing player experience. The user's freedom of choice and meaningful play are deemed important aspects of a game's storyline. A game's difficulty curve should keep player engagement high and minimize frustration by increasing game difficulty as the player progresses through the game.

Purdy et al. (2018) study a collection of story features to assess a storyline's quality. For instance, the temporal ordering feature looks at the plausibility that one sentence follows from its previous sentence in the story. Our study is orthogonal to a story's quality. We are looking at measuring the integration level of a game between the player's choices and the outcome of the game.

Mawhorter, Mateas, and Wardrip-Fruin (2015) present a system capable of generating choices to achieve poetic effects. The system assumes that the player has intrinsic goals such as avoiding injury and defusing threats to innocent NPCs. It can generate three kinds of choices in relation to the player's goals. For instance, in dilemma choices, the player's choices will lead to achieving one of the player's goals but will cause other goals to fail. The focus of these criteria is on individual player choices, whereas our focus is on how the player's choices affect the outcome of the game.
A game’s difficulty curve should keep player engage- Desurvire, H.; Caplan, M.; and Toth, J. A. 2004. Using ment high and minimize frustration by increasing game dif- heuristics to evaluate the playability of games. In CHI’04 ficulty as the player progresses through the game. extended abstracts on Human factors in computing systems, Purdy et al. (2018) study a collection of story features to 1509–1512. assess an storyline’s quality. For instance, the temporal or- Fabricatore, C.; Nussbaum, M.; and Rosas, R. 2002. Playa- dering feature looks at the plausibility that one sentence fol- bility in action videogames: A qualitative design model. lows from its previous sentence in the story. Our study is Human-Computer Interaction 17(4):311–368. orthogonal to story’s quality. We are looking at measuring the integrated level of a game between the player’s choices Mawhorter, P. A.; Mateas, M.; and Wardrip-Fruin, N. 2015. and the outcome of the game. Generating relaxed, obvious, and dilemma choices with dun- yazad. In Eleventh Artificial Intelligence and Interactive Mawhorter, Mateas, and Wardrip-Fruin (2015) present a Digital Entertainment Conference. system capable of generating choices to achieve poetic ef- fects. The system assumes that the player has intrinsic goals Purdy, C.; Wang, X.; He, L.; and Riedl, M. 2018. Predict- such as avoiding injury and diffuse threats to innocent NPCs. ing generated story quality with quantitative measures. In It can generate three kinds of choices in relation to the AIIDE, 95–101. player’s goals. For instance, in dilemma choices, player’s Riedl, M. O., and Young, R. M. 2006. From linear story gen- choices will lead to achieve one of the player’s goals but eration to branching story graphs. IEEE Computer Graphics will cause other goals to fail. The focus of these criteria is and Applications 26(3):23–31. on individual players’ choices whereas our focus is on how Salen, K.; Tekinbaş, K. S.; and Zimmerman, E. 2004. Rules the player’s choices affect the outcome of the game. of play: Game design fundamentals. MIT press. Sutton, R. S., and Barto, A. G. 2018. Reinforcement learn- Conclusions ing: An introduction. MIT press. We introduce the I-score, a metric that evaluates the in- Sweetser, P., and Wyeth, P. 2005. Gameflow: a model for tegrated level of a game with respect to its possible end- evaluating player enjoyment in games. Computers in Enter- ings. This is the first work attempting to provide a numerical tainment (CIE) 3(3):3–3.