Scoring Integrated Meaningful Play in Story-Driven Games with Q-Learning

Alan Bettis¹, Danielle Newberry², and Hector Munoz-Avila¹
¹Lehigh University, ²Middlebury College

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Integrated meaningful play is the idea that players' choices should have a long-term effect on the game. In this paper we present the I-score (for "integrated"), a scoring function for scoring integrated game play as a function of the game's storylines. I-scores are in the range [0, 1]. In games with I-scores close to one, the players' early choices determine the game's ending; choices made later in the game do not change the final ending. In contrast, in games with I-scores close to zero, players' choices can change the ending until the very end. Games with scores closer to 0.5 provide a more balanced player choice, whereby the game's ending can still be changed despite early decisions, but not so much that the ending could be changed at any point.

Introduction

Some games are famous for their storyline and how the players' choices play out in the final ending of the game. For instance, Bioware's Knights of the Old Republic is well-known for a variety of aspects, including how the player's choices throughout the game will result in a variety of endings (and cinematics).

Having multiple endings is not a requirement for a game to be considered to have a great storyline. Many first-person shooters such as Valve's Half-Life are linear, yet their storylines are considered memorable. Nevertheless, the long-term effect that a player's choice has on the ending of a game is one of the most intriguing aspects of game design. Researchers have introduced evaluative definitions assessing player choices. Salen, Tekinbaş, and Zimmerman (2004) define the integrated component of meaningful play to indicate whether the players' choices have long-term repercussions on the game. The idea is that by enabling long-term repercussions of the players' choices, the game empowers the players and makes their choices have more meaning.

In this paper, we propose the I-score (for "integrated") metric. It scores how much the players' choices impact the game's ending. In games with an I-score close to 0, the players can change the ending at any point in the game, including at the latter stages. On the other extreme, if the game has multiple endings but "locks" the player once the initial choice is made, then its I-score is 1. If a game has an I-score close to 0.5, it indicates that the game is balanced between these two extremes, providing more choice to the player.

A game's storylines can be represented as a directed graph, where each node is a state in the storyline (e.g., the beginning of the game, or each of the possible endings) and edges represent transitions between states as a result of the players' choices (Riedl and Young 2006). The basis for computing the I-score is the use of Reinforcement Learning (RL) techniques with the aim of finding the "true" value of every state (in RL, a metric of the suitability of a state for reaching a goal, based on the reward function). We then find the distances between those true state values to compute the overall I-score of the graph.

The premise of our work is that integrated meaningful play assesses the relationship between player choices and the game ending. As the value of being in a state is intrinsically linked to player choice, reinforcement learning finds the value of each state.

We present an empirical evaluation on both synthetic graphs, representing different topologies of player choices, and actual games, and discuss their resulting I-scores.
Preliminaries

Our work links two different concepts: Q-learning and meaningful play. We introduce both concepts here.

In reinforcement learning, the agent aims to maximize its future rewards while interacting with the environment from a starting time t: r_t + r_{t+1} + ... + r_T (Sutton and Barto 2018). For our purposes we focus on episodic tasks, where the agent terminates its execution after reaching a concluding state.

Tabular reinforcement learning algorithms maintain a two-dimensional table Q : S × A → R, where S represents the states that the agent can visit, A represents the actions that the agent can take, and R represents the real numbers. The entry Q(s, a) indicates the estimated reward the agent gains from taking action a in state s. The estimate Q(s, a) is updated after the agent takes an action and receives a reward using the following formula:

Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_{a'} Q(s', a')]

where s' is the state reached after executing action a in state s. α is the step-size parameter determining how much weight to give the previous estimate Q(s, a) and how much weight to give the update (i.e., the [r + γ max_{a'} Q(s', a')] part of the equation). γ is the discount factor indicating how much weight to give to the estimated future rewards max_{a'} Q(s', a').

Reinforcement learning algorithms can use a variety of exploration techniques such as ε-greedy: with probability 1 − ε the agent chooses the greedy action π(s), and with probability ε it picks a random action. Based on the estimates Q(s, a), the so-called greedy policy can be defined:

π(s) = argmax_{a'} Q(s, a')

To guarantee that future reward is maximized, the policy enforces that for every state s the action a' with the highest current estimate Q(s, a') is chosen.
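To make the update rule above concrete, the following is a minimal sketch of tabular Q-learning with an ε-greedy policy over a story graph given as an adjacency matrix. It is an illustrative reconstruction rather than the implementation released with this paper; the names (q_learning, adjacency, reward_goal) and the zero-initialization of the table are our own assumptions.

```python
import random

def q_learning(adjacency, start, goal, episodes=100_000,
               alpha=0.9, gamma=0.75, epsilon=0.2, reward_goal=100.0):
    """Tabular Q-learning over a story graph.

    adjacency[s][t] is nonzero if a player choice leads from story state s to
    state t (so an "action" is simply the successor chosen). Reaching the
    desired ending `goal` yields reward_goal; every other transition yields 0.
    Returns a Q-table q where q[s][t] estimates the value of choosing t in s.
    """
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]

    def successors(s):
        return [t for t in range(n) if adjacency[s][t]]

    for _ in range(episodes):
        s = start
        while successors(s):                          # episodes end at terminal nodes
            choices = successors(s)
            if random.random() < epsilon:             # explore
                t = random.choice(choices)
            else:                                     # exploit: greedy action
                t = max(choices, key=lambda c: q[s][c])
            r = reward_goal if t == goal else 0.0
            future = max((q[t][u] for u in successors(t)), default=0.0)
            # Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma * max_a' Q(s',a')]
            q[s][t] = (1 - alpha) * q[s][t] + alpha * (r + gamma * future)
            s = t
    return q
```

In a DAG of story states, after enough episodes the greedy policy assigns the highest values to the choices that lead toward the rewarded ending.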
In Rules of Play: Game Design Fundamentals, Salen et al. propose a game design framework for not only computer games, but all games, from board games to sporting games. Meaningful play is highlighted as a core concept of game design and as integral to a game's success. Salen et al. define meaningful play as a relationship within a game between action and outcome. There are two types of meaningful play: integrated and discernable. Integrated meaningful play means that a player's action has some long-term consequences, i.e., the decision or play is woven or integrated into the game as a whole. Discernable meaningful play refers to when a player's action has immediate or discernable consequences. This work posits that successful games include game actions and outcomes that are both discernable and integrated (Salen, Tekinbaş, and Zimmerman 2004).

The notion of meaningful play provides a conceptual assessment of the design of a video game. It evaluates the relationship between actions taken by a player and the outcomes of those actions. It says that in order for the interaction between players' actions and the actions' outcomes to be meaningful, they must be discernable and integrated.

Discernable means that the immediate outcome of the action is observable by the player. For example, when the player presses the "w" key, they observe their avatar moving forward. Integrated means that the players' actions have a long-term effect on the course of the game. For example, in games like Knights of the Old Republic, player actions such as killing or saving a particular NPC will assign "good" or "bad" karma points, and these in turn will determine how the storyline in the game will end.

This work solely focuses on assigning a score to the amount of integrated meaningful play in story-driven games. Our algorithm determines the integrated impact of every player choice made using two main processes: (1) we calculate the overall "meaningfulness" of a choice; (2) we weigh the obtained choice value to account for integration level. First, to calculate the impact of a choice we use Q-learning to determine how much reward is obtained by making a given choice with the goal of getting to a desired ending state. Then we use the Manhattan distance to compare all possible routes from the same starting point to a desired end point to determine the variance of paths, i.e., the importance of certain choices. Second, we weigh the user's earlier choices higher than later ones to account for the fact that earlier choices have the opportunity to have a more integrated impact than choices made near the end of a game.

Scoring the Integration Level of a Game

We present a procedure computing the I-score, a scoring function I : GAME → [0, 1], such that a score closer to 0.0 indicates a high degree of overlap between the paths from the beginning of the game to each ending of the game. Conversely, if a score is 1.0, the paths from the start of the game to each ending have more separation.

For our study we developed synthetic graphs to represent choice-driven games with varying amounts of choices for players. The synthetic topologies have a single starting node connected to four succeeding nodes. There are four levels, and a final ending level, each with four nodes. The term "branch" will refer to the direct paths made from the first decision nodes to the respective ending nodes vertical to them.

Figure 1 shows two of these topologies. The starting node represents the starting state in a video game. After a player makes an in-game decision, the game state transitions to the corresponding connected node.

We begin with both extremes. In Figure 1 (a), each node in each level is connected to every node in the subsequent level, allowing for a transition into any other branch. Most player choices made during the game have no impact on the ending of the game. Instead, only the final player decision determines the game's ending state. We refer to this topology as "non-integrated" because aside from the last choice, all other decisions made by the player have no effect on the final outcome of the game. Its I-score is I(non-integrated) = 0.1585, an unsurprisingly low score. Only the player's final decision has a direct impact on the ending, but this decision is not integrated because it is made at the end of the game.

For our opposite synthetic graph, we construct a topology such that after a player makes their first choice, they become locked into a specific branch with no possibility of changing their outcome. Thus, the very first choice the player makes determines the outcome. We refer to this topology as "rigid-integrated". It is shown in Figure 1 (b). It has a score of I(rigid-integrated) = 1.

Figure 1: (a) Non-integrated topology; regardless of the choices made, the player can always reach any of the endings; it has an I-score of 0.15. (b) Rigid-integrated topology; once the player makes a choice in the first decision, the outcomes are fixed for the remainder of the game; it has an I-score of 1.
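As an illustration of how the two extreme topologies can be encoded as adjacency matrices, here is a sketch under our reading of the layout above (a start node followed by levels of four nodes each, the last level containing the endings). The function make_topology and its parameters are hypothetical names, and the exact number of levels is illustrative.

```python
def make_topology(levels=5, width=4, crossover=True):
    """Adjacency matrix for a synthetic story graph.

    Node 0 is the start node, followed by `levels` levels of `width` nodes;
    the last level contains the endings. crossover=True connects every node
    to every node of the next level (the "non-integrated" graph);
    crossover=False keeps each branch separate (the "rigid-integrated" graph).
    """
    n = 1 + levels * width
    adj = [[0] * n for _ in range(n)]
    for b in range(width):                     # the first decision: pick a branch
        adj[0][1 + b] = 1
    for lvl in range(levels - 1):
        cur = [1 + lvl * width + i for i in range(width)]
        nxt = [1 + (lvl + 1) * width + i for i in range(width)]
        for i, u in enumerate(cur):
            for j, v in enumerate(nxt):
                if crossover or i == j:        # same-branch edges only when rigid
                    adj[u][v] = 1
    return adj

non_integrated = make_topology(crossover=True)     # any ending stays reachable
rigid_integrated = make_topology(crossover=False)  # first choice fixes the ending
```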
Underlying assumptions on game structure. As a first work computing a numeric measurement of integrated meaningful play, we make some assumptions about the graphs of the games we have chosen to test. First, we assume that the graph is a directed acyclic graph (DAG). That is, from any game state there is no sequence of player choices that will lead back to that same state. Many games do have such cycles, but we have chosen games that do not, for simplicity. Second, we assume that the graphs have multiple ending states. Games such as Knights of the Old Republic have multiple endings depending on the player's choices. Multiple endings are not a requirement for a well-received narrative-based game, but they provide an intuitive way to compare routes for scoring a game. While many successful narrative-based games have only one ending, or include cycles in their structure, this paper describes a new proof-of-concept procedure. As such, we leave these other types of game structures for future work.

Computing Integration Score

We present a procedure to compute the I-score of a game. It is a composite of four elements:

I(Game) = (M ∘ N ∘ W ∘ Q)(Game)

Computing the Q-values

For the games we are considering, a story can be seen as a DAG, where nodes are precise points in the story and edges are players' choices advancing the story. Our goal is to analyze the overall level of integration in a story. To do so, we use Q-learning to calculate the reward value of every state-action pair by defining a matrix of dimensions 1...Nodes × 1...Nodes. Our scoring system accepts this matrix as input, with all cells initialized to 1. Additional parameters include the position of the start node, representing the beginning of the story, and the positions and number of ending nodes, representing all possible story endings. We give a reward of 100 to the desired ending state in order to incentivize our Q-learning agent to reach this destination. We then run Q-learning on the resulting matrix for each ending state. After 100,000 iterations, the resulting Q-table shows the expected rewards for each choice at each given story point. For each ending, we get a separate Q-table with the updated reward values. Let this procedure be defined by the function Q(Game), which accepts an adjacency matrix representation of a game as input and outputs a series of n Q-tables, where n is the number of available endings in the game's story.
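A small sketch of the Q step as we read it: run a single-goal learner once per ending and collect one Q-table per ending. The learner argument stands for a routine such as the q_learning sketch in the Preliminaries; the function name and signature below are our own.

```python
def compute_q_tables(adjacency, start, endings, learner):
    """Q(Game): one Q-table per desired ending.

    `learner(adjacency, start, goal)` is assumed to return a Q-table in which
    only the target ending `goal` is rewarded (e.g., with the reward of 100
    described above).
    """
    return [learner(adjacency, start, goal) for goal in endings]
```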
Computing Weighting Function W

Integrated play is defined as the effect a given action has later on in a game. Therefore, it stands to reason that earlier actions have more opportunity to have a long-lasting impact on the game. To reflect this, we assign weights to actions based on how early in the game they occur, with more weight given to earlier actions.

We calculate a set of n weights such that they sum to 1, where n is the number of transitions in the longest route in the graph. We then apply these weights in descending order, with the higher weights applied closer to the graph's start node and the smaller weights closer to the graph's endings.

Algorithm 1 describes the procedure for calculating the set of weights. It first initializes an empty set of weights (Line 2). Then, it calculates the slope of a line according to graphHeight, the length of the longest route from the start to any ending in the graph, and sums the initial unprocessed weights, which are given by i * slope (Lines 3-6). Once this is done, the algorithm calculates the average difference, toAdd, between the sum of all unprocessed weights and 1 (Line 7). Because the unprocessed weights sum to less than 1, we add toAdd to each i * slope (Lines 8-10). Lastly, we reverse the weights to put them into descending order (Line 11).

Algorithm 1: Weight Calculation
1: procedure GETWEIGHTS(graphHeight)
2:   weightSet ← new List
3:   slope ← 1 / graphHeight²
4:   sum ← 0
5:   for i ← 1; i ≤ graphHeight; i++ do
6:     sum ← sum + (i * slope)
7:   toAdd ← (1 − sum) / graphHeight
8:   for i ← 1; i ≤ graphHeight; i++ do
9:     weight ← (i * slope) + toAdd
10:    weightSet.append(weight)
11:   weightSet.reverse()
12:   return weightSet
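For reference, a direct Python transcription of Algorithm 1 (a sketch; the name get_weights mirrors GETWEIGHTS):

```python
def get_weights(graph_height):
    """Algorithm 1: graph_height weights that sum to 1, in descending order,
    so that earlier choices in the story receive more weight."""
    slope = 1.0 / graph_height ** 2
    unprocessed = [i * slope for i in range(1, graph_height + 1)]  # i * slope
    to_add = (1.0 - sum(unprocessed)) / graph_height               # spread the remainder
    weights = [w + to_add for w in unprocessed]
    weights.reverse()                                              # largest weight first
    return weights

# For example, get_weights(4) returns [0.34375, 0.28125, 0.21875, 0.15625],
# which sums to 1.0.
```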
Once we have a set of linearly descending weights, we can apply them to the Q-tables to construct a set of weighted Q-tables that we will call "I-tables." This is done by conducting a depth-first search of each Q-table and multiplying each Q-value by the weight corresponding to the Q-value's height in the graph.

If two or more branching paths of differing lengths re-converge in the graph, the "height" of the re-convergence node could be the height of either prior node plus one. However, this is not consequential for the overall score, as the weights maintain their descending order before the branching and after the re-convergence. Therefore, choices are still weighed higher when found earlier, preserving integration.

Integrated play means that a decision has long-lasting consequences, and a decision made earlier on has more opportunity to have a long-lasting impact. Therefore, this function allows our scoring system to detect integration in games by placing higher value on decisions made earlier in the story. Let this procedure be defined by the function W(Q-tables), which accepts a set of n Q-tables as input and outputs a set of n I-tables.

Computing the Normalized Scores N

To ensure that our final score is normalized between 0 and 1, we run softmax on a vector containing all non-zero values in a given I-table, producing a set of normalized values. We then take our normalized set and overwrite the original values with their normalized values.

In order to normalize all values in a given weighted Q-table, all non-zero values in the table are compiled into a one-dimensional vector. As non-zero values signify connections, they are the only values of interest in this operation. We then use softmax to normalize the values in this vector, and replace the old non-zero values in the table with the corresponding new ones from softmax (Bridle 1990). Let this procedure be defined by the function N(Q-tables), which accepts a set of n weighted Q-tables as input and outputs a set of n normalized, weighted Q-tables.

Computing the Manhattan Scores M

Once this is done for each table, we can calculate the Manhattan distances in a pairwise fashion between the I-tables, getting a sense of how different each route is in the process. Once these distances are averaged, we have our I-score.

Finally, we take the Manhattan distance between each pair of the game's I-tables. If two I-tables for different endings have different paths, then the Manhattan distance will be higher, reflecting the greater amount of integrated play in the game. Conversely, if a nearly identical path can be taken to two different endings, the Manhattan distance will be lower. The final average is divided by 2 so that a differing choice between two I-tables is not counted twice. Let this Manhattan distance procedure be defined by a function M(I-tables), which accepts the n weighted, normalized I-tables as input and outputs a numeric score from 0 to 1.
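The following sketch shows the N and M steps as we understand them: softmax over the non-zero entries of each I-table, then averaged pairwise Manhattan distances divided by 2. The flattening details and the function names are our assumptions.

```python
import math
from itertools import combinations

def softmax_normalize(table):
    """N: softmax over the non-zero entries of an I-table (zeros mark non-edges)."""
    cells = [(i, j) for i, row in enumerate(table)
                    for j, v in enumerate(row) if v != 0]
    exps = [math.exp(table[i][j]) for i, j in cells]
    total = sum(exps)
    out = [[0.0] * len(row) for row in table]
    for (i, j), e in zip(cells, exps):
        out[i][j] = e / total
    return out

def manhattan_score(i_tables):
    """M: average pairwise Manhattan distance between normalized I-tables,
    divided by 2 so that a differing choice is not counted twice."""
    distances = []
    for a, b in combinations(i_tables, 2):
        d = sum(abs(x - y) for row_a, row_b in zip(a, b)
                           for x, y in zip(row_a, row_b))
        distances.append(d / 2.0)
    return sum(distances) / len(distances)
```

Because each normalized table sums to 1 over its non-zero entries, the pairwise Manhattan distance is at most 2, so the halved average stays in [0, 1].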
Empirical Evaluation

Synthetic Graphs

In addition to the non-integrated and rigid-integrated graphs, we created other synthetic graphs, shown in Figure 2. They combine various levels of inter-connectivity.

Figure 2: (a) The two-on-two graph allows for crossover in every other level, alternating between the two endings; it has an I-score of 0.9708. (b) The rigid-top topology; the top half is identical to the rigid-integrated graph and the bottom half is identical to the non-integrated graph; it has an I-score of 0.1590. (c) The rigid-bottom topology inverts the connectivity of the rigid-top; it has a score of 0.9995.

Game Graphs

We chose four story-driven video games on which to test our scoring system because we were able to formulate their topologies from online sources. For each game, we construct a simplified graph where each choice is a transition from one node to another. Choices related solely to non-story elements of the game, such as achievements, are ignored for simplicity. From these graphs, we construct adjacency matrices to pass as input to our scoring algorithm.

The first and most complex game that we inspect is Steins;Gate, a game of the visual novel genre (https://en.wikipedia.org/wiki/Steins;Gate). Game play primarily takes the form of dialogue trees where the player makes decisions from time to time when prompted. The player can check the protagonist's cell phone at any time, and respond to messages from other characters using one of a finite number of options. For simplicity, dialogue routes that only lead to achievements are ignored.

The graph constructed from Steins;Gate is shown in Figure 3. There are many branching points in the story, with 6 endings of varying difficulty to achieve, the true ending being the most difficult. Each time a critical decision is made, the player is denied access to one ending, but can still access all the remaining endings from the new branch. Steins;Gate received an I-score of 0.6829.

Figure 3: Graph of the Steins;Gate game.

Next, we investigate a browser-based puzzle game called No One Has to Die (https://levelskip.com/puzzle/No-one-has-to-die-Walkthrough). In each level of this game, the player must manipulate elements on the board to save characters from a fire, but the puzzles are always designed such that one character must be sacrificed. The ending of the game is dependent upon which characters remain alive. Once all endings have been seen, the "true route" is unlocked; it is ignored here as it is linear and thus not of interest for this study.

No One Has to Die's graph is much simpler than Steins;Gate's, with 5 major endings based on who lives or dies. There are 3 choices the player will make in a given play-through of the game. The first choice forces the player down one of two paths, each of which can either access one of a different pair of endings, or rejoin on a single ending. This game was assigned an I-score of 0.6782.

In the game Shadow the Hedgehog (https://en.wikipedia.org/wiki/Shadow_the_Hedgehog_(video_game)), players move through levels collecting rings and engaging in combat with enemies much as they would in the Sonic the Hedgehog series. The key difference from typical Sonic the Hedgehog games is that each level in Shadow the Hedgehog contains multiple objectives, and which objective the player completes leads them to a different next stage. Based on which route the player completed, they will receive one of 10 different endings. Similarly to No One Has to Die, there is a "true ending" route unlocked once all other endings have been seen, but this is once again linear and ignored for the purposes of this paper. The graph of Shadow the Hedgehog is constructed with each node representing a level, and each edge representing one of the possible objectives to complete. This game is notably non-committal in comparison to the other games we have discussed so far, allowing the player to move between several routes almost freely, and only denying access to certain endings after the third or fourth choice. Shadow the Hedgehog received an I-score of 0.8354. This score, closer to 1, is indicative of a higher difference between routes and more "strictness." While there is less impact from the player's early choices, the player's decisions begin to matter more and more as they reach the end.

The last game we investigate is Zero Escape: Virtue's Last Reward (https://en.wikipedia.org/wiki/Zero_Escape:_Virtue's_Last_Reward), the second installment in the "Zero Escape" series of games. Zero Escape is a series of visual novel games in which the player solves puzzles to advance the story, and occasionally makes decisions that change the route. Among the games we cover in this paper, this game is unique in that it prohibits access to certain endings at first through various means. Information required to solve certain puzzles is found on different routes, making progression essentially impossible without having been down the correct routes. Other endings are fully locked, the player being met with a screen reading "To be Continued..." if they reach the ending prematurely. This is meant to enforce a degree of linearity on the storytelling.

For simplicity, we construct two different graphs: a "complete" graph with the prologue and all endings "unlocked", and a "locked" graph with no prologue segment and only the initially unlocked endings. Both graphs have a very similar structure to that of No One Has to Die, albeit with more endings and greater length.

The complete graph has 20 accessible endings, with the first decision the player makes denying access to over half of the endings. With this large amount of denial of endings, the complete graph receives an I-score of 0.9763. The similarity to the synthetic rigid-integrated graph is expected, as there is no overlap between routes in this game.

Meanwhile, the locked graph has a nearly identical structure, but only allows access to 9 endings. This proves to have very little impact on the I-score, as this graph receives a score of 0.9983. The increase in score can be attributed to the reduced number of endings, meaning that each choice denies the player access to a higher percentage of the total number of endings.

Experimental Setup

Our Q-learning parameters are as follows: α = 0.90, γ = 0.75, and ε = 0.20. For each game, we run Q-learning on the game's weighted adjacency matrix 100,000 times. We repeat this process 10 times and average the outputs.
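A sketch of the driver implied by this setup (our own code; score_fn stands for the full M ∘ N ∘ W ∘ Q pipeline and is passed in as a parameter):

```python
def run_experiment(game, score_fn, repeats=10):
    """Average the I-score over several runs, since epsilon-greedy Q-learning is
    stochastic (alpha=0.90, gamma=0.75, epsilon=0.20, 100,000 iterations)."""
    return sum(score_fn(game) for _ in range(repeats)) / repeats
```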
Experimental Results

Table 1 shows the I-scores.

Table 1: Resulting I-scores
  Steins;Gate                                    0.6829
  No One Has to Die                              0.6782
  Shadow the Hedgehog                            0.8354
  Zero Escape: Virtue's Last Reward (Unlocked)   0.9763
  Zero Escape: Virtue's Last Reward (Locked)     0.9983
  Non-Integrated                                 0.1585
  Rigid-Integrated                               1.0000
  Rigid-Top                                      0.1590
  Rigid-Bottom                                   0.9995
  Two-on-Two                                     0.9708

The synthetic graphs' I-scores are very near to either 0 or 1, the ends of the spectrum. Both the non-integrated and the rigid-top graphs receive very low scores, demonstrating that in graphs with lots of crossover between branches the choices have little impact. In fact, the non-integrated graph allows for complete crossover: a player can always move from any state to any state in the following level of the topology, yielding an I-score near 0. On the other hand, the rigid-integrated, rigid-bottom, and two-on-two graphs have very high scores. This illustrates the limited amount of crossover between branches in their topologies. The rigid-integrated graph receives a perfect score of 1, indicating that there is absolutely no crossover between branches.

Both the unlocked and locked versions of Zero Escape: Virtue's Last Reward yield I-scores in the range of 0.9, indicating a close-to-rigid topology. Shadow the Hedgehog has a slightly lower score of 0.8354, suggesting that this game has slightly more divergence between the branches of its topology. In a preview of Shadow the Hedgehog, Gooseberry commented: "gamers will decide in a race against time by choosing between any number of routes by which to unveil the secrets of his past. Multiple endings will reflect critical decisions at every stage of the game. Gamers have ultimate control over Shadow's past, with the potential to rewrite his history time and time again." (https://cheatcc.com/xb/rev/shadowhedgehogreview.html) The game's high score reflects the gamer's "ultimate control": early game decisions do not immediately lock players out of endings, and late decisions made in the game still have the power to swap branches.

Steins;Gate and No One Has to Die have similar scores in the range of 0.6. Both of these games have complex topologies that contain a mixture of paths. We hypothesize that games between these two extremes, with an I-score close to 0.5, have more interesting story lines. Daily Star reviewers of Steins;Gate remarked: "As you proceed, once trivial choices you make reveal themselves to have a direct effect on the game's finale. Multiple endings both good and bad are up for grabs hindering even seemingly simple decisions with a heavy weight, even more so as you debate the consequences of altering time and the potential butterfly effect it'll have on the characters you've fabricated deep ties with." (https://www.dailystar.co.uk/tech/reviews/steins-gate-elite-ps4-review-16905592) This quote exhibits how meaningful play is highly impactful on player experience. Even trivial choices end up having an effect on the ending. When a player must put thought into their game decisions, it can make for a more engaging, thought-provoking, interactive game experience. A review of No One Has to Die posits: "The main highlight is the plot and the music. Both had me intrigued, seriously absorbing. I always love it when alternate paths somehow link together. It is definitely worth finding all the endings!" (http://gamescheat22.blogspot.com/2013/04/no-one-has-to-die-game-review.html) This quote emphasizes the converging game paths. The reviewer suggests that players' choices create a more enjoyable game experience.
Related Work

Player enjoyment is a difficult aspect of a game to quantify, thus there is a need for a validated and generalized model that can be used to design and evaluate game enjoyment (Bostan and Öğüt 2009; Sweetser and Wyeth 2005).

Fabricatore, Nussbaum, and Rosas (2002) used single-player games to construct a qualitative design model aiming at understanding what players want in a video game. Participants had a playing session with one of 39 selected games (picked by their popularity). During the play session, participants' activities and opinions were logged. This study resulted in the formation of a hierarchical design model using entities, scenario, and goals as three aspects of the game that were found to be determinant of the player experience.

In Desurvire, Caplan, and Toth (2004) a study was conducted to assess the usefulness and validity of various heuristics by tracking participants' navigation and feedback in a game shell. The user study provided evidence for using heuristics to evaluate games for playability. They determined game heuristic categories such as game story (e.g., character development) and game play (e.g., challenges users must overcome to move forward throughout a game).

Sweetser and Wyeth (2005) report on a model to evaluate player enjoyment. Various design heuristics proposed in previous research are integrated into a model to evaluate game enjoyment using flow. Flow occurs when the challenge level matches the skill level of the user. The proposed model, GameFlow, adapts the idea of flow experiences to games. The model consists of components such as concentration, challenge, and skills. Bostan and Öğüt (2009) further develop a GameFlow model specifically for RPGs to aid in maximizing player experience. The user's freedom of choice and meaningful play are deemed important aspects of a game's storyline. A game's difficulty curve should keep player engagement high and minimize frustration by increasing game difficulty as the player progresses through the game.

Purdy et al. (2018) study a collection of story features to assess a storyline's quality. For instance, the temporal ordering feature looks at the plausibility that one sentence follows from its previous sentence in the story. Our study is orthogonal to a story's quality. We are looking at measuring the integration level of a game between the player's choices and the outcome of the game.

Mawhorter, Mateas, and Wardrip-Fruin (2015) present a system capable of generating choices to achieve poetic effects. The system assumes that the player has intrinsic goals such as avoiding injury and defusing threats to innocent NPCs. It can generate three kinds of choices in relation to the player's goals. For instance, in dilemma choices, the player's choices will lead to achieving one of the player's goals but will cause other goals to fail. The focus of these criteria is on individual player choices, whereas our focus is on how the player's choices affect the outcome of the game.
A game’s difficulty curve should keep player engage- Desurvire, H.; Caplan, M.; and Toth, J. A. 2004. Using ment high and minimize frustration by increasing game dif- heuristics to evaluate the playability of games. In CHI’04 ficulty as the player progresses through the game. extended abstracts on Human factors in computing systems, Purdy et al. (2018) study a collection of story features to 1509–1512. assess an storyline’s quality. For instance, the temporal or- Fabricatore, C.; Nussbaum, M.; and Rosas, R. 2002. Playa- dering feature looks at the plausibility that one sentence fol- bility in action videogames: A qualitative design model. lows from its previous sentence in the story. Our study is Human-Computer Interaction 17(4):311–368. orthogonal to story’s quality. We are looking at measuring the integrated level of a game between the player’s choices Mawhorter, P. A.; Mateas, M.; and Wardrip-Fruin, N. 2015. and the outcome of the game. Generating relaxed, obvious, and dilemma choices with dun- yazad. In Eleventh Artificial Intelligence and Interactive Mawhorter, Mateas, and Wardrip-Fruin (2015) present a Digital Entertainment Conference. system capable of generating choices to achieve poetic ef- fects. The system assumes that the player has intrinsic goals Purdy, C.; Wang, X.; He, L.; and Riedl, M. 2018. Predict- such as avoiding injury and diffuse threats to innocent NPCs. ing generated story quality with quantitative measures. In It can generate three kinds of choices in relation to the AIIDE, 95–101. player’s goals. For instance, in dilemma choices, player’s Riedl, M. O., and Young, R. M. 2006. From linear story gen- choices will lead to achieve one of the player’s goals but eration to branching story graphs. IEEE Computer Graphics will cause other goals to fail. The focus of these criteria is and Applications 26(3):23–31. on individual players’ choices whereas our focus is on how Salen, K.; Tekinbaş, K. S.; and Zimmerman, E. 2004. Rules the player’s choices affect the outcome of the game. of play: Game design fundamentals. MIT press. Sutton, R. S., and Barto, A. G. 2018. Reinforcement learn- Conclusions ing: An introduction. MIT press. We introduce the I-score, a metric that evaluates the in- Sweetser, P., and Wyeth, P. 2005. Gameflow: a model for tegrated level of a game with respect to its possible end- evaluating player enjoyment in games. Computers in Enter- ings. This is the first work attempting to provide a numerical tainment (CIE) 3(3):3–3.