Tackling Generation of Combat Encounters in Role-playing Digital Games

Matouš Kozma, Vojtěch Černý, and Jakub Gemrot
Faculty of Mathematics and Physics, Charles University, Ke Karlovu 3, Praha 2, 121 16, Czech Republic
mattkozma@hotmail.com, {cerny,gemrot}@gamedev.cuni.cz

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract: Procedural content generation (PCG) has been used in digital games since the early 1980s. Here we focus on a new problem: generating personalized combat encounters in role-playing video games (RPGs). A game should provide a player with combat encounters of adequate difficulty, ideally matching the player's performance, so that the game presents an adequate challenge. In this paper, we describe our own reinforcement learning algorithm that estimates the difficulties of combat encounters during game runtime; the estimates can then be used to find the next suitable combat encounter of a desired difficulty in a stochastic hill-climbing manner. After a player finishes an encounter, its result is propagated through a difficulty matrix to update the estimates of not only the presented combat encounter, but also similar ones. To test our solution, we conducted a preliminary study with human players on a simplified RPG game we have developed. The collected data suggest our algorithm can adapt the matrix to the player's performance quickly from small amounts of data, even though not precisely.

Keywords: Procedural generation, Video games, Role-playing games, Encounter generation, Difficulty adaptation

1 Introduction

Many games today implement some form of procedural content generation (PCG) [1, 2], where some parts of a game are generated by an algorithm. PCG is a colloquial term referring to the creation of digital content through rules instead of by hand. In games, it has been used for generating, e.g., dungeon level layouts (Rogue by Michael Toy and Glenn Wichman, 1980) up to entire galaxies (Elite Dangerous by Frontier Developments, 2015). This can greatly reduce the work required from designers or artists. It can improve the replay value of the game, as, e.g., the levels, dungeons, characters, or even dialogues can be generated during playtime to vary a player's game-play experience every run. This paper deals with PCG of combat encounters for role-playing games (RPGs). A combat encounter in an RPG refers to any situation in which a player, who is in control of virtual characters usually called heroes, faces an opposition of enemy non-player characters (NPCs) usually called monsters, which results in combat between the characters. The problem of combat encounter generation can be described as how to select a group of monsters that a player should face while keeping the game engaging for the player.

One of the currently widely accepted psychological models of engagement in games is flow [3, 4]. Csikszentmihalyi defines flow as: "A state in which people are so involved in an activity that nothing else seems to matter; the experience is so enjoyable that people will continue to do it even at great cost, for the sheer sake of doing it." [5]. It is thought that a player can enter the flow state only if a game supplies a challenge that is adequate to the player's ability [6]. With respect to player skill, if the challenge is too high, a player may experience anxiety, fear of failure or frustration, while if the challenge is too low, a player may experience routine, boredom or loss of interest (Fig. 1).

Figure 1: Relation of the player's skill to the challenge provided by the game, depicting the flow channel area.

Focusing on RPG combat encounters, the challenge of their design is how to compose a group of monsters players should face in order for them to stay in the flow channel (Fig. 1). This is problematic as the player's ability is not known when the game starts. Whereas we can estimate how difficult a given combat encounter is, e.g., from statistics over monster combat values (total health, damage per second, etc.), we do not know how proficient a player is at first or how fast they can improve at playing the game. We propose to solve this problem using reinforcement learning, i.e., to create a system that, given observations of the player's performance in the game, can generate the next combat encounters of appropriate difficulty.
The rest of the paper is structured as follows. In Section 2, we examine work related to ours. In Section 3, we formulate the combat encounter generation problem (CEGP) and the requirements on its solutions, namely the ability to adapt to concrete game instances and the fact that the solution must be able to adapt to the player quickly. In Section 4, we detail our approach to tackling the CEGP. In Section 5, we present an implementation of our approach for an example RPG game we have created to test it. In Section 6, we describe and discuss the results of an experiment with human subjects that was conducted to test our approach. Finally, in Section 7, we conclude the paper with notes on future work.

2 Related Works

While procedural content generation in games is a well-studied field, it seems that existing research does not focus on generating combat encounters. The works we found about content generation in RPGs focus either on generating levels [7, 8] or quests [9, 10].

We tried looking at other genres for inspiration, yet we were unable to find any relevant research. It seems that even in genres such as platformers [11] or roguelikes [12] the main focus is on generating the level, not on generating monsters or combat encounters in general.

2.1 Dynamic Difficulty

The encounters we generate must be appropriately difficult for the player. Therefore, techniques for runtime adjustment of difficulty might be relevant to us.

Xue et al. [13] describe the results of adding dynamic difficulty to a match-three game. For our purposes, the most important conclusion is that we should attempt to maximize engagement throughout the entire game, not focus on creating encounters of some specific difficulty.

Missura and Gärtner [14] introduced an algorithm that could split players into groups and then generate challenges specifically for each group. However, this approach required a lot of data about many players playing the game, which we do not have, so it could not be used.

2.2 Algorithm Evaluation

As the goal of the algorithm is to generate encounters suitably difficult for the players, its evaluation must involve real people playing the encounters generated by the algorithm. We have chosen to also measure the player's feeling of flow [15] as a measure of quality for the algorithm.

In psychology, flow is a state in which a person is fully concentrating on some task. It is also known as being "in the zone". A person in a flow state loses track of time and focuses completely on the activity she is doing. To enter the flow state, the person must feel competent at the task and the task must be appropriately difficult for her.

To measure the player's feeling of flow, we use the Flow Short Scale [16, 17]. This survey measures flow on a 7-point scale together with a person's perceived difficulty of the task. However, these data were not conclusive, so we do not report them and focus on difficulty estimation of encounters only.
3 The Problem

Here we define several terms commonly used in RPG games, which we will use throughout the paper; defined terms are in italics. Some terms are formulated in an open-ended manner as different games will differ in details.

For the purpose of this paper, we will understand a game as a double (W, S) of a game world W and a game state S. A game world is a double (L, P) of game locations L and actor prototypes P. A game state consists of spawned actors only.

Game locations L is the set of all possible positions in the game world which actors may occupy. In some games locations could be a grid of discrete tiles, whereas in others it could be a more complex 3D structure. The number of these atomic locations can be in the thousands per game level for grid-based games, or infinite in the case of 3D worlds where a location is represented by floating point numbers.

An actor prototype, p ∈ P, is a prototype of a character that can be spawned as an actor into the game. An actor prototype is a tuple of all the attribute values that describe the class of an actor, e.g., its health, damage per second, size, skills, etc. These attributes are game specific, but we assume health and damage per second to be present at least, to allow for combat between actors.

An actor, a = (p_a, s_a, l_a, c_a), is a spawned actor prototype that can be present within the game and is a part of the game state. It consists of the actor prototype p_a, the current state s_a of the p_a attributes (initial values set according to p_a), the location l_a ∈ L the actor occupies in the world, and a controller c_a, which is either a human or an AI. We label the set of all possible actors as A.

A hero(r) ::= {r ∈ A | c_r = human} is an actor a player currently controls. A party(R) ::= P({r ∈ R | hero(r)}) is a set of heroes the player may control. All such parties are denoted as ρ.

An enemy(e) ::= {e ∈ A | c_e = AI} is an actor a player does not control and the party has to fight. enemies(E) ::= P({e | enemy(e)}) is then a group of enemies a player may face in the game; all such enemy groups are denoted as ε.

A combat area, O = {l ∈ L | ∀ l_1, l_2 ∈ O : path(l_1, l_2) ⊆ O}, represents some continuous area in the game world where some combat encounter may take place.

A combat encounter is a triple, c = (E_c, R_c, O_c), that represents a situation where a party must defeat some enemies. It consists of the enemies E_c the party R_c has to fight inside some combat area O_c. The set of all possible combat encounters is labelled as C. In particular, actor prototypes are not the only part of c, as the actors' runtime states are included as well. As such, C can be thought of as game states in the space of combat; as actors are exchanging damage, this space is traversed until either side is eliminated (their total health reaches zero).

We model the player, y = (s_y), as a single number s_y ∈ [0; 1] denoting the skill of the player. 0 means no skill, which can be thought of as not issuing any actions to heroes during any combat encounter. 1 refers to optimal play, i.e., the player can achieve the best outcome possible regardless of the encounter given. We denote the set of all kinds of player abilities as Y.
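To keep the formalism concrete, the following Python sketch models the objects defined above as plain data structures. It is purely illustrative: the class and attribute names are our own and are not taken from any implementation accompanying the paper, and only the attributes required by the definitions (health, damage per second, location, controller) are spelled out.

    from dataclasses import dataclass
    from enum import Enum
    from typing import Dict, FrozenSet, Tuple


    class Controller(Enum):
        HUMAN = "human"  # controller of a hero
        AI = "ai"        # controller of an enemy


    @dataclass(frozen=True)
    class ActorPrototype:
        # p ∈ P: the tuple of attribute values describing a class of actors
        name: str
        health: float
        damage_per_second: float
        # further game-specific attributes (size, skills, ...) omitted


    @dataclass
    class Actor:
        # a = (p_a, s_a, l_a, c_a): a spawned actor prototype
        prototype: ActorPrototype
        state: Dict[str, float]    # s_a: current values of the prototype attributes
        location: Tuple[int, int]  # l_a ∈ L, here a tile of a grid-based world
        controller: Controller     # c_a: human or AI


    @dataclass
    class CombatEncounter:
        # c = (E_c, R_c, O_c): enemies, party and combat area
        enemies: Tuple[Actor, ...]
        party: Tuple[Actor, ...]
        area: FrozenSet[Tuple[int, int]]  # O_c: a contiguous set of locations


    @dataclass
    class Player:
        # y = (s_y): the player modelled by a single skill value in [0, 1]
        skill: float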
How- mal play, i.e., player can achieve the best outcome possible ever, it is not realistic to expect a player to play even ten regardless the encounter given. We denote set of all kinds tutorial encounters just to calibrate the model of the player, of player abilities as Y . which is still very few still given the size of C. We define the difficulty of an encounter c ∈ C as a func- tion D : C ×Y →< 0; 1 >, where its value states how many percent of party health will be lost during encounter as Video game context. RPG games are (typically) played played through by the player of given skill. 0 means the in real-time and there are usually several to tens of com- party Rc will defeat enemies Ec at Oc without any health bat encountered prepared by game designers in each game losses and 1 means the party Rc will certainly be elimi- level. Therefore, the algorithm solving the CEGP must nated by the enemies Ec regardless player actions. Impor- compute fast in order not to interrupt the gameplay and it tantly, there can be encounters that cannot be won even should not consume an unnecessary amount of memory. by optimal players (enemies are too strong for the party to defeat). 4 Methodology The combat encounter generation problem (CEGP) is than an optimization problem: given a set of all possible Our current approach to solving the CEGP is to use re- combat encounters C for a given party R, a player y, and inforcement learning for estimating the difficulty function a desired difficulty d, find cmin ∈ C such as ∀c ∈ C : |d − D, while heuristically reducing the size of the combat en- D(cmin , y)| ≤ |d − D(c, y)|. counters set C. If we can contract the set, it would make both the estimation of D and the search for suitable c ∈ C 3.1 Problem Complexity much easier. For looking up an encounter of some required difficulty d, we are softening the requirement of optimal- The CEGP consists of two sub-problems: 1) how to eval- ity and use stochastic hill-climbing to look for a combat uate the difficulty D; 2) even if D is known, how to search encounter c of difficulty that differs up-to δ from d, i.e., for appropriate c ∈ C fast. The challenge of solving them |d − D(c, y)| < δ . stems from the problem formulation itself (e.g., the size of C) and from the video game context, which poses addi- tional requirements on the CEGP solution. 4.1 Estimating Encounter Difficulties Looking at a combat encounter definition c = (Ec , Rc , Oc ), Difficulty evaluation. The skill of a player is not known we disregard Oc (simplification) and maintain a sparse dif- beforehand. As the result, we cannot compute precise val- ficulty matrix M indexed by enemies and a party. The ma- ues of D even via simulation. Moreover, player’s skill trix then holds estimated difficulties of a combat encoun- can change throughout the game as the player improves in ters Mer given enemies Ee and the party Rr for the player playing the game. Therefore, once observed difficulty of y that is currently playing the game. a concrete c may become irrelevant in future as the player The size of the matrix is huge. Adapting only single ma- skill changes. Finally, if the game is played in real-time, trix element after each encounter would not result in fast the skill may fluctuate as player’s reflexes improve. adaptation to the player. Therefore, we propose to adapt multiple values from the matrix given a single observation using some difficulty adjustment function. Problem size. 
4 Methodology

Our current approach to solving the CEGP is to use reinforcement learning for estimating the difficulty function D, while heuristically reducing the size of the combat encounter set C. If we can contract the set, it makes both the estimation of D and the search for a suitable c ∈ C much easier. For looking up an encounter of some required difficulty d, we soften the requirement of optimality and use stochastic hill-climbing to look for a combat encounter c whose difficulty differs by up to δ from d, i.e., |d − D(c, y)| < δ.

4.1 Estimating Encounter Difficulties

Looking at the combat encounter definition c = (E_c, R_c, O_c), we disregard O_c (a simplification) and maintain a sparse difficulty matrix M indexed by enemies and a party. The matrix then holds estimated difficulties M_er of combat encounters between enemies E_e and the party R_r for the player y that is currently playing the game.

The size of the matrix is huge. Adapting only a single matrix element after each encounter would not result in fast adaptation to the player. Therefore, we propose to adapt multiple values of the matrix given a single observation, using some difficulty adjustment function.

A difficulty adjustment function, which we denote by ∆(d_m, E_m, R_m, d_o, E_o, R_o, α), should return a new difficulty value for an encounter between enemies E_m and the party R_m, which was thought to be of difficulty d_m, given the observed difficulty d_o from the just finished encounter between enemies E_o and the party R_o; α is a learning factor.

Given the observation of d_o for some encounter c(E_o, R_o, O_o), we update each matrix element with M_er = ∆(M_er, E_e, R_r, d_o, E_o, R_o, α).

To be able to construct ∆, we define two heuristic functions which compute the similarities of enemies and of parties. The enemies similarity function is defined as S_E : ε × ε → [0; 1] and the party similarity function is defined as S_R : ρ × ρ → [0; 1]. The value 0 means total difference, while 1 means equality of elements.

We then compute ∆(d_m, E_m, R_m, d_o, E_o, R_o, α) = d_m + (d_o − d_m) ∗ max{0, 1 − S_E(E_m, E_o) − S_R(R_m, R_o)} ∗ α. The function updates the estimation by its error, scaled according to the similarity of the enemies and the party of the matrix element to the real enemies and party participating in the combat that was just observed (and multiplied by the learning rate α).

Updating the matrix is then done by updating all of its elements (Fig. 2) using the ∆ function. This is the place where the similarity functions are used the most, as they allow us to update multiple elements of the matrix.

1: procedure UpdateMatrix(M, d_o, E_o, R_o, α)
2:   for each (E_m, R_m) ∈ M do
3:     d_m = M[E_m, R_m]
4:     M[E_m, R_m] = ∆(d_m, E_m, R_m, d_o, E_o, R_o, α)
5:   end for
6: end procedure

Figure 2: Pseudocode of the method updating the difficulty matrix M after each combat encounter result. Inputs are: M - difficulty matrix, d_o - observed difficulty of the encounter that just finished, E_o - initial enemies of the encounter, R_o - party that entered the encounter, α - learning rate.
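The update step can be read as the following Python sketch. The multiset-overlap similarity is a placeholder of our own (the paper leaves S_E and S_R as game-specific heuristics); delta reproduces the ∆ formula as stated above, and update_matrix mirrors Fig. 2. Enemy groups and parties are represented simply as tuples of prototype names.

    from collections import Counter
    from typing import Dict, Tuple

    Group = Tuple[str, ...]    # enemies or a party, as prototype names
    Key = Tuple[Group, Group]  # (enemy group, party) indexing the matrix M


    def similarity(a: Group, b: Group) -> float:
        """Placeholder heuristic in [0, 1]: 1 means equal groups, 0 means disjoint."""
        ca, cb = Counter(a), Counter(b)
        union = sum((ca | cb).values())
        return sum((ca & cb).values()) / union if union else 1.0


    def delta(d_m: float, e_m: Group, r_m: Group,
              d_o: float, e_o: Group, r_o: Group, alpha: float) -> float:
        """Difficulty adjustment function ∆ from Section 4.1, as given in the text."""
        weight = max(0.0, 1.0 - similarity(e_m, e_o) - similarity(r_m, r_o))
        return d_m + (d_o - d_m) * weight * alpha


    def update_matrix(m: Dict[Key, float], d_o: float,
                      e_o: Group, r_o: Group, alpha: float) -> None:
        """UpdateMatrix (Fig. 2): revise every stored estimate after one observation."""
        for (e_m, r_m), d_m in m.items():
            m[(e_m, r_m)] = delta(d_m, e_m, r_m, d_o, e_o, r_o, alpha)

A single call per finished encounter, update_matrix(M, observed, enemies, party, alpha), is all the runtime learning the method requires.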
4.2 Difficulty Matrix Initialization

Ideally, the initialization of M should be based on real data obtained by letting players play the game. However, the amount of required observations is usually so high that it is probably unreasonable to require it even from big video game making companies. Instead, we propose to implement a simple reactive player that can play through combat encounters and use it to obtain the first difficulty estimations for the matrix. We do this only for a selected subset of enemies and parties.

4.3 Generating an Encounter

Having the matrix M estimating D for player y, and the enemies and party similarity functions S_E and S_R, we can implement a stochastic hill-climbing algorithm for finding a combat encounter to present as a challenge to the player.

The algorithm (Fig. 3) gets the party R, the combat area O and the required encounter difficulty d as its input. It is supposed to find a suitable set of enemies E in order to return an encounter c(E, R, O) whose difficulty differs from the required d by an amount smaller than or equal to δ. It works in a stochastic hill-climbing manner: it keeps adding/removing random enemies to/from E, trying to match the required d. For the estimation of D, it queries the passed difficulty matrix M. If a particular element M[E, R] is not present within M, it uses the similarity functions S_E, S_R to pick the most similar element of M. For practical real-time constraints, the algorithm does not perform more than iter_max iterations before terminating.

1: procedure GenEncounter(R, O, d, M, δ, iter_max)
2:   i = 0
3:   E_best = E_curr = {pick random enemy for O}
4:   δ_best = δ_curr = |M[E_curr, R] − d|
5:   while δ ≤ δ_best and i < iter_max do
6:     if M[E_curr, R] − d ≤ 0 then
7:       add random enemy for O into E_curr
8:     else
9:       remove random enemy from E_curr
10:    end if
11:    δ_curr = |M[E_curr, R] − d|
12:    if δ_curr ≤ δ_best then
13:      E_best = E_curr
14:      δ_best = δ_curr
15:    end if
16:    i++
17:  end while
18:  return c(E_best, R, O)
19: end procedure

Figure 3: Pseudocode of the method for generating an encounter. Inputs are: R - current party, O - combat area for which we generate the encounter, d - required difficulty, M - the difficulty matrix, δ - difficulty tolerance, iter_max - maximum number of search iterations.
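A compact Python rendering of GenEncounter (Fig. 3) could look as follows. The estimate callback stands in for the matrix lookup M[E, R] with its most-similar-element fallback, and the random enemy picker is a simplification of our own; the add/remove moves, the acceptance test and the termination conditions follow the pseudocode.

    import random
    from typing import Callable, Sequence, Tuple

    Group = Tuple[str, ...]  # a multiset of enemy prototype names


    def gen_encounter(party: Tuple[str, ...],
                      area_prototypes: Sequence[str],
                      d: float,
                      estimate: Callable[[Group, Tuple[str, ...]], float],
                      tol: float,
                      iter_max: int) -> Group:
        """Stochastic hill-climbing over enemy groups, mirroring GenEncounter (Fig. 3)."""
        current: Group = (random.choice(area_prototypes),)
        best, best_err = current, abs(estimate(current, party) - d)
        i = 0
        while best_err >= tol and i < iter_max:
            if estimate(current, party) - d <= 0:
                # estimated difficulty at or below the target: grow the group
                current = current + (random.choice(area_prototypes),)
            elif len(current) > 1:
                # estimated difficulty above the target: drop a random enemy
                k = random.randrange(len(current))
                current = current[:k] + current[k + 1:]
            err = abs(estimate(current, party) - d)
            if err <= best_err:
                best, best_err = current, err
            i += 1
        return best

Because only the best group found is returned, a few bad random moves cost nothing but iterations, which is why the hard cap iter_max is enough to keep the search within real-time budgets.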
5 Implementation

In order to test our approach, we implemented a simple PC game of the RPG genre, which is mouse-controlled. Its design is inspired by the gameplay mechanics of well-known, if a bit dated, RPG games such as Baldur's Gate (BioWare, 1998) and Planescape: Torment (Black Isle Studios, 1999). A player leads a party of three preset heroes through a dungeon, which is a procedurally generated maze consisting of rooms; the rooms constitute combat areas where encounters with enemies take place. We tried to design the game to be fun and to have a certain gameplay depth. The heroes represent three RPG archetypes: knight (high health, melee attack only), ranger (low health, strong ranged attack) and cleric (healer and area controller). Each hero, apart from the attack action, has three different skills they can use to affect the fight. A player can level heroes up between encounters, i.e., improve their health and dealt damage per second. For enemies, there are 4 archetypes, some of them having multiple instances for variety. Each archetype is designed to behave differently in combat in order to prolong the learning curve for the player. Additional information can be found in a previous master's thesis [18]. Due to time and budget constraints, we used some free and some paid sound and graphical assets of the pixel-art style (Fig. 4).

Figure 4: Cropped game screenshot with labels overlay depicting important game elements.

The combat. Every time the party enters a new room, i.e., opens its door, an encounter is generated inside, given the difficulty assigned by the designer (us) to the room. Then the party enters, the door closes behind it, and the party has to fight the spawned enemies until one of the sides is eliminated. The combat happens in real-time, though the player is able to pause it anytime to analyze the situation and reconsider their heroes' actions. Enemies are driven by a simple reactive AI that chooses their attacks according to a script associated with the enemy archetype. Heroes also attack automatically if the player does not intervene, i.e., order them to use some skill, change their target or move them around the room.

The difficulty range. As the party consists of 3 heroes, we modified the range of difficulty to be [0; 3]; a difficulty is then the sum of the portions of health points lost by the heroes during the encounter.

Difficulty matrix. To generate the initial sparse matrix, we generated 83572 scenarios, in each of which a random party was fighting a random group of monsters. In these scenarios the party was controlled by a very simple script doing random actions. This proved to be a better approximation of player behavior than a script making tactical decisions, which was too strong and produced estimations suitable for a skillful player rather than a beginner. The results of these scenarios were recorded and used as the initial matrix to be modified at runtime. We set the learning rate α = 0.75 for the UpdateMatrix method (Fig. 2). A visualization of the initial difficulty matrix is provided in Fig. 5.

Figure 5: Visualization of the initial values of the difficulty matrix of the game. The x-axis represents parties ordered according to their estimated strength, the y-axis represents enemies ordered according to their estimated strength. Colors depict different difficulty categories: green is [0; 0.5), yellow is [0.5; 1.5), red is [1.5; 2.5), black is [2.5; 3].
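For illustration, the seeding procedure described above can be sketched as follows; random_party, random_enemies and simulate are hypothetical stand-ins for the game-specific scenario generator and the random-action play-through, while observed_difficulty implements the summed health loss that defines difficulty in the [0; 3] range.

    from typing import Callable, Dict, Sequence, Tuple

    Group = Tuple[str, ...]
    Key = Tuple[Group, Group]


    def observed_difficulty(start_hp: Sequence[float], end_hp: Sequence[float]) -> float:
        """Summed portions of health lost by the heroes; for a 3-hero party this lies in [0, 3]."""
        return sum((s - max(e, 0.0)) / s for s, e in zip(start_hp, end_hp))


    def seed_matrix(scenarios: int,
                    random_party: Callable[[], Group],
                    random_enemies: Callable[[], Group],
                    simulate: Callable[[Group, Group], Tuple[Sequence[float], Sequence[float]]]
                    ) -> Dict[Key, float]:
        """Build the initial sparse difficulty matrix from random-action play-throughs."""
        m: Dict[Key, float] = {}
        for _ in range(scenarios):
            party, enemies = random_party(), random_enemies()
            start_hp, end_hp = simulate(enemies, party)  # heroes' health before and after
            m[(enemies, party)] = observed_difficulty(start_hp, end_hp)
        return m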
6 Experiment

To test our combat encounter generator we conducted a pilot experiment with human subjects playing the game.

6.1 Hypothesis

As there is no prior work to compare our results to, we designed the experiment to be exploratory only. We were interested to see whether the algorithm can adapt the initial estimation of combat difficulties to the player and improve upon it. Our only hypothesis therefore was that the algorithm would be able to adapt the initial difficulty matrix to the ability of players, i.e., that it would provide better difficulty estimates.

6.2 Design

Each participant was given a build of the game to play. The data about encounters was collected by the game itself and stored within an online database. The game consists of three levels.

1. Tutorial level. A short and easy level with 4 combat encounters, which were fixed and created by us. The level existed for two reasons. First, it was designed to teach the player the basic mechanics of the game. Second, while the algorithm was not generating enemies in this level, it was still estimating the difficulty of these static encounters and was updating the matrix. Therefore, after the tutorial level the matrix could have matched the player's skill more closely.

2. Static levels. Two levels with encounters fixed and created by us. Unlike the tutorial level, the difficulty matrix was not being modified during this phase.

3. Generated levels. Two levels with encounters being generated according to the matrix. The matrix got updated after every combat encounter.

Participants were split into two even groups A and B randomly. Group A was called static-first as its members played the game in the sequence 1-2-3. Group B was called generated-first as its members played the game in the sequence 1-3-2. The difference with respect to difficulty estimation was that for the static-first group, there was a gap in the matrix adaptation between levels 1 and 3. Here we were interested to see whether there was going to be a difference between the groups, as participants of the static-first group should be more experienced when they enter the third level.

6.3 Data Collected

We collected both subjective categorical data from participants (both quantitative and qualitative) and objective numerical data concerning combat encounters, collected automatically by the game. Originally, we were also interested in how the players would rate the game and collected data on flow via the Flow Short Scale [16, 17] calibrated questionnaire [18]; however, these data were not conclusive.

Regarding the encounters, for each encounter in levels 1 and 3, the game recorded the enemies spawned, the difficulty estimated by the initial version of the matrix and by the current version of the matrix, and the result of the encounter, which is the final observed difficulty. We also generated a visualization of the difficulty matrix state after every matrix update for every participant.

6.4 Participants

The experiment was carried out during the COVID-19 outbreak in May 2020. Therefore, all experiments happened online in an uncontrolled environment, typically the participants' homes. We had 22 participants, out of which a total of 11 participants were not able to finish level 3 (either because of a bug in the game, lack of time, or not being able to understand all game mechanics), and thus we decided to discard their data from the analysis. In total, we analysed data from 11 participants (Table 1); all but two participants reported they play games more than 2 hours per week (the majority more than 10), but these two reported that they played games regularly in the past. We consider both groups to consist of experienced game players only. Participants were split into groups randomly; they were of similar age (20-25).

Table 1: Participants profile.
Group  Size  Male  Female
A      5     4     1
B      6     5     1

6.5 Results and Discussions

Even though all the participants finished level 3, they played a different amount of combat encounters. This was the result of the difficulty we preset for some rooms, where some participants had their parties eliminated and were thus forced to replay the level.

Together, we assembled 260 combat encounter observations. The first 11 observations are irrelevant as they come from the first combat in level 1, where the matrix was still in its initial setting. In total, we analyse 249 combat encounters for which we can observe the differences between the adapted matrix (referred to as ADAPTED in graphs) and its initial version (referred to as INIT in graphs). We mainly analysed estimation errors, i.e., the difference between the difficulty prediction of a matrix and the real difficulty observed within the game. To visualize the data, we are using R version 4.0.5 together with the package ggstatsplot v0.8.0 for group comparisons.

First, we were interested to see whether the algorithm was able to provide better difficulty estimations. For that, we measured errors of difficulty prediction as the difference between the predicted difficulty and the observed difficulty (a negative value means the encounter was more difficult than it should have been). Data for groups A and B respectively are summarized by Fig. 6 and Fig. 7.

Figure 6: Difficulty prediction errors for group A; the left part depicts values for the adapted matrix and the right part for the initial state of the matrix.

Figure 7: Difficulty prediction errors for group B; the left part depicts values for the adapted matrix and the right part for the initial state of the matrix.
For both groups, the mean values between ADAPTED and INIT differ. Interestingly, the mean of the INIT matrix errors for group A is higher than in group B. This is probably the effect of the "static-first" ordering in group A. The majority of data collected for group A comes only after the participants played level 2. As a result, they are more skilled and the INIT matrix is more off than for group B.

Then we looked at how the errors of the ADAPTED matrices compare between the groups (Fig. 8).

Figure 8: Difficulty prediction errors of ADAPTED matrices between groups.

Interestingly, they are not much different, which is a bit surprising. Given the previous observation that the INIT matrix has a larger error for group A, we expected the same would be true for the ADAPTED matrices between groups A and B. However, that's not the case. It seems that the missing observations from level 2 in group A did not influence the

Figure 9: Difficulty prediction differences between ADAPTED and INIT matrices for both groups respectively. A positive number means the INIT matrix provided the better prediction, whereas a negative number means the ADAPTED matrix provided the better prediction.

Table 2: Errors by participant. Columns: participant group and ID, number of encounters observed, mean and std. dev. of ADAPTED matrix errors, mean and std. dev. of INIT matrix errors, and how many times the ADAPTED matrix prediction was better than the INIT matrix prediction.
Group, ID  #Enc  Err. ADAPTED  Err. INIT  Err.
A