MCTS Pruning in Turn-Based Strategy Games

Yu-Jhen Hsu, Diego Perez-Liebana
School of Electronic Engineering and Computer Science
Queen Mary University of London, UK

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Large action spaces are one of the most problematic aspects of turn-based strategy games for all types of AI methods. Some state-of-the-art algorithms, such as Online Evolutionary Planning and Evolutionary Monte Carlo Tree Search (MCTS), have tried to deal with this problem, but they require a fixed number of actions in each turn. In general strategy games, this assumption cannot be held, as the number of actions that can be executed in a turn is flexible and varies during the game. This paper studies pruning techniques and the insertion of domain knowledge to deal with high branching factors in a new turn-based strategy game: Tribes. The experiments show that, with the help of these techniques, MCTS can increase its performance and outperform the rule-based agents and Rolling Horizon Evolutionary Algorithms. Moreover, some insights into the tree shape and the behaviour of MCTS with and without pruning techniques are provided.

1 Introduction

Turn-based multi-action strategy games raise interest in the AI domain due to, among other things, their huge branching factors. These games contain a considerable number of actions within a turn, which the player executes in sequence. The order of actions is important, as previous actions influence later ones. The branching factor can grow significantly, up to billions of next states. Compared to Go and Chess, the branching factor is much higher in turn-based strategy games (Perez et al. 2020). Taking Bot Bowl (Justesen et al. 2019) as an example, in which the goal is to manage a football team where each unit can execute several actions, the branching factor can be as high as 10^51 in a given turn. The branching factors for Tribes (Perez et al. 2020) and Hero Academy (Justesen, Mahlmann, and Togelius 2016) are 10^31 and 10^8 respectively: lower than Bot Bowl, but still much higher than Go and Chess. Tribes is a strategy game that introduces extra management complexities, as it requires controlling not only the units but also cities, tech trees and resources. Some recent progress has been made by analysing how huge branching factors impact the performance of agents. High branching factors drop the performance of search algorithms (Justesen et al. 2017; Baier and Cowling 2018). The difficulty of developing an evaluation function makes the situation even worse (Justesen et al. 2017; Browne et al. 2012), as the search algorithms require one to guide decision making. Although Monte Carlo Tree Search (MCTS) can handle large branching factors, it fails when branching factors reach the sizes mentioned above (Baier and Cowling 2018; Chaslot et al. 2008). In the default tree policy of MCTS, all nodes are required to be visited before exploitation happens, making non- or rarely-visited nodes less informative for making a good decision (Gelly and Wang 2006).

New search algorithms have been developed to tackle strategy games, such as Online Evolutionary Planning (OEP) (Justesen et al. 2017) and Evolutionary MCTS (EvoMCTS) (Baier and Cowling 2018), obtaining decent performance in Hero Academy. Instead of using a single action as genomes or nodes, they use a sequence of actions as the representation. The difference between OEP and EvoMCTS is the way the action set for a turn is obtained. OEP uses crossover and mutation operators to choose the best action set. EvoMCTS combines MCTS and an evolutionary algorithm, mutating action sets to guide the search in a tree structure. These two algorithms choose a fixed number of actions for a turn, and each action is assigned to a specific unit. This assumption makes them hard to adapt to a game like Tribes (or most strategy games) for three reasons. First, the number of actions might change after executing an action during the same turn. Secondly, the branching factor generally increases during the game. Lastly, instead of managing only units, agents in Tribes are required to manage not only unit actions but also city and general faction actions. These factors increase the difficulty of estimating the length of a fixed-size genome.

Using a single action, rather than a complete turn, as the representation for MCTS is one possible solution. It decreases the branching factor and makes the tree deeper (Baier and Cowling 2018). However, vanilla MCTS without improvements fails, as the tree is still shallow (Justesen et al. 2017; Baier and Cowling 2018). Numerous enhancements for MCTS have been proposed to deal with this issue. One possible solution focuses on pruning: reducing the branching factor by removing weak actions based on evaluation functions or domain knowledge, which increases the convergence speed towards finding good actions.

This paper investigates classic pruning techniques and analyses the tree structure and behaviour of MCTS in Tribes. It compares the performance of MCTS with pruning against the original MCTS agent, the Rolling Horizon Evolutionary Algorithm (RHEA) agent and the rule-based agent, the latter two being agents that outperformed MCTS in the original Tribes paper (Perez et al. 2020). Moreover, the different MCTS pruning techniques are analysed by their tree shape in terms of the influence of turn and branching factor.
2 Background and related work

2.1 Monte Carlo Tree Search (MCTS)

MCTS (Browne et al. 2012) is a tree search algorithm that explores the most promising parts of the state space by balancing exploitation and exploration, and it has been successfully applied to many games. The tree, being asymmetric, contains nodes representing states and edges representing actions. There are four main steps in the default MCTS: selection, expansion, simulation and back-propagation. In the selection step, the tree policy (Upper Confidence Bounds, UCB1 (Auer, Cesa-Bianchi, and Fischer 2002), in the most used version) is applied to guide the search from the root until finding a node with non-visited children. An action chosen at random is added in the expansion step, followed by the simulation step, which performs a random rollout (Monte Carlo simulation) until reaching a terminal state. While the rollout is one of the main steps, some recent studies (Perez et al. 2020; Baier and Cowling 2018) have shown that skipping it and directly evaluating the expanded node's state can provide better results. The final step, back-propagation, evaluates the terminal state and then updates all nodes visited in this iteration. MCTS ends when it runs out of budget and returns the most visited child of the root.

There are different enhancements used to improve the performance of MCTS under large branching factors. These methods either reduce the branching factor via domain knowledge or relax the requirement of visiting all nodes before exploitation. First-Play Urgency (Gelly and Wang 2006) encourages early exploitation by assigning an initial value to non-visited nodes. It reduces the need to visit every node before exploitation under UCT and allows the tree to go deeper. Move groups (Childs, Brodeur, and Kocsis 2008) are another technique for reducing the branching factor and increasing the performance of MCTS in Go: instead of choosing a single action, a move group (which contains similar moves with a highly-correlated expected value) is picked and evaluated. Rapid Action Value Estimation (Gelly and Silver 2007) updates the value of all the nodes with the same action and state regardless of their position in the tree. Script-based approaches such as Hierarchical Portfolio Search (Churchill and Buro 2015) and Portfolio Greedy Search (Churchill and Buro 2013), specialized in dealing with huge branching factors in real-time strategy games, search over a small number of hand-coded scripts rather than the whole action set.
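To make the selection and back-propagation steps above concrete, the following is a minimal sketch of an MCTS tree policy based on UCB1. It is an illustration in Python, not the code of the Tribes framework: the Node structure, its field names and the exploration constant are assumptions made for the example.

```python
import math
import random

class Node:
    """A minimal MCTS node: one state per node, one edge per action."""
    def __init__(self, parent=None, action=None):
        self.parent = parent
        self.action = action          # action that led to this node
        self.children = []            # expanded children
        self.untried_actions = []     # legal actions not expanded yet
        self.visits = 0
        self.value = 0.0              # sum of back-propagated rewards

def ucb1(child, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus."""
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def select(node):
    """Selection step: descend until a node with non-visited children is found."""
    while not node.untried_actions and node.children:
        node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
    return node

def expand(node):
    """Expansion step: add one untried action, chosen at random, as a new child."""
    action = node.untried_actions.pop(random.randrange(len(node.untried_actions)))
    child = Node(parent=node, action=action)
    node.children.append(child)
    return child

def backpropagate(node, reward):
    """Back-propagation step: update every node visited in this iteration."""
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

A full iteration would chain these steps (select, expand, evaluate the new state, back-propagate) until the budget is exhausted, and then return the most visited child of the root.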
2.2 Pruning techniques for MCTS

Pruning is another technique used to deal with huge branching factors. Its benefits are that poor actions can be ignored, focusing the search on promising actions, which results in a much deeper tree and potentially quicker convergence. Moreover, it can be used without heuristic functions and can be applied in any domain (Browne et al. 2012). Some pruning also uses domain knowledge to remove trap states (Ramanujan, Sabharwal, and Selman 2010), which are states that have a high reward but lead to a loss.

Pruning techniques can be divided into two categories: hard and soft pruning. The hard version prunes the tree by eliminating the actions with the lowest evaluated score. Soft pruning prunes actions after a certain number of iterations, but the eliminated actions can be chosen again in a later iteration. The benefit of soft pruning is that it alleviates the risk of removing good actions (Browne et al. 2012), while hard pruning allows the tree to go deeper (Justesen et al. 2017). Progressive Widening (PW) (Coulom 2007), or Progressive Unpruning (Chaslot et al. 2008), is an example of soft pruning, showing great success in the game of Go. N nodes are pruned based on evaluation functions after the iteration count exceeds a threshold. After several simulations, nodes are unpruned and can be chosen again in the selection step. In this way, all actions will eventually be visited, given enough budget. This technique provides a similar effect to First-Play Urgency, encouraging early exploitation and allowing the tree to go deeper. While pruning techniques can be used without heuristics, the performance of progressive widening depends on the heuristic function: Teytaud and Teytaud (2009) show that progressive widening without the help of a heuristic has little impact on improving the strength of agents in the game Havannah.

2.3 Rolling Horizon Evolutionary Algorithms (RHEA)

RHEA is an evolutionary algorithm that achieves good results in several real-time games (Perez et al. 2013). RHEA learns online, during the game, in contrast to other evolutionary algorithms that learn offline. It first generates random individuals, each one representing a sequence of actions. It then evaluates the final state reached after executing, from the current state, all the actions in the individual sequentially. Evolutionary operators such as selection, crossover and mutation are used to generate new individuals based on old ones. The first action of the individual with the highest value is returned once the budget is exhausted.
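As an illustration of this loop, the sketch below implements a minimal rolling horizon evolution with a population of one individual (the configuration later used in the experiments of this paper), so only mutation applies. The legal_actions, forward_model and evaluate callables are assumptions standing in for the game's forward model and state heuristic; they are not the framework's API.

```python
import random

def rhea(state, legal_actions, forward_model, evaluate,
         individual_length=20, budget=2000, mutation_rate=0.3):
    """Rolling horizon evolution with a population of one individual.

    legal_actions(state)         -> list of actions available in a state
    forward_model(state, action) -> next state (a copy; the input is not modified)
    evaluate(state)              -> heuristic value of a state
    """
    def rollout(individual):
        # Apply the individual's actions in sequence, repairing invalid genes.
        s = state
        for i in range(len(individual)):
            actions = legal_actions(s)
            if not actions:
                break
            if individual[i] not in actions:
                individual[i] = random.choice(actions)
            s = forward_model(s, individual[i])
        return evaluate(s)

    best = [None] * individual_length            # genes repaired on the first rollout
    best_value = rollout(best)
    used = individual_length                     # forward model calls spent so far

    while used + individual_length <= budget:
        candidate = list(best)
        for i in range(len(candidate)):          # uniform mutation
            if random.random() < mutation_rate:
                candidate[i] = None              # gene re-sampled during rollout
        value = rollout(candidate)
        used += individual_length
        if value >= best_value:                  # keep the better individual
            best, best_value = candidate, value

    return best[0]                               # first action of the best sequence
```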
3 Methods

3.1 Tribes

Tribes (Perez et al. 2020) is a re-implementation of the game The Battle of Polytopia (Midjiwan AB 2016), a popular award-winning turn-based strategy game, which can be seen as a simplified version of Sid Meier's Civilization. Fig. 1 shows the interface of Tribes.

Figure 1: The user interface of Tribes.

The game takes place on an N × N grid of tiles. Each tile has a terrain type (plain, mountain, shallow water or deep water) and can only hold one resource, one building and one unit at the same time. Villages, resources and ruins are generated in tiles at the beginning of the game. Ruins contain randomized bonuses (special unit, technology, resources, etc.) and can be explored by units. There are two game modes: Capital and Score. The goal in the Score mode is to maximize the score or take all the enemies' capitals within 30 turns. Instead of reaching as high a score as possible within 30 turns, the Capital mode only requires capturing the enemy's capital, or initial city. The score comes from a variety of actions such as exploring, resource gathering, capturing villages and cities, and constructing certain buildings. At the beginning of the game, each player controls one of the four available Tribes (Xin Xi, Imperius, Bardur and Oumaji). Each has a starting unit and a researched technology, different for each tribe. Each player starts with five stars, which are the resource currency. Each player executes as many actions as possible within a turn until they decide to end it, or no more actions are available.

There are three crucial aspects of the game that determine the playing strategy: technology, economy management and combat. The critical element in economy management is handling stars, which are used for building, gathering resources, spawning units and researching technologies. Stars are produced by cities on each turn. Buildings and resources provide population to the city. When the population exceeds the level of the city, the city is upgraded and a bonus (extra production, units, city border growth, etc.) is unlocked. For city levels 5 and higher, the player can select a bonus between the strongest game unit (super unit or Giant) or building a park (extra score). The higher the city level, the higher the number of stars it produces per turn. Buildings can only be built in the city's own territory, which is formed by the 3 × 3 tiles around the city. The types of units and buildings that can be spawned and constructed are limited at the beginning of the game, and more types are unlocked by researching the 24 different technologies available. Two types of combat units can be spawned: melee (Warrior, Rider, Defender, Swordsman, Knight) and ranged (Archer, Catapult). Each unit has different attack power, range and health points (HP). Units can also embark in port buildings and become a boat unit. Boats have three levels: boat, ship and battleship. Attack power, movement range and HP increase when the boat unit is upgraded. Moreover, different types of units can perform distinct combinations of actions per turn. Most of them can attack and/or move in a turn, but others can do multiple consecutive actions. Once a unit defeats three other units, it is upgraded to a veteran, having much higher HP. Units can capture the enemy's city or neutral villages after staying in that city/village for more than a turn. The Mind Bender is a special unit which can heal friendly units and convert enemies to its faction. More details about the game can be found in the original paper (Perez et al. 2020).

There are three kinds of actions in Tribes: unit actions (the only set that does not require stars to be performed), city actions and tribe actions. The unit actions include attack, move, capture, convert, disband, examine, upgrade and become a veteran. Some of these actions are specialized for a particular unit type, or can only be performed when the corresponding technology is researched. The city actions include constructing and destroying buildings, burning, growing and clearing forests, resource gathering, spawning a unit and levelling up. The tribe actions include researching a technology, building a road in a non-enemy tile and ending the player's turn. Roads speed up the units' movement and can be used by any player. Some actions decrease the action space for a turn. For example, the build action consumes resources and can leave the player without enough stars for executing subsequent actions in that turn. Other actions, such as researching a technology that unlocks features, increase the action space for a turn. This situation makes Tribes a complex environment for decision making, with huge branching factors when a player owns many units and cities. A strong playing strategy must balance actions for technology, economy management and combat, as well as find a proficient order of actions per turn.
3.2 Baseline Approaches

We use three algorithms from the original framework (Perez et al. 2020) as baseline approaches: MCTS, RHEA and a rule-based agent. Comparing against RHEA and the rule-based agent is interesting, as they outperform MCTS in the original paper.

Rule Based (RB): The simple rule-based agent is the only agent in this paper that does not rely on the forward model. The forward model allows an agent to sample a possible future state given a particular action in the current state. In (Perez et al. 2020), RB has the second-highest winning rate and, in particular, beats MCTS in 56.2% of the games. The agent only uses the information of the current state, evaluating each action separately and returning the one with the highest value. If there is more than one action with the highest value, one is picked at random to break the tie. The evaluation is hand-crafted and relies on domain knowledge. The main weakness of this agent is that it considers every action independently and does not consider the action's effect on the game state. The insights behind this agent are that it disables the destroy and disband actions and focuses on capturing, examining ruins and turning units into veterans when possible. For more details about the rule-based agent, refer to (Perez et al. 2020).

Rolling Horizon Evolutionary Algorithms (RHEA): RHEA, described in Section 2.3, achieved the best performance in (Perez et al. 2020), with a 63% winning rate against MCTS. In the experiments of this paper, the RHEA agent uses the same parameters: a population size of 1 and an individual length of 20. Similar to the RB agent, the disband and destroy actions are removed.

Monte Carlo Tree Search (MCTS): MCTS, described in Section 2.1, has a winning rate below 50% against RHEA and RB in (Perez et al. 2020). This MCTS variant prioritizes the root action by picking different subsets of actions from the city, unit and tribe actions. This prioritization is only applied at the root, not at the rest of the nodes. The maximum depth of the tree is 20. UCB1 is chosen as the tree policy, with an exploration constant set to √2. No rollouts are performed in the simulation step: instead of evaluating the state reached by following a default policy until a certain depth or a terminal state, the state of the expanded node is evaluated, and its value is updated in the back-propagation step. To differentiate it from the other MCTS versions proposed in this paper, we refer to this as Default MCTS.

Heuristic: A heuristic function is used to evaluate a game state. It compares the difference between two states and evaluates them based on seven features. The features in (Perez et al. 2020) include the differences in production, number of technologies, score, number of units, number of kills and the total level of cities. Different features have different weights, set to 5, 4, 0.1, 4, 2, 3 and 2 respectively, by trial and error. Two features are added in this paper, stressing the importance of building and of occupying the enemy's cities: every new building provides 5 points, and each unit in the enemy's city adds 10 points to the score of the state.
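A minimal sketch of a state-difference heuristic of this kind is shown below. The feature names and the weight-to-feature mapping are illustrative assumptions (the text lists the weights but not their exact assignment), and in practice the feature dictionaries would be extracted from the framework's game state objects.

```python
# Hypothetical weights in the spirit of the heuristic described above; the paper
# reports weights of 5, 4, 0.1, 4, 2, 3 and 2, but their exact mapping to features
# is only illustrative here.
WEIGHTS = {
    "production": 5.0,
    "technologies": 4.0,
    "score": 0.1,
    "units": 4.0,
    "kills": 2.0,
    "city_levels": 3.0,
}
BUILDING_BONUS = 5.0           # added in this paper: each new building
ENEMY_CITY_UNIT_BONUS = 10.0   # added in this paper: each unit in an enemy city

def evaluate_state(before, after):
    """Weighted sum of feature differences between two feature dictionaries,
    plus the two bonuses introduced in this paper."""
    value = sum(w * (after[f] - before[f]) for f, w in WEIGHTS.items())
    value += BUILDING_BONUS * (after["buildings"] - before["buildings"])
    value += ENEMY_CITY_UNIT_BONUS * after["units_in_enemy_cities"]
    return value

# Example: a state where one technology was researched and one unit entered an
# enemy city scores 4 * 1 + 10 * 1 = 14 points higher than the reference state.
```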
3.3 MCTS with hard pruning

MCTS with hard pruning modifies the tree policy of MCTS to reduce the branching factor. The tree policy not only guides the search: it also prunes a node's children once that node has been visited from its parent more than 20 times. The number of remaining nodes, that is, the number of children kept after pruning, is based on the branching factor and a lower limit on the number of nodes, as shown in Equation 1. The children of the node are pruned, evaluating their states with the heuristic described above, until their number matches the number of remaining nodes. The children with the lowest values are pruned and never visited in later iterations.

remainingNodes = max(α log n, T)    (1)

α controls the speed of pruning when the number of actions n grows. This parameter plays a vital role in deciding the number of remaining nodes when the branching factor becomes huge: the higher it is, the more children are kept. T is the minimum number of actions that need to be kept after pruning. If an action set is smaller than T, pruning is not applied. The reason for using a logarithm is to remove most of the actions when the action set becomes extremely large. The parameters were decided by running the experiments under different settings; the setting with the highest winning rate is used.

Besides pruning the tree, the destroy action is excluded, as it does not provide any reward and results in a reduced score. Furthermore, MCTS with hard pruning has another parameter that decides whether to prune the move actions or not. Thus, there are two versions of MCTS with hard pruning: with and without move pruning. The reason for pruning moves is that they increase the action space significantly: every unit has multiple move actions based on its movement distance and the roads, and towards the later game a player tends to have many units, increasing the action space even further. The move actions are pruned using Equation 1. First, the move actions are extracted from the full action set. The move actions whose destination is not a ruin or a city tile are then filtered by Equation 1 with α = 1 and T = 5. The pruned move actions are removed and never considered again in the action set of the current node. Although the pruned actions are removed from the current node, they might appear in subsequent nodes, meaning they will still be considered at a later step.
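The sketch below illustrates this hard pruning rule. The node fields and helper names are assumptions consistent with the earlier MCTS sketch, and the visit threshold of 20 follows the description above; the values of α and T for non-move actions were tuned experimentally in the paper, so the defaults here are placeholders only.

```python
import math

VISIT_THRESHOLD = 20    # prune a node's children once it has been visited 20 times

def remaining_nodes(num_children, alpha, min_nodes):
    """Equation 1: remainingNodes = max(alpha * log(n), T)."""
    return max(int(alpha * math.log(num_children)), min_nodes)

def hard_prune(node, heuristic_value, alpha=2.0, min_nodes=5):
    """Keep only the highest-valued children of a node; the rest are removed
    permanently (hard pruning).

    node.visits            : how often this node has been visited from its parent
    node.children          : list of child nodes
    heuristic_value(child) : heuristic evaluation of the child's state
    alpha, min_nodes       : placeholder values, tuned per setting in the paper
    """
    n = len(node.children)
    if node.visits <= VISIT_THRESHOLD or n <= min_nodes:
        return                                    # too early, or the action set is small
    keep = remaining_nodes(n, alpha, min_nodes)
    if n <= keep:
        return
    ranked = sorted(node.children, key=heuristic_value, reverse=True)
    node.children = ranked[:keep]                 # lowest-valued children are dropped
```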
3.4 MCTS with progressive widening

This variant of MCTS is a modification of MCTS with hard pruning. Instead of permanently removing the pruned nodes, the algorithm unprunes nodes in later iterations. This process allows the tree to eventually consider every node and alleviates the risk of pruning the best action. It is done by increasing the number of remaining nodes whenever the iteration count exceeds the value given by Equation 2 (Chaslot et al. 2008). When the number of iterations is greater than Υ, a pruned action picked uniformly at random is unpruned.

Υ = α β^(n − n_init)    (2)

α and β control the unpruning ratio, and they are set to 20 and 1.7 respectively. The higher the value of β, the higher the number of iterations needed for unpruning, approximating the behaviour of hard pruning. n is the size of the unpruned set of the current node and n_init is the initial number of unpruned nodes, set to 5. As for the hard pruning case, two variants of MCTS with progressive widening have been tested: with and without move pruning.
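A minimal sketch of this unpruning rule is given below, using the parameter values reported above (α = 20, β = 1.7, n_init = 5). The node fields are assumptions consistent with the earlier sketches, and the iteration count is taken here to be the node's visit count.

```python
import random

ALPHA = 20     # unpruning ratio (Equation 2)
BETA = 1.7     # growth rate of the unpruning threshold
N_INIT = 5     # initial number of unpruned children

def unprune_threshold(num_unpruned):
    """Equation 2: iterations needed before the next child is unpruned."""
    return ALPHA * BETA ** (num_unpruned - N_INIT)

def progressive_widen(node):
    """Restore one pruned child, chosen uniformly at random, once the node has
    been visited more often than the current threshold.

    node.children : currently unpruned children
    node.pruned   : children removed by the pruning rule, kept aside
    node.visits   : number of times this node has been visited
    """
    if node.pruned and node.visits > unprune_threshold(len(node.children)):
        child = node.pruned.pop(random.randrange(len(node.pruned)))
        node.children.append(child)
```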
4 Experiments and results

This section provides the experimental setup for testing MCTS with pruning, followed by the results of our tests. Finally, an analysis of the MCTS tree is provided.

4.1 Experimental Setup

Both versions of MCTS with pruning (hard pruning and progressive widening), each with and without move pruning, were pitched against the rule-based, the default MCTS and the RHEA agents, as included in the framework. For all search agents, the stopping condition is set to 2000 usages of the forward model's next method.

Each agent pairing plays 5000 games (25 seeds with 200 repetitions). The seeds are the same as in the original paper (Perez et al. 2020) and are used to generate different levels, all of size 11 × 11 tiles. Because levels might not be balanced for both contestants, players swap starting positions within the 40 repetitions of every seed. Each game is independent, meaning that no information is carried from one game to the next. The game mode is Capital with full observability, thus the winning condition is to take over the enemy's capital city. In order to have finite games, a winner is declared after 50 turns, corresponding to the player with the highest score at that point.
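The evaluation protocol can be summarised by the sketch below. The function names are placeholders rather than the framework's API; the essential points are the fixed level-generation seeds, the repeated games per seed, and the swapping of starting positions between repetitions.

```python
def run_pairing(agent_a, agent_b, seeds, repetitions, play_game):
    """Return agent_a's win rate over all seeds and repetitions.

    play_game(first_player, second_player, seed) -> the winning agent, or None
    """
    wins_a, total = 0, 0
    for seed in seeds:                    # 25 level-generation seeds in this paper
        for rep in range(repetitions):    # repetitions per seed
            if rep % 2 == 0:              # swap starting positions between games
                winner = play_game(agent_a, agent_b, seed)
            else:
                winner = play_game(agent_b, agent_a, seed)
            wins_a += int(winner is agent_a)
            total += 1
    return wins_a / total
```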
4.2 Game Results

Table 1: Averaged winning rate for each version of MCTS vs the 3 baseline agents over 5000 games.

                        Hard Pruning                     Progressive Widening
                  RB    Default MCTS   RHEA        RB    Default MCTS   RHEA
Move Pruning     56.9%     63.3%      57.8%       57.8%     62.0%      57.7%
No Move Pruning  54.7%     61.2%      54.5%       55.8%     60.1%      57.7%

Table 1 shows the results of both versions of MCTS competing with the three baseline approaches. The columns indicate the pruning version (hard pruning or progressive widening) and the baseline opponent, while the rows indicate whether move pruning is used. Default MCTS shows 43.8% and 37% winning rates when playing against the RB and RHEA agents, respectively (not in the table). With the help of pruning techniques, all versions of MCTS achieve more than a 50% winning rate against the baseline approaches.

With the help of hard pruning, MCTS increases its win rate by approximately 8% and 12% compared to the original MCTS when playing against RB and RHEA, respectively. Move pruning and progressive widening further increase the performance of MCTS with hard pruning. MCTS with hard pruning and move pruning shows the highest winning rate when competing against the original MCTS and RHEA. As can be seen, pruning moves increases the performance of MCTS in relation to only applying progressive widening or hard pruning. Although using hard pruning or progressive widening is clearly better than using no pruning at all, the difference between these two methods is minimal. The winning rate is higher for progressive widening when playing against RB, but lower than hard pruning when the opponent is default MCTS or RHEA. The difference, however, is likely not significant, and different parameters for Equation 2 may alter the unpruning rate and provide a better value estimate of the different moves. Overall, although hard pruning, progressive widening and domain knowledge (move pruning) can all improve the performance, using domain knowledge has the clearer effect on the winning rate.

Table 2 shows the statistics of the different versions of MCTS over 15000 games (aggregating the 5000 games played against each of the three baseline approaches). It shows the average winning rate over the three baseline approaches, the final score, the percentage of researched technologies, the number of owned cities at the end of the game and the final production. The values obtained for these last four statistics by the default MCTS in (Perez et al. 2020) (not in this table) are 9966.55, 85.27%, 2.23 and 16.96, respectively. The score and researched technologies percentage of every version of MCTS are considerably smaller than those of the original MCTS and similar to those of the RB agent (8076.32 for score and 71.14% for researched technologies). However, the number of cities owned at the game end and the final production are slightly larger, now between 2.24 and 2.26 for the former, and above 17 for the latter. Results also show that MCTS with progressive widening researches a higher number of technologies, and that move pruning produces higher final scores, cities and production. These statistics suggest that the modified MCTS is more focused on occupying cities than on trying to maximize the final score, which may be a consequence of the new features in the heuristic function that give extra value to units in the enemy's cities. In comparison to RB and RHEA, the score and researched technologies percentage of MCTS sit between those two agents, while the number of owned cities and the final production are smaller than for RB and RHEA. The statistics of the different versions of MCTS explored in this work, in relation to RB and RHEA, are comparable with those of default MCTS.

Table 2: The aggregated statistics for each version of MCTS over 15000 games.

                            Hard Pruning                          Progressive Widening
                 Win Rate   Score    Techs   Cities  Prod.   Win Rate   Score    Techs   Cities  Prod.
Move Pruning      59.3%    8500.06  77.88%   2.26   17.38     59.1%    8564.36  79.08%   2.25   17.34
No Move Pruning   56.8%    8458.52  77.46%   2.24   17.16     57.8%    8446.04  78.17%   2.24   17.07

4.3 Tree analysis

This section shows which action groups and actions are executed more often, together with an analysis of the tree depth and of the fully expanded rate of the recommended action node, in terms of different action space sizes and turns in the game. The fully expanded rate of the recommended action node is measured between 0 and 1. A value of 1 means that the recommended action node has visited all of its possible children (excluding the pruned nodes) from its current state; a value of 0 indicates that no children have been explored. Note that these are the children of the node of the recommended action, not the children of the root.
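As an illustration, this rate could be computed per node along the following lines (the node fields are assumptions consistent with the earlier sketches):

```python
def fully_expanded_rate(node):
    """Fraction of the node's possible (non-pruned) children that have been visited.

    node.children        : expanded children (each with a .visits counter)
    node.untried_actions : legal actions from this node never expanded,
                           excluding actions removed by pruning
    """
    visited = sum(1 for child in node.children if child.visits > 0)
    possible = len(node.children) + len(node.untried_actions)
    if possible == 0:
        return 1.0          # no children at all: treat as fully expanded
    return visited / possible
```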
During the game, the unit action group is the most frequently chosen: 68% of the time. The city and tribe action groups are only chosen 20.5% and 11.4% of the time, respectively. This difference is a consequence of unit actions taking a large proportion of the action space. The most frequently executed action is move, with the largest percentage (58.9%) among all actions, which also explains why the unit action group has the largest proportion. The move action does not provide a significant reward per se, but it can lead to future reward if the unit moves to ruins or enemy cities. Applying the move pruning does not change the proportion between the action groups. This may be caused by move actions being taken towards the end of the turn.

Figures 2 and 3 show the influence of the branching factor on the depth of MCTS and on the fully expanded rate of the recommended action node. The original MCTS is unable to go deeper than 2 levels and to visit all the possible actions from the current state when the branching factor is over 50. With the help of pruning techniques, this problem only happens when the size of the action space grows over 180. This is three times larger than in the original MCTS, providing better action value estimates in early game turns (when the action space tends to be smaller). The depth of the tree sharply decreases when the branching factor grows. As expected, due to the unpruning mechanism, progressive widening explores shallower trees than hard pruning, which is most noticeable when the action space size is between 100 and 150. Compared to hard pruning, progressive widening is unable to fully explore the chosen action when the branching factor grows over 100. Although hard pruning also faces this issue, it happens later than for progressive widening. The rate of exploration shown in Figure 3 follows a similar trend: hard pruning delays the decay of the proportion of children explored further than progressive widening, and both approaches clearly allow the tree to visit a higher proportion of actions from the root than when there is no pruning at all. With no pruning, the rate of default MCTS decays dramatically when the action space size reaches 50. Figures 2 and 3 show the effect of unpruning in progressive widening and how the search tree is explored differently in each case.

Figure 2: MCTS depth under different action space sizes.

Figure 3: The fully explored rate for the recommended action in MCTS under different action space sizes.

Figures 4 and 5 show the influence of the game turn on the depth of MCTS and on the fully expanded rate of the recommended action node. Note that, in Tribes, the size of the action space tends to grow with the number of turns (particularly if the game is favourable to the player); an analysis of this can be found in (Perez et al. 2020). The depth of the non-pruned MCTS drops to around 3 when the game turn reaches 30, and the fully explored rate of the chosen action starts to drop at turn 10, ending at 40% by turn 50. The pruned versions of MCTS show a similar trend, but drop more slowly than the original MCTS. The depth of the modified MCTS is around 4 at the end of the game, and the fully explored rate of the chosen action is considerably higher than that of the original MCTS at turn 50. Hard pruning decreases more slowly than progressive widening, and move pruning further slows down the decrease. This shows the significant effect of move pruning in increasing the explored ratio of the chosen action in later turns.

Figure 4: The depth of MCTS in different turns.

Figure 5: The fully explored rate for the recommended action in MCTS in different turns.
5 Conclusion and future work

This paper focuses on exploring pruning techniques for MCTS in a multi-action strategy game, Tribes, in which players must execute multiple actions in a turn while managing other game aspects such as economy, technology research and resource gathering. Moreover, the size of the action space is not fixed and generally increases during the game, resulting in a high and difficult-to-manage branching factor.

MCTS can handle large branching factors, but it struggles when the action space size reaches more than a hundred (Baier and Cowling 2018; Chaslot et al. 2008), which is actually the case in Tribes. This paper shows that using pruning techniques helps MCTS deal with this issue. We explored progressive widening, hard pruning and the insertion of domain knowledge for pruning move actions. The inclusion of progressive widening and hard pruning improves the performance of MCTS in a similar fashion, and results are even better when adding the move action pruning.

We propose different avenues for future work. One possibility is to explore other pruning algorithms, like NaiveMCTS (Ontanón 2017) or other types of heuristic pruning (Sephton et al. 2014), or to dive deeper into the use of domain knowledge for pruning. This paper shows that decreasing the proportion of move actions increases the performance of the agent. However, several other actions might also benefit from using domain knowledge, such as the Level Up or the Build actions. A more general approach would be to use machine or reinforcement learning techniques to learn which actions must be pruned. Another interesting possibility is to explore variants of OEP or Evolutionary MCTS that deal with a varying number of actions per turn. Finally, there is abundant work on portfolio approaches for strategy games, and exploring synergies between these and different pruning techniques can lead to other ways of dealing with large action space sizes.

References

Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3):235–256.

Baier, H., and Cowling, P. I. 2018. Evolutionary MCTS for multi-action adversarial games. In IEEE Conference on Computational Intelligence and Games, 1–8. IEEE.

Browne, C. B.; Powley, E.; Whitehouse, D.; Lucas, S. M.; Cowling, P. I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; and Colton, S. 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1):1–43.

Chaslot, G. M. J.; Winands, M. H.; van den Herik, H. J.; Uiterwijk, J. W.; and Bouzy, B. 2008. Progressive strategies for Monte-Carlo tree search. New Mathematics and Natural Computation (NMNC) 4(03):343–357.

Childs, B. E.; Brodeur, J. H.; and Kocsis, L. 2008. Transpositions and move groups in Monte Carlo tree search. In 2008 IEEE Symposium on Computational Intelligence and Games, 389–395.

Churchill, D., and Buro, M. 2013. Portfolio greedy search and simulation for large-scale combat in StarCraft. In 2013 IEEE Conference on Computational Intelligence in Games (CIG), 1–8. IEEE.

Churchill, D., and Buro, M. 2015. Hierarchical portfolio search: Prismata's robust AI architecture for games with large search spaces. In Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE).

Coulom, R. 2007. Computing "Elo ratings" of move patterns in the game of Go. ICGA Journal 30(4):198–208.

Gelly, S., and Silver, D. 2007. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning (ICML), 273–280. Association for Computing Machinery.

Gelly, S., and Wang, Y. 2006. Exploration exploitation in Go: UCT for Monte-Carlo Go.

Justesen, N.; Mahlmann, T.; and Togelius, J. 2016. Online evolution for multi-action adversarial games. In Applications of Evolutionary Computation, 590–603. Springer International Publishing.

Justesen, N.; Mahlmann, T.; Risi, S.; and Togelius, J. 2017. Playing multiaction adversarial games: Online evolutionary planning versus tree search. IEEE Transactions on Games 10(3):281–291.

Justesen, N.; Uth, L. M.; Jakobsen, C.; Moore, P. D.; Togelius, J.; and Risi, S. 2019. Blood Bowl: A new board game challenge and competition for AI. In 2019 IEEE Conference on Games (CoG), 1–8.

Midjiwan AB. 2016. The Battle of Polytopia.

Ontanón, S. 2017. Combinatorial multi-armed bandits for real-time strategy games. Journal of Artificial Intelligence Research 58:665–702.

Perez, D.; Samothrakis, S.; Lucas, S.; and Rohlfshagen, P. 2013. Rolling horizon evolution versus tree search for navigation in single-player real-time games. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, 351–358.

Perez, D.; Hsu, Y.-J.; Emmanouilidis, S.; Khaleque, B. D. A.; and Gaina, R. D. 2020. Tribes: A new turn-based strategy game for AI research. In Proceedings AIIDE'20.

Ramanujan, R.; Sabharwal, A.; and Selman, B. 2010. On adversarial search spaces and sampling-based planning. In Twentieth International Conference on Automated Planning and Scheduling.

Sephton, N.; Cowling, P. I.; Powley, E.; and Slaven, N. H. 2014. Heuristic move pruning in Monte Carlo tree search for the strategic card game Lords of War. In 2014 IEEE Conference on Computational Intelligence and Games, 1–7. IEEE.

Teytaud, F., and Teytaud, O. 2009. Creating an upper-confidence-tree program for Havannah. In Advances in Computer Games, 65–74. Springer.