                              MCTS pruning in Turn-Based Strategy Games


                                              Yu-Jhen Hsu, Diego Perez-Liebana
                                       School of Electronic Engineering and Computer Science
                                               Queen Mary University of London, UK




                            Abstract

  Large action spaces are one of the most problematic aspects of turn-based
  strategy games for all types of AI methods. Some state-of-the-art algorithms,
  such as Online Evolutionary Planning and Evolutionary Monte Carlo Tree Search
  (MCTS), have tried to deal with this problem, but they require a fixed number
  of actions in each turn. In general strategy games, this assumption cannot
  hold, as the number of actions that can be executed in a turn is flexible and
  varies during the game. This paper studies pruning techniques and the
  insertion of domain knowledge to deal with high branching factors in a new
  turn-based strategy game: Tribes. The experiments show that, with the help of
  these techniques, MCTS can increase its performance and outperform the
  rule-based and Rolling Horizon Evolutionary Algorithm agents. Moreover, some
  insights into the tree shape and the behaviour of MCTS with and without
  pruning techniques are provided.

  Copyright © 2020 for this paper by its authors. Use permitted under Creative
  Commons License Attribution 4.0 International (CC BY 4.0).
                          1    Introduction

Turn-based multi-action strategy games raise interest in the AI domain due to,
among other things, their huge branching factors. These kinds of games contain
a considerable number of actions within a turn, which the player executes in
sequence. The order of actions is important, as previous actions influence
later ones. The branching factor can increase significantly, up to billions of
possible next states. Compared to Go and Chess, the branching factor is much
higher in turn-based strategy games (Perez et al. 2020). Taking Bot Bowl
(Justesen et al. 2019) as an example, in which the goal is to manage a football
team where each unit can execute several actions, the branching factor can be
as high as 10^51 in a given turn. The branching factors for Tribes (Perez et
al. 2020) and Hero Academy (Justesen, Mahlmann, and Togelius 2016) are 10^31
and 10^8, lower than Bot Bowl but still much higher than Go and Chess. Tribes
is a strategy game that introduces extra management complexities, as it
requires controlling not only the units but also cities, tech trees and
resources. Some recent progress has been made by analysing how huge branching
factors impact the performance of agents. High branching factors drop the
performance of search algorithms (Justesen et al. 2017; Baier and Cowling
2018). The difficulty of developing an evaluation function makes the situation
even worse (Justesen et al. 2017; Browne et al. 2012), as the search algorithms
require them to guide decision making. Although Monte Carlo Tree Search (MCTS)
can handle large branching factors, it fails when branching factors reach the
sizes mentioned above (Baier and Cowling 2018; Chaslot et al. 2008). In the
default tree policy of MCTS, all nodes are required to be visited before
exploitation happens, making non- or rarely-visited nodes less informative for
making a good decision (Gelly and Wang 2006).
   New search algorithms have been developed to tackle strategy games, such as
Online Evolutionary Planning (OEP) (Justesen et al. 2017) and Evolutionary MCTS
(EvoMCTS) (Baier and Cowling 2018), obtaining decent performance in Hero
Academy. Instead of using a single action as genomes or nodes, they use a
sequence of actions as the representation. The difference between OEP and
EvoMCTS is the way they obtain an action set for the turn. OEP uses crossover
and mutation operators to choose the best action set. EvoMCTS combines MCTS and
an evolutionary algorithm, mutating action sets to guide the search in a tree
structure. These two algorithms choose a fixed number of actions for a turn,
and each action is assigned to a specific unit. This assumption makes them hard
to adapt to a game like Tribes (or most strategy games) for three reasons.
First, the number of actions might change after executing an action during the
same turn. Secondly, the branching factor generally increases during the game.
Lastly, instead of only managing units, agents in Tribes are required to handle
not only unit actions but also city and general faction actions. These factors
increase the difficulty of estimating the length of a fixed-size genome.
   Using a single action rather than a complete turn as the representation for
MCTS is one possible solution. It decreases the branching factor and makes the
tree deeper (Baier and Cowling 2018). However, vanilla MCTS without
improvements fails, as the tree is still shallow (Justesen et al. 2017; Baier
and Cowling 2018). Numerous enhancements for MCTS have been proposed to deal
with this issue. One possible solution focuses on pruning, reducing the
branching factor by removing weak actions based on evaluation functions or
domain knowledge, which increases the convergence speed towards finding good
actions.
   This paper aims to investigate classic pruning techniques and to analyse the
tree structure and behaviour of MCTS in Tribes. The paper compares the
performance of MCTS with pruning against the original MCTS agent, the Rolling
Horizon Evolutionary Algorithm (RHEA) agent and the rule-based agent, all of
which outperformed MCTS in the original Tribes paper (Perez et al. 2020).
Moreover, the different MCTS pruning techniques are analysed through their tree
shape, in terms of the influence of the turn and the branching factor.
                 2    Background and related work

2.1   Monte Carlo Tree Search (MCTS)

MCTS (Browne et al. 2012) is a tree search algorithm that explores the most
promising parts of the state space by balancing exploitation and exploration,
and it has been successfully applied to many games. The tree, being asymmetric,
contains nodes representing states and edges representing actions. There are
four main steps in the default MCTS: selection, expansion, simulation and
back-propagation. In the selection step, the tree policy (Upper Confidence
Bounds - UCB1 (Auer, Cesa-Bianchi, and Fischer 2002) - in the most used
version) is applied to guide the search from the root until finding a node with
non-visited children. An action chosen at random is added in the expansion
step, followed by the simulation step, which performs a random rollout (Monte
Carlo simulation) until reaching a terminal state. While the rollout is one of
the main steps, some recent studies (Perez et al. 2020; Baier and Cowling 2018)
have shown that skipping it and directly evaluating the expanded node's state
can provide better results. The final step, back-propagation, evaluates the
terminal state and then updates all nodes that were visited in this iteration.
MCTS ends when it runs out of budget and returns the most visited child of the
root.
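To make these steps concrete, the following is a minimal sketch of one
iteration of the rollout-free variant just described. The Node class and the
legal_actions, apply_action (forward model) and evaluate (state heuristic)
interfaces are illustrative assumptions, not the actual Tribes API.

```python
import math
import random

class Node:
    """One tree node: a game state plus visit statistics and children keyed by action."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def ucb1(child, parent_visits, c=math.sqrt(2)):
    # UCB1 (Auer, Cesa-Bianchi, and Fischer 2002): mean value plus exploration bonus.
    if child.visits == 0:
        return float("inf")     # unvisited children are tried first
    return child.value_sum / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts_iteration(root, legal_actions, apply_action, evaluate):
    """One iteration of the rollout-free variant: select, expand, evaluate, back-propagate."""
    node = root
    # Selection: follow UCB1 while the current node has no unexpanded actions left.
    while node.children and len(node.children) == len(legal_actions(node.state)):
        node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
    # Expansion: add one untried action chosen at random.
    untried = [a for a in legal_actions(node.state) if a not in node.children]
    if untried:
        action = random.choice(untried)
        node.children[action] = Node(apply_action(node.state, action), parent=node)
        node = node.children[action]
    # Evaluation replaces the rollout: score the expanded node's state directly.
    value = evaluate(node.state)
    # Back-propagation: update every node visited in this iteration.
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent

def recommend(root):
    # Once the budget is exhausted, the most visited child of the root is returned.
    return max(root.children, key=lambda a: root.children[a].visits)
```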
   There are different enhancements used to improve the performance of MCTS
under large branching factors. These methods either reduce the branching factor
via domain knowledge or relax the requirement of visiting all nodes before
exploitation. First-Play Urgency (Gelly and Wang 2006) encourages early
exploitation by assigning an initial value to non-visited nodes. It reduces the
need for visiting every node before exploitation under UCT and allows the tree
to go deeper. Move groups (Childs, Brodeur, and Kocsis 2008) are another
technique for reducing the branching factor and increasing the performance of
MCTS in Go. Instead of choosing a single action, a move group (which contains
similar moves with a highly-correlated expected value) is picked and evaluated.
Rapid Action Value Estimation (Gelly and Silver 2007) updates the value for all
of the nodes with the same action and state, regardless of their position in
the tree. Script-based approaches such as Hierarchical Portfolio Search
(Churchill and Buro 2015) and Portfolio Greedy Search (Churchill and Buro
2013), specialized in dealing with huge branching factors in real-time strategy
games, search over a small set of hand-coded scripts rather than the whole
action set.

2.2   Pruning techniques for MCTS

Pruning is another technique used to deal with huge branching factors. The
benefits of pruning are that poor actions can be ignored, focusing more on
promising actions, resulting in a much deeper tree and potentially a quicker
convergence. Moreover, it can be used without heuristic functions and can be
applied in any domain (Browne et al. 2012). Some pruning also uses domain
knowledge to remove trap states (Ramanujan, Sabharwal, and Selman 2010), which
are states that have a high reward but lead to a loss.
   Pruning techniques can be divided into two categories: hard and soft
pruning. The hard version prunes the tree by eliminating actions with the
lowest evaluated score. Soft pruning also prunes actions after a certain number
of iterations, but the eliminated actions can be chosen again in later
iterations. The benefit of soft pruning is alleviating the risk of removing
good actions (Browne et al. 2012), while hard pruning allows the tree to go
deeper (Justesen et al. 2017). Progressive Widening (PW) (Coulom 2007) or
Progressive Unpruning (Chaslot et al. 2008) is an example of soft pruning,
showing great success in the game Go. N nodes are pruned based on evaluation
functions once the iteration count exceeds a threshold. After several
simulations, nodes are unpruned and can be chosen again in the selection step.
In this way, all actions will be visited, given enough budget. This technique
provides a similar effect to First-Play Urgency, encouraging early exploitation
and allowing the tree to go deeper. While pruning techniques can be used
without heuristics, the performance of progressive widening depends on the
heuristic function. Teytaud and Teytaud (2009) show that progressive widening
without the help of a heuristic has little impact on improving the strength of
agents in the game Havannah.

2.3   Rolling Horizon Evolutionary Algorithms (RHEA)

RHEA is an evolutionary algorithm that achieves good results in several
real-time games (Perez et al. 2013). RHEA learns online, during the game,
differently from other evolutionary algorithms that learn offline. It first
generates random individuals, each one representing a sequence of actions. It
then evaluates the final state reached after executing, from the current state,
all the actions in the individual sequentially. Evolutionary operators such as
selection, crossover and mutation are used to generate new individuals based on
old ones. The first action of the individual with the highest value is returned
once the budget is exhausted.
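A rough sketch of this rolling-horizon loop is given below, using a population
of one and mutation only (the configuration later used by the RHEA agent in
this paper's experiments). The legal_actions, apply_action and evaluate
interfaces are assumed placeholders, not a specific framework API.

```python
import random

def rhea_plan(state, legal_actions, apply_action, evaluate, horizon=20, iterations=200):
    """Rolling Horizon EA sketch with a population of one (mutation only)."""

    def rollout(genome):
        # Apply the genome from the current state; genes that are illegal in the
        # reached state (or missing) are repaired with random legal actions.
        s, repaired = state, []
        for gene in genome:
            actions = legal_actions(s)
            if not actions:
                break
            action = gene if gene in actions else random.choice(actions)
            repaired.append(action)
            s = apply_action(s, action)
        return evaluate(s), repaired

    best_value, best = rollout([None] * horizon)      # start from a random individual
    if not best:
        return None
    for _ in range(iterations):
        mutant = list(best)
        mutant[random.randrange(len(mutant))] = None  # mutate one gene by re-randomising it
        value, candidate = rollout(mutant)
        if value >= best_value:                       # keep the better individual
            best_value, best = value, candidate
    return best[0]                                    # first action of the best plan found
```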
                              3    Methods

3.1   Tribes

Tribes (Perez et al. 2020) is a re-implementation of the game The Battle of
Polytopia (Midjiwan AB 2016), a popular award-winning turn-based strategy game,
which can be seen as a simplified version of Sid Meier's Civilization. Fig. 1
shows the interface of Tribes.

Figure 1: The user interface of Tribes.

   The game takes place on an N × N grid of tiles. Each tile has a terrain type
(plain, mountain, shallow water or deep water). Each tile can only hold one
resource, building and unit at the same time. Villages, resources and ruins are
generated in tiles at the beginning of the game. Ruins contain randomized
bonuses (special unit, technology, resources, etc.) and can be explored by
units. There are two game modes, Capital and Score mode. The goal in the Score
mode is to maximize the score or take all the enemies' capitals within 30
turns. Instead of reaching as high a score as possible within 30 turns, the
Capital mode only requires capturing the enemy's capital, or initial city. The
score comes from a variety of actions such as exploring, gathering resources,
capturing villages and cities, and constructing certain buildings. At the
beginning of the game, each player controls one of the four available Tribes
(Xin Xi, Imperius, Bardur and Oumaji). They have a starting unit and a
researched technology, different for each tribe. Each player starts with five
stars, which are the resource currency. Each player executes as many actions as
possible within a turn until they decide to end it, or no more actions are
available.
   There are three crucial aspects of the game, technology, economy management
and combat, which determine the playing strategy. The critical element in
economy management is handling stars, which are used for building, gathering
resources, spawning units and researching technologies. Stars are produced by
cities each turn. Buildings and resources provide population to the city. When
the population exceeds the level of the city, the city is upgraded and a bonus
(extra production, units, city border growth, etc.) is unlocked. For city
levels 5 and higher, the player can select a bonus between the strongest game
unit (super unit or Giant) or building a park (extra score). The higher the
city level is, the higher the number of stars it produces per turn. Buildings
can only be built in the city's own territory, which is formed by the 3 × 3
tiles around the city. The types of units and buildings that can be spawned and
constructed are limited at the beginning of the game, and more types are
unlocked by researching the 24 different technologies available. Two types of
combat units can be spawned, melee (Warrior, Rider, Defender, Swordsman,
Knight) and ranged (Archer, Catapult). Each unit has different attack power,
range and health points (HP). They can also embark in port buildings and become
a boat unit. Boats have three levels: boat, ship and battleship. Attack power,
movement range and HP increase when upgrading the boat unit. Moreover,
different types of units can perform distinct combinations of actions per turn.
Most of them can attack and/or move in a turn, but others can do multiple
consecutive actions. Once a unit defeats three other units, the former is
upgraded to a veteran, gaining much higher HP. Units can capture the enemy's
cities or neutral villages after staying in that city/village for more than a
turn. The Mind Bender is a special unit which can heal friendly units and
convert enemy units to its faction. More details about the game can be found in
the original paper (Perez et al. 2020).
   There are three kinds of actions in Tribes: unit (the only set that does not
require stars to perform actions), city and tribe. The unit actions include
attack, move, capture, convert, disband, examine, upgrade and become a veteran.
Some of these actions are specialized for a particular unit type, or can only
be performed when the corresponding technology has been researched. The city
actions include constructing and destroying buildings, burning, growing and
clearing forests, gathering resources, spawning a unit and levelling up. The
tribe actions include researching a technology, building a road in a non-enemy
tile and ending the player's turn. Roads speed up the units' movement and can
be used by any player. Some actions decrease the action space for a turn. For
example, the build action consumes resources and may leave the player without
enough stars to execute subsequent actions in that turn. Other actions, such as
research unlocking new features, increase the action space for a turn. This
situation makes Tribes a complex environment for decision making, with huge
branching factors when a player owns many units and cities. A strong playing
strategy must balance actions for technology, economy management and combat, as
well as find a proficient order of actions per turn.

3.2   Baseline Approaches

We use three algorithms from the original framework (Perez et al. 2020) as
baseline approaches: MCTS, RHEA and rule-based agents. It is interesting to
compare against RHEA and the rule-based agent, as they outperform MCTS in the
original paper.
   Rule Based (RB): The simple rule-based agent is the only agent in this paper
that does not rely on the forward model. The forward model allows an agent to
sample a possible future state given a particular action in the current state.
In (Perez et al. 2020), RB has the second-highest winning rate and in
particular beats MCTS in 56.2% of the games. The agent only uses the
information of the current state, evaluating each action separately and
returning the one with the highest value. If there is more than one action with
the highest value, it picks one at random to break the tie. The evaluation is
hand-crafted and relies on domain knowledge. The main weakness of this agent is
that it considers every action independently and does not consider the action's
effect on the game state. Notably, this agent disables the destroy and disband
actions and focuses on capturing, examining ruins and turning units into
veterans when possible. More details about the rule-based agent can be found in
(Perez et al. 2020).
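A minimal sketch of this selection rule is shown below, assuming a hypothetical
hand-crafted evaluation value_of; note that no forward model is used.

```python
import random

def rule_based_action(actions, value_of):
    """Score every available action independently with a hand-crafted evaluation
    (value_of is an assumed placeholder) and return one of the highest-valued
    actions, breaking ties uniformly at random."""
    scored = [(value_of(a), a) for a in actions]
    best_value = max(v for v, _ in scored)
    best_actions = [a for v, a in scored if v == best_value]
    return random.choice(best_actions)
```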
   Rolling Horizon Evolutionary Algorithms (RHEA): RHEA, described in
Section 2.3, achieved the best performance in (Perez et al. 2020), with a 63%
winning rate against MCTS. In the experiments of this paper, the RHEA agent
uses the same parameters: a population size of 1 and an individual length of
20. As in the RB agent, the disband and destroy actions are removed.
   Monte Carlo Tree Search (MCTS): MCTS, described in Section 2.1, has winning
rates below 50% against the RHEA and RB agents in (Perez et al. 2020). This
MCTS variant prioritizes the root action by picking different subsets of
actions from the city, unit and tribe action groups. This prioritization is
only applied to the root, not to the rest of the nodes. The maximum depth of
the tree is 20. UCB1 is chosen as the tree policy, with an exploration constant
set to √2. No rollouts are performed in the simulation step: instead of
evaluating a state rolled forward with a default policy until the tree reaches
a certain depth or a terminal state, the state of the current node is evaluated
directly, and its value is updated in the back-propagation step. To
differentiate it from the other MCTS versions proposed in this paper, we refer
to this agent as Default MCTS.
   Heuristic: A heuristic function is used to evaluate a game state. It
compares the difference between two states and evaluates them based on seven
features. The features in (Perez et al. 2020) include the differences in
production, number of technologies, score, number of units, number of kills and
the total level of cities. Different features have different weights, set to 5,
4, 0.1, 4, 2, 3 and 2 respectively, by trial and error. Two features are added
in this paper, stressing the importance of building and of occupying the
enemy's cities. Every new building provides 5 points, and each unit in the
enemy's city adds 10 points to the score of the state.
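The sketch below illustrates such a weighted difference of features between two
states. The pairing of the reported weights (5, 4, 0.1, 4, 2, 3 and 2) with
specific features, and the feature extraction itself, are assumptions made for
illustration only.

```python
# Weights as reported in the text; the mapping of weights to features is assumed,
# since the text lists seven weights but names six features explicitly.
WEIGHTS = {
    "production": 5.0,
    "technologies": 4.0,
    "score": 0.1,
    "units": 4.0,
    "kills": 2.0,
    "city_levels": 3.0,
    "cities": 2.0,            # seventh feature assumed for illustration
}

def evaluate_state(prev_features, curr_features, new_buildings=0, units_in_enemy_city=0):
    """Weighted sum of feature differences between two states, plus the two features
    added in this paper: 5 points per new building and 10 points per unit inside an
    enemy city."""
    value = sum(WEIGHTS[f] * (curr_features[f] - prev_features[f]) for f in WEIGHTS)
    return value + 5.0 * new_buildings + 10.0 * units_in_enemy_city
```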
3.3   MCTS with hard pruning

MCTS with hard pruning modifies the tree policy of MCTS to reduce the branching
factor. The tree policy not only guides the search, it also prunes a node's
children once the node has been visited from its parent more than 20 times. The
number of remaining nodes, i.e. the number of children kept after pruning, is
based on the branching factor and a limit on the minimum number of nodes, as
shown in Equation 1. The children of a node are pruned until their number
matches the number of remaining nodes, evaluating their states with the
heuristic described above. The children with the lowest values are pruned and
never visited in later iterations.

            remainingNodes = max(α · log n, T)                          (1)

   α controls the speed of pruning as the number of actions grows. This
parameter plays a vital role in deciding the number of remaining nodes when the
branching factor becomes huge: the higher it is, the more children are kept. T
is the minimum number of actions that need to be kept after pruning. If an
action set is smaller than T, pruning is not applied. The reason for using a
logarithm is to remove most of the actions when the action set becomes
extremely large. The parameters were decided by running the experiments under
different settings, and the setting with the highest winning rate is used.
   Besides pruning the tree, the destroy action is excluded, as it does not
provide any reward and results in a reduced score. Furthermore, MCTS with hard
pruning has another parameter that decides whether to prune move actions or
not. Thus, there are two versions of MCTS with hard pruning: with and without
move pruning. The reason for pruning moves is that they increase the action
space significantly: every unit has multiple move actions based on its movement
distance and the roads, and towards the later game a player tends to have many
units, increasing the action space even further. Move actions are pruned using
Equation 1. First, the move actions are extracted from the full action set;
those whose destination is not a ruins or a city tile are then filtered with
Equation 1, using α = 1 and T = 5. The pruned move actions are removed and
never considered again in the action set of the current node. Although the
pruned actions are removed from the current node, they might appear in
subsequent nodes, meaning they will still be considered at a later step.
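A minimal sketch of this pruning step is given below. Here children is assumed
to map actions to the states they lead to, evaluate is the heuristic of
Section 3.2, and the α and T parameters are left as arguments (the text only
reports α = 1 and T = 5 for the move-pruning use of the same formula).

```python
import math

def remaining_nodes(num_actions, alpha, min_keep):
    # Equation 1: remainingNodes = max(alpha * log(n), T)
    return max(int(alpha * math.log(num_actions)), min_keep)

def hard_prune(children, evaluate, alpha, min_keep, node_visits, visit_threshold=20):
    """Hard pruning as applied in the tree policy: once a node has been visited more
    than visit_threshold times, keep only the highest-valued children according to
    the heuristic, unless the action set is already at most min_keep in size.
    children maps actions to child states; both interfaces are assumptions."""
    if node_visits <= visit_threshold or len(children) <= min_keep:
        return children
    keep = remaining_nodes(len(children), alpha, min_keep)
    ranked = sorted(children.items(), key=lambda kv: evaluate(kv[1]), reverse=True)
    return dict(ranked[:keep])
```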
3.4   MCTS with progressive widening

This variant of MCTS is a modification of MCTS with hard pruning. Instead of
permanently removing the pruned nodes, the algorithm unprunes them in later
iterations. This process allows the tree to eventually search every node and
alleviates the risk of pruning the best action. It is done by increasing the
number of remaining nodes once the iteration count exceeds the value Υ given by
Equation 2 (Chaslot et al. 2008). When the number of iterations is greater than
Υ, a pruned action picked uniformly at random is unpruned.

            Υ = α · β^(n − n_init)                                      (2)

   α and β control the unpruning ratio, and they are set to 20 and 1.7
respectively. The higher β is, the higher the number of iterations needed for
unpruning, approximating the behaviour of hard pruning. n is the size of the
unpruned set of the current node and n_init is the initial number of unpruned
nodes, set to 5. As in the hard pruning case, two variants of MCTS with
progressive widening have been tested: with and without move pruning.
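A sketch of the unpruning check follows, with the parameter values reported in
the text (α = 20, β = 1.7, n_init = 5); the unpruned/pruned action lists are
assumed to be stored alongside each node rather than being part of any actual
framework API.

```python
import random

def unpruning_threshold(n_unpruned, alpha=20.0, beta=1.7, n_init=5):
    # Equation 2: threshold = alpha * beta ** (n - n_init)
    return alpha * beta ** (n_unpruned - n_init)

def maybe_unprune(node_iterations, unpruned, pruned):
    """Progressive-widening check: once the node's iteration count exceeds the
    threshold, move one pruned action (chosen uniformly at random) back into
    the unpruned set."""
    if pruned and node_iterations > unpruning_threshold(len(unpruned)):
        action = random.choice(pruned)
        pruned.remove(action)
        unpruned.append(action)
```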
                    4    Experiments and results

This section provides the experimental setup for testing MCTS with pruning,
followed by the results of our tests. Finally, an analysis of the MCTS tree is
provided.

4.1   Experimental Setup

Both versions of MCTS with pruning (hard pruning and progressive widening, each
with and without move pruning) were pitted against the rule-based, the default
MCTS and the RHEA agents, as included in the framework. For all search agents,
the stopping condition is set to 2000 usages of the forward model's next
method.
   Each agent pairing plays 5000 games (25 seeds with 200 repetitions). The
seeds are the same as in the original paper (Perez et al. 2020) and they are
used to generate different levels, all of size 11 × 11 tiles. Because levels
might not be balanced for both contestants, players swap starting positions
across the repetitions of every seed. Each game is independent, meaning that no
information is carried from one game to the next. The game mode is Capital with
full observability, thus the winning condition is to take over the enemy's
capital city. In order to have finite games, a winner is declared after 50
turns, corresponding to the player with the highest score at that point.

Table 1: Averaged winning rate for each version of MCTS vs the 3 baseline
agents over 5000 games.

                           Hard Pruning                   Progressive Widening
                     RB    Default MCTS   RHEA        RB    Default MCTS   RHEA
  Move Pruning     56.9%      63.3%       57.8%     57.8%      62%         57.7%
  No Move Pruning  54.7%      61.2%       54.5%     55.8%      60.1%       57.7%

4.2   Game Results

Table 1 shows the results of both versions of MCTS competing with the three
baseline approaches. The header row indicates the pruning version (hard pruning
or progressive widening) and the opponent, while the first column indicates
whether move pruning is used. Default MCTS shows 43.8% and 37% winning rates
when playing against the RB and RHEA agents, respectively (not in the table).
With the help of pruning techniques, all versions of MCTS have more than a 50%
winning rate playing against the baseline approaches.
   With the help of hard pruning, MCTS increases its win rate by approximately
8% and 12% compared to the original MCTS when playing against RB and RHEA,
respectively. Move pruning and progressive widening further increase the
performance of MCTS with hard pruning. MCTS with hard pruning and move pruning
shows the highest winning rate when competing against the original MCTS and
RHEA. As can be seen, pruning moves increases the performance of MCTS with
respect to only applying progressive widening or hard pruning. Although using
hard pruning or progressive widening is clearly better than using no pruning at
all, the difference between these two methods is minimal. The winning rate is
higher for progressive widening when playing against RB, but lower than hard
pruning when the opponent is default MCTS or RHEA. The difference, however, is
likely not significant, and different parameters for Equation 2 may alter the
unpruning rate and provide a better value estimate of the different moves.
Overall, although hard pruning, progressive widening and domain knowledge (move
pruning) all improve the performance, using domain knowledge has the clearest
effect on the winning rate.
   Table 2 shows the statistics of the different versions of MCTS over 15000
games (aggregating the 5000 games of each pairing against the baseline
approaches). It shows the average winning rate over the three baseline
approaches, final score, researched technologies percentage, number of owned
cities at the end of the game and final production. The values obtained for
these statistics by the default MCTS in (Perez et al. 2020) (not in this table)
are 9966.55, 85.27%, 2.23 and 16.96, respectively. The score and researched
technologies percentage of every version of MCTS are considerably smaller than
for the original MCTS and have values similar to those of the RB agent (8076.32
for score and 71.14% for researched technologies). However, the number of
cities owned at the end of the game and the final production values are
slightly larger, now between 2.24 and 2.26 for the former, and above 17 for the
latter. Results also show that MCTS with progressive widening has a higher
number of researched technologies, and move pruning produces higher final
scores, cities and production. These statistics suggest that the modified MCTS
is more focused on occupying cities rather than trying to maximize the final
score, which may be a consequence of the new features in the heuristic function
that give extra value to units in the enemy's cities. In comparison to RB and
RHEA, the score and researched technologies percentage of MCTS sit between
those two agents, while the number of owned cities and final production is
smaller than for RB and RHEA. The statistics of the different versions of MCTS
explored in this work, in relation to RB and RHEA, are comparable with those of
default MCTS.

Table 2: The aggregated statistics for each version of MCTS over 15000 games.

                            Hard Pruning                          Progressive Widening
                Win Rate   Score    Techs   Cities  Prod.   Win Rate   Score    Techs   Cities  Prod.
  Move Pruning    59.3%   8500.06  77.88%   2.26    17.38    59.1%    8564.36  79.08%   2.25    17.34
  No Move Pruning 56.8%   8458.52  77.46%   2.24    17.16    57.8%    8446.04  78.17%   2.24    17.07

4.3   Tree analysis

This section shows which action groups and actions are executed more often, an
analysis of the tree depth, and the fully expanded rate of the recommended
action node in terms of different action space sizes and turns of the game. The
fully expanded rate of the recommended action node is measured between 0 and 1.
If the value is 1, the recommended action node has visited all of its possible
children (excluding the pruned nodes) from its current state. A value of 0
indicates that no children have been explored (note that these are the children
of the node of the recommended action, not the children of the root).
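As an illustration, this metric could be computed as in the following sketch,
where the arguments are assumed collections of actions associated with the
recommended node.

```python
def fully_expanded_rate(visited_children, available_actions, pruned_actions):
    """Fraction of the recommended node's possible children (excluding pruned ones)
    that have been visited at least once; 1.0 means fully expanded, 0.0 means none
    of its children have been explored."""
    candidates = set(available_actions) - set(pruned_actions)
    if not candidates:
        return 1.0
    visited = set(visited_children) & candidates
    return len(visited) / len(candidates)
```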
   During the game, the unit action group is the most frequently chosen: 68% of
the time. The city and tribe action groups are only chosen 20.5% and 11.4% of
the time, respectively. This difference is a consequence of unit actions taking
a large proportion of the action space. The most frequently executed action is
move. The move action has the largest percentage (58.9%) among all actions,
which also explains why the unit action group has the largest proportion. The
move action does not provide a significant reward per se, but it can lead to
future reward if the unit moves to ruins or enemy cities. Applying move pruning
does not change the proportions between the action groups. This may be caused
by move actions being taken towards the end of the turn.

Figure 2: MCTS depth under different action space sizes.

Figure 3: The fully explored rate for the recommended action in MCTS under
different action space sizes.

   Figures 2 and 3 show the influence of the branching factor on the depth of
MCTS and the fully expanded rate of the recommended action node. The original
MCTS is unable to go deeper than 2 levels and visit all the possible actions
from the current state when the branching factor is over 50. With the help of
pruning techniques, this problem only happens when the size of the action space
grows over 180. This is three times larger than in the original MCTS, providing
better action value estimates in early game turns (when the action space tends
to be smaller). The depth of the tree sharply decreases when the branching
factor grows. As expected, due to the unpruning mechanism, progressive widening
explores shallower trees than hard pruning, which is most noticeable when the
action space size is between 100 and 150. Compared to hard pruning, progressive
widening is unable to fully explore the chosen action when the branching factor
grows over 100. Although hard pruning also faces this issue, it happens later
than for progressive widening. The rate of exploration shown in Figure 3
follows a similar trend: hard pruning is able to delay the decay of children
explored further than progressive widening, and both approaches show clearly
that they allow the tree to visit a higher proportion of actions from the root
than when there is no pruning at all. With no pruning, the rate of default MCTS
decays dramatically when the action space size reaches 50. Figures 2 and 3 show
the effect of unpruning in progressive widening and how the search tree is
explored differently in each case.

Figure 4: The depth of MCTS in different turns.

Figure 5: The fully explored rate for the recommended action in MCTS in
different turns.

   Figures 4 and 5 show the influence of the game turn on the depth of MCTS and
the fully expanded rate of the recommended action node. Note that, in Tribes,
the size of the action space tends to grow with the number of turns
(particularly if the game is favourable to the player). An analysis of this can
be found in (Perez et al. 2020). The depth of not-pruned MCTS drops to around 3
when the game turn reaches 30, and the fully explored rate of the chosen action
starts to drop at turn 10 and ends up at around 40% by turn 50. The pruned
versions of MCTS show a similar trend, but drop more slowly than the original
MCTS. The depth of the modified MCTS is around 4 at the end of the game, and
the fully explored rate of the chosen action is considerably higher than that
of the original MCTS at turn 50. Hard pruning decreases more slowly than
progressive widening, and move pruning further slows down the decrease. This
shows the significant effect of move pruning in increasing the explored ratio
of the chosen action in later turns.
                    5    Conclusion and future work

This paper focuses on exploring pruning techniques for MCTS in a multi-action
strategy game, Tribes, in which players must execute multiple actions in a turn
while managing other game aspects such as economy, technology research and
resource gathering. Moreover, the size of the action space is not fixed and
generally increases during the game, resulting in a high and difficult to
manage branching factor.
   MCTS can handle large branching factors, but it struggles when the action
space size reaches more than a hundred (Baier and Cowling 2018; Chaslot et al.
2008), which is actually the case in Tribes. This paper shows that using
pruning techniques helps MCTS deal with this issue. We explored progressive
widening, hard pruning and the insertion of domain knowledge for pruning move
actions. The inclusion of progressive widening and hard pruning improves the
performance of MCTS in a similar fashion, and results are even better when
adding the move action pruning.
   We propose different avenues of future work. One possibility is to explore
other pruning algorithms, like NaiveMCTS (Ontanón 2017) or other types of
heuristic pruning (Sephton et al. 2014), or to dive deeper into the use of
domain knowledge for pruning. This paper shows that decreasing the proportion
of move actions increases the performance of the agent. However, several other
actions might also benefit from the use of domain knowledge, such as the Level
Up or Build actions. A more general approach would be to use machine or
reinforcement learning techniques to learn which actions must be pruned.
Another interesting possibility is to explore variants of OEP or evolutionary
MCTS that deal with a varying number of actions per turn. Finally, there is
abundant work on portfolio approaches for strategy games, and exploring
synergies between these and different pruning techniques can lead to other ways
of dealing with large action space sizes.
                              References

Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the
multiarmed bandit problem. Machine Learning 47(2-3):235–256.

Baier, H., and Cowling, P. I. 2018. Evolutionary MCTS for multi-action
adversarial games. In IEEE Conference on Computational Intelligence and Games,
1–8. IEEE.

Browne, C. B.; Powley, E.; Whitehouse, D.; Lucas, S. M.; Cowling, P. I.;
Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; and Colton, S. 2012.
A survey of Monte Carlo tree search methods. IEEE Transactions on Computational
Intelligence and AI in Games 4(1):1–43.

Chaslot, G. M. J.; Winands, M. H.; van den Herik, H. J.; Uiterwijk, J. W.; and
Bouzy, B. 2008. Progressive strategies for Monte-Carlo tree search. New
Mathematics and Natural Computation (NMNC) 4(03):343–357.

Childs, B. E.; Brodeur, J. H.; and Kocsis, L. 2008. Transpositions and move
groups in Monte Carlo tree search. In 2008 IEEE Symposium On Computational
Intelligence and Games, 389–395.

Churchill, D., and Buro, M. 2013. Portfolio greedy search and simulation for
large-scale combat in StarCraft. In 2013 IEEE Conference on Computational
Intelligence in Games (CIG), 1–8. IEEE.

Churchill, D., and Buro, M. 2015. Hierarchical portfolio search: Prismata's
robust AI architecture for games with large search spaces. In Eleventh
Artificial Intelligence and Interactive Digital Entertainment Conference
(AIIDE).

Coulom, R. 2007. Computing "Elo ratings" of move patterns in the game of Go.
ICGA Journal 30(4):198–208.

Gelly, S., and Silver, D. 2007. Combining online and offline knowledge in UCT.
In Proceedings of the 24th International Conference on Machine Learning (ICML),
273–280. Association for Computing Machinery.

Gelly, S., and Wang, Y. 2006. Exploration exploitation in Go: UCT for
Monte-Carlo Go.

Justesen, N.; Mahlmann, T.; Risi, S.; and Togelius, J. 2017. Playing
multiaction adversarial games: Online evolutionary planning versus tree search.
IEEE Transactions on Games 10(3):281–291.

Justesen, N.; Uth, L. M.; Jakobsen, C.; Moore, P. D.; Togelius, J.; and Risi,
S. 2019. Blood Bowl: A new board game challenge and competition for AI. In 2019
IEEE Conference on Games (CoG), 1–8.

Justesen, N.; Mahlmann, T.; and Togelius, J. 2016. Online evolution for
multi-action adversarial games. In Applications of Evolutionary Computation,
590–603. Springer International Publishing.

Midjiwan AB. 2016. The Battle of Polytopia.

Ontanón, S. 2017. Combinatorial multi-armed bandits for real-time strategy
games. Journal of Artificial Intelligence Research 58:665–702.

Perez, D.; Samothrakis, S.; Lucas, S.; and Rohlfshagen, P. 2013. Rolling
horizon evolution versus tree search for navigation in single-player real-time
games. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary
Computation, 351–358.

Perez, D.; Hsu, Y.-J.; Emmanouilidis, S.; Khaleque, B. D. A.; and Gaina, R. D.
2020. Tribes: A new turn-based strategy game for AI research. In Proceedings of
AIIDE'20.

Ramanujan, R.; Sabharwal, A.; and Selman, B. 2010. On adversarial search spaces
and sampling-based planning. In Twentieth International Conference on Automated
Planning and Scheduling.

Sephton, N.; Cowling, P. I.; Powley, E.; and Slaven, N. H. 2014. Heuristic move
pruning in Monte Carlo tree search for the strategic card game Lords of War. In
2014 IEEE Conference on Computational Intelligence and Games, 1–7. IEEE.

Teytaud, F., and Teytaud, O. 2009. Creating an upper-confidence-tree program
for Havannah. In Advances in Computer Games, 65–74. Springer.