=Paper=
{{Paper
|id=Vol-2862/paper30
|storemode=property
|title=STRATEGA: A General Strategy Games Framework
|pdfUrl=https://ceur-ws.org/Vol-2862/paper30.pdf
|volume=Vol-2862
|authors=Alexander Dockhorn,Jorge Hurtado-Grueso,Dominik Jeurissen,Diego Perez-Liebana
|dblpUrl=https://dblp.org/rec/conf/aiide/DockhornGJL20
}}
==STRATEGA: A General Strategy Games Framework==
STRATEGA - A General Strategy Games Framework
Alexander Dockhorn, Jorge Hurtado-Grueso, Dominik Jeurissen, Diego Perez-Liebana
School of Electronic Engineering and Computer Science
Queen Mary University of London, UK
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Strategy games are complex environments often used in AI research to evaluate new algorithms. Despite the commonalities of most strategy games, research is often focused on one game only, which may lead to bias or overfitting to a particular environment. In this paper, we motivate and present STRATEGA, a general strategy games framework for playing n-player turn-based and real-time strategy games. The platform currently implements turn-based games, which can be configured via YAML files. It exposes an API with access to a forward model to facilitate research on statistical forward planning agents. The framework and agents can log information during games for analysing and debugging algorithms. We also present some sample rule-based agents, as well as search-based agents like Monte Carlo Tree Search and Rolling Horizon Evolution, and quantitatively analyse their performance to demonstrate the use of the framework. Results, although purely illustrative, show the known problems that traditional search-based agents have when dealing with high branching factors in these games. STRATEGA can be downloaded at: https://github.com/GAIGResearch/Stratega

1 Introduction

Since Michael Buro motivated AI research in strategy games (Buro 2003), multiple games and frameworks have been proposed and used by investigators in the field. These games, although different in themes, rules and goals, share certain characteristics that make them interesting for Game AI research. Most of the work done in this area pertains to the sub-genre of real-time strategy games (and, in particular, to Starcraft (Ontanón, Synnaeve, and others 2013)), but it is possible to find abundant research in other real-time and turn-based strategy games in the literature (see Section 2). The complexity of these problems is often addressed by incorporating game domain knowledge into the agents, either by providing the AI with programmed game-specific information or by training it with game replays. Although the contributions made this way can be significant, and in some cases it is possible to transfer algorithms and architectures from one game to another, we believe the time is right to introduce general AI into this domain. The objective of this paper is to present a new framework for general AI research in strategy games, which tackles the fundamental problems of this type of domain without focusing on an individual game at a time: resource management, decision making under uncertainty, spatial and temporal reasoning, competition and collaboration in multi-player settings, partial observability, large action spaces and opponent modelling, among others.

In a similar way to General Game Playing (GGP) (Genesereth, Love, and Pell 2005) and General Video Game Playing (Perez-Liebana, Lucas, and others 2019), this paper proposes a general multi-agent, multi-action strategy games framework for AI research. Section 3 presents the vision for this framework, while Section 4 describes the current implementation. Our main interest when proposing this framework is to foster research into the complexity of the action decision process without a dependency on a concrete strategy game. The following list summarises the main characteristics of the platform:

• Games and levels are defined via text files in YAML format. These files include definitions for games (rules, duration, winning conditions, terrain types and effects), units (skills, actions per turn, movement and combat features) and their actions (strength, range and targets).

• STRATEGA incorporates a Forward Model (FM) that permits rolling any game state forward by supplying an action during the agent's thinking time. The FM is used by the Statistical Forward Planning (SFP) agents within the framework: One Step Look Ahead, Monte Carlo Tree Search (MCTS) and Rolling Horizon Evolutionary Algorithm (RHEA).

• A common API for all agents that provides access to the FM and a copy of the current game state, which supplies information about the state of the game and its entities. This copy of the state can be modified by the agent, so what-if scenarios can be built for planning. This is particularly relevant for tackling partial observability in RTS games.

• The framework includes functionality for profiling the agent's decision-making process, in particular regarding FM usage. This facilitates analysis of the impact, on the execution of the game, of methods that use these models
(such as the ones included in the benchmark) by providing the footprint in time and memory of FM usage.

• Functionality for game and agent logging, in order to understand the complexity of the action-decision process faced by the agents and easily analyse experimental results. This information includes data at the game-end (outcome and duration), turn (score, leading player, state size, actions executed) and action (action space) levels.

The aim of this framework is not only to provide a general benchmark for research on game-playing AI performance in strategy games, but also to shed some light on how decisions are made and on the complexity of the games. A common API for games and agents allows researchers to build new scenarios and to compare different AI approaches in them. In fact, the second contribution of this paper is to showcase the use of the current framework. To this end, Section 5 presents baseline results for the agents and games already implemented in this benchmark.

2 Related Work

STRATEGA incorporates many features of well-known strategy games and GGP frameworks, described in this section.

2.1 Strategy Games

There has been a relevant proliferation of multi-action and multi-unit games in the games research literature over the last couple of decades, ranging from situational and tactical environments to fully-fledged strategy games. An example of the former is HeroAIcademy (Justesen, Mahlmann, and others 2017), in which the authors present a turn-based game where each player controls a series of different units on a small board. Each turn, players distribute 5 action points across these units with the objective of destroying the opponent's base. The authors used this framework to introduce Online Evolutionary Planning, outperforming tree search methods with more efficient management of the game's large branching factor.

Later on, (Justesen et al. 2019) introduced the Fantasy Football AI (FFAI) framework and its accompanying Bot Bowl competition. This is a fully-observable, stochastic, turn-based game with a grid-based game board. Due to the large number of actions per unit and the possibility of moving each unit several times per turn, the branching factor is enormous, reportedly the largest in the literature of turn-based board games. Its gym interface provides access to several environments, each offering a vector-based state observation. While those environments differ in the size of the game board, the rules of the underlying game cannot be adjusted.

Recently, (Perez-Liebana et al. 2020b) provided an open-source implementation of the multi-player, turn-based strategy and award-winning game The Battle of Polytopia (Midjiwan AB 2016). In this game, players need to deal with resource management and production, technology trees, terrain types and partial observability, and control multiple units of different types. The action space is very large, with averages of more than 50 possible actions per move and an estimated branching factor per state of 10^15. The framework includes support for SFP agents, including MCTS and RHEA, which in baseline experiments seem to be at a similar level to rule-based agents, but inferior to a human level of play.

The most complete turn-based strategy games framework to date is arguably Freeciv (Prochaska and others 1996), inspired by Sid Meier's Civilization series (Firaxis 1995–2020). It incorporates most of the complexities and dynamics of the original game, allowing interactions between potentially hundreds of players. Due to its complexity, most researchers have used it to tackle certain aspects of strategy games, like level exploration and city placement (Jones and Goel 2004; Arnold, Horvat, and Sacks 2004).

Regarding real-time strategy games, microRTS (Ontañón et al. 2018) is a framework and competition developed to foster research in this genre, which generally has a high entry threshold due to the complexity of the game to be learned (e.g. Starcraft using BWAPI (Team 2020)). In comparison to other frameworks, players can issue commands at the same time and each action takes a fixed time to complete. The framework implements various unit and building types that act on the player's command, and it supports both fully and partially observable states. Recently, AlphaStar (Vinyals et al. 2019) showed the great proficiency of deep supervised and reinforcement learning (RL) methods in Starcraft II. With the use of a significant amount of computational resources, their system is able to beat professional human players consistently by learning first from human replays and then training multiple versions of their agent in the so-called AlphaStar League. This example shows that even for complex RTS games it is possible to develop agents of high proficiency, although such agents so far remain limited to playing a single game.

2.2 General Game Playing (GGP)

As previously mentioned, the goal of STRATEGA is to support research on general strategy game playing, which forms an interesting sub-domain of general game playing. GGP has already been supported by numerous frameworks that focus on its different aspects. The GGP framework (Genesereth, Love, and Pell 2005) was introduced to study general board games, and its game description language motivated the development of the video game description language (Schaul 2013) and its accompanying General Video Game AI (GVGAI) (Perez-Liebana, Liu, and others 2019) framework. GVGAI focuses on 2D tile-based, arcade-like video games and supports a small number of 2D physics-based games. In a similar fashion, the Arcade Learning Environment (ALE) (Bellemare et al. 2013) provides access to Atari 2600 games, offering multiple ways in which game states can be perceived, and is tightly interconnected with OpenAI Gym (Brockman et al. 2016).

Different styles of defining games have been presented by the Ludii (Piette et al. 2019) and the Regular Boardgames (RBG) (Kowalski et al. 2019) frameworks. While the former uses high-level game-related concepts (ludemes) for game definitions, the latter uses regular expressions. Both permit defining turn-based games, but currently seem to lack methods for implementing real-time games. Finally, (Tian et al. 2017) propose the Extensive, Lightweight and Flexible (ELF) platform, which allows the execution of Atari, board and real-time strategy games. In particular, ELF incorporates MiniRTS, a fast RTS game with a similar scope to microRTS.
2.3 General Strategy Games

Some initial attempts have been made to provide platforms that host multiple RTS games. In one of the most recent works, (Andersen, Goodwin, and Granmo 2018) signified the need for an RTS framework that can adjust the complexity of its games and presented Deep RTS. This platform focused on providing a research benchmark for (deep) Reinforcement Learning methods and supported games of different complexity, ranging from low (such as those in microRTS) to high (as in Starcraft II). A similar but more flexible framework is Stratagus (Ponsen et al. 2005), a platform that shares some characteristics with our proposal. Different strategy games can be configured via text files, and LUA scripts can be used to modify some game dynamics. Some statistics are also gathered for all games, such as units killed and lost, and a common API is provided for agents.

Our general strategy games platform goes beyond these proposals in a two-fold manner. First, from the perspective of the agents, we provide forward model functionality to enable the use of statistical forward planning agents. Secondly, from the games perspective, our platform provides higher customisation of the game mechanics, allowing the specification of game goals, terrain features, unit and action types, complemented with agent and game logging functionality. Furthermore, the STRATEGA framework makes use of higher-level concepts to ease the development and customisation of strategy games. While GGP frameworks may be able to produce strategy games of similar complexity, they can require extensive effort to encode the games we are looking at.

3 Platform for General Strategy Games

STRATEGA currently implements n-player turn-based strategy games, where games use a 2D-tile representation of the level and units (Perez-Liebana et al. 2020a). During a single turn, a player can issue one or more actions to each unit. While the standard game-play lets all players fight until all but one have lost all their units, the game's rules can be modified to implement custom winning conditions. At the game start, every player receives a set of units, which can be moved along the grid and attack other units to reduce their health. Furthermore, units can be assigned special abilities and differ in their range, health points, damage and other variables.

The framework is written in C++. It can run headless or with a graphical user interface (GUI) that provides information on the current game state and run-time statistics, and allows human players to play the game. The GUI, the game engine and the game-playing agents are separated into multiple threads to maximise the framework's efficiency and the developer's control over the agents. Figure 2 shows a screenshot of the current state of the framework. Isometric assets are included in the platform to depict different types of units and terrains, which can also be assigned via YAML configuration files. Figure 1 shows the overall structure of the framework.

Figure 1: Overall structure of the framework.

Figure 2: Exemplary game state of STRATEGA and its GUI.

3.1 Creating Games

The definition of all game components such as units, abilities, game modifiers, levels and tiles is done through YAML files. The excerpt of the Actions YAML file shown in Figure 3 shows the definition of an Attack action. Several properties can be configured and even new properties can be added to the framework. In the example, the attack action has a range of 6 tiles, damage of 10 (that can affect friendly units) and establishes if and how much score the attacker and attacked players receive when the action is executed.

The unit definition in the Units YAML file shown in Figure 3 follows a similar pattern, allowing for hierarchically structured unit types. In our example, the LongRangeUnit is an extension of the BasicUnit type, inheriting the properties from its base. The entry defines basic properties for the unit: range of vision, movement, base attack damage, health and also the Path to its graphical asset. The actions available for this unit, as defined in the units YAML file, are indicated under the Actions heading. Two more attributes indicate the number of actions that the unit can execute during a turn (NumberActionExecutePerTurn) and whether they can be executed more than once. Finally, CanBeMoreThanOne determines if this unit can be instantiated multiple times or is unique.

The Configuration YAML file defines how a specific instance of a game should be created and played. The game rules section defines how many turns a game can have and whether players or agents have a limited time budget to execute their turns. Game modifiers include (but are not limited to) turning on/off the fog of war or changing winning conditions. This eases customisation by quickly modifying the game without changing and recompiling its code, allowing the units and actions to be reused in different games. The Configuration YAML file also specifies the list of N ≥ 2 players and the level to play in. This level is formed by the initial distribution of the tiles and the definition of each terrain type. Figure 3
includes as an example the definition of a trap tile, which indicates that it is walkable (units can enter the tile), does not offer any cover, deals 50 points of damage to the unit as soon as it enters the tile, but does not kill the unit automatically.

Actions:
  - Attack:
      Value: 10
      Range: 6
      GiveRewardAttacker: true
      AttackerReward: 2
      GiveRewardAttacked: false
      AttackedReward: -1
      CanExecutedToFriends: true

Units:
  - BasicUnit:
      NumberActionExecutePerTurn: 1
      CanRepeatSameAction: false
      Types:
        - LongRangeUnit:
            RangeVision: 6
            RangeMovement: 4
            RangeAction: 5
            AttackDamage: 70
            Health: 100
            Path: LongRange.png
            Actions:
              - Move
              - Attack
            CanBeMoreThanOne: false

Game Configuration:
  Game Rules:
    TimeForEachTurn: 10
    NumberOfMaxRounds: 100
  Players:
    - MCTS Player
    - RHEA Player
  Level: >
    XXXXXXXXXXXXX
    XX.........XX
    X.....T.....X
    XX.........XX
    XXXXXXXXXXXXX
  LevelNodes:
    - Trap:
        Character: T
        Walkable: true
        Cover: false
        HealthEffect: true
        EffectEntry: true
        KillUnit: false

Figure 3: Excerpts of YAML files that define the game. From top to bottom: actions, units, and general rules.

3.2 Agent interface

We provide an API for agent development and offer access to several sample agents (see Section 4.2). Agents must define a constructor for initialising the player and a method to indicate an action to execute in the current game tick. This method receives a copy of the game state, which can be modified (i.e. to incorporate game objects into the non-visible part of the level due to fog of war) and rolled forward by supplying actions. This access to an FM facilitates the implementation of SFP agents. Agents also have access to the properties of the game state (positions of units, terrain tiles, game turn, score, etc.) and a pathfinding feature to determine shortest paths.

On each turn, the game requests an action from the player to be executed in the game. The agent receives information about all possible actions that can be executed in the current game state, for all the units present on the board. The agent can choose to return one of these actions, or a special action that indicates that it does not want to execute any more actions during the present turn. The game requests an action from the player as long as i) the agent does not return an EndTurn action; ii) there are still actions available for the player; and iii) the turn budget, if specified in the YAML configuration files, is not consumed.
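To illustrate this interface, the following is a minimal C++ sketch of an agent. It is not taken from the framework's source: the type and method names (GameState, ForwardModel, computeAction, etc.) are assumptions chosen for readability, and the real API may differ. The sketch effectively mirrors the One Step Look Ahead agent described in Section 4.2: it evaluates each available action by rolling a state copy forward and returns the best one.

#include <limits>
#include <vector>

// Illustrative stand-ins for the framework's types (hypothetical, not the real API).
struct Action { int id = 0; };
struct GameState {
    std::vector<Action> actions;   // actions currently available to the player
    double score = 0.0;
    std::vector<Action> getAvailableActions() const { return actions; }
};
struct ForwardModel {
    // Rolls a state copy forward by one action (stand-in dynamics).
    void advance(GameState& state, const Action& a) const { state.score += a.id; }
};
double evaluate(const GameState& s) { return s.score; } // e.g. the SDH of Section 4.2

class OneStepAgent {
public:
    OneStepAgent() = default;  // constructor initialises the player

    // Called whenever the game requests an action; the game keeps asking until an
    // EndTurn action is returned, no actions remain, or the turn budget is consumed.
    Action computeAction(GameState stateCopy, const ForwardModel& fm) {
        Action best{};
        double bestValue = -std::numeric_limits<double>::infinity();
        for (const Action& a : stateCopy.getAvailableActions()) {
            GameState whatIf = stateCopy;  // modifiable copy: build a what-if scenario
            fm.advance(whatIf, a);
            const double value = evaluate(whatIf);
            if (value > bestValue) { bestValue = value; best = a; }
        }
        return best;
    }
};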
3.3 Debugging and Logging

One of the most important aspects of this framework is the capability of analysing and logging game states and executions. Figure 2 shows live debug information by means of interactive floating windows. This information includes game data (current turn, number of actions executed by a player in a turn, frames per second, score and leading player), profiling data (the size, in bytes, of the game state; the time, in microseconds, needed to copy the game state and to advance the forward model by one action; and the time taken by agents to provide a move to play) and action and unit information, which indicates the size of the action space in the current game state and the accumulated action space during the present turn. The interface also allows us to obtain more information and execute those actions from the list in the floating window, as well as to obtain information about the units in the game.

Once the game is over, a log file is also written in YAML format, including per-turn information on decision-making time, score, action space and actions executed, number of units, player rankings and game-specific information. The framework includes simple scripts to analyse this data and produce logging plots such as the ones shown below in Figure 4.

4 A Turn-based Strategy Framework

This section describes the implementation of the agents and the three turn-based strategy games currently defined in the platform.

4.1 Games: Kings, Healers and Pushers

In Kings, players receive a king unit and a random set of additional units. Their task is to keep their king alive at all costs while trying to defeat the opponent's king. Similar to chess, losing other units does not determine the end of the game but effectively reduces the flexibility of a player. Four types of units are defined in this game mode: archer, warrior, healer, and the king. While the warrior moves slowly and deals high damage, the archer moves quickly and has long-range attacks, but its damage is reduced. In addition to its movement speed, the archer can also see further than any other unit in the game. The healer can restore other units' health points. Lastly, the king can only move one square at a time but deals the highest damage. All units can move and attack once in the same turn. The game is played on a map with different types of terrain, each type providing a different cover percentage for reducing incoming damage. Additionally, the map contains traps, which kill a unit upon entering. The map is covered in fog-of-war, with each unit revealing parts of the map based on its vision radius.

In the game Healers, both players have access to warriors and healers. The healer can move faster than the warrior but cannot attack. In comparison to Kings, healers and warriors have higher starting health points. The twist in this game is that all units receive damage at the end of each turn. The goal of the players is to keep their units alive, while they can attack the opponent's units. The last player with units left wins. The map contains plains and mountains, this time with no tile providing cover. Mountains act as non-walkable obstacles and fog-of-war is disabled in this game.

The game Pushers is fundamentally different from the way the other games are played. Only one unit type is available, the Pusher. Pushers cannot attack other units but can push them one tile back once per turn, with the aim of making the other player's units fall into the traps in the level. The winning condition remains the same (survive the longest), but the game focuses on tactical movement instead of aggressive unit actions.

4.2 Agents

This section describes the different agents implemented in the framework and a heuristic used to evaluate game states.

Strength Difference Heuristic (SDH) SDH is a heuristic to estimate the relative strength of a unit, computed as a linear sum of the unit's attributes (maximum health, attack damage, or movement range), each divided by the maximum value among all available unit types. If a unit cannot execute an action, the corresponding attribute is 0. Note that this heuristic will not change during a game: dynamic attributes like a unit's current health are not considered in the strength estimation. To estimate the value of a state, we compute the difference between the strength of the current player's units and the opponent's units. Additionally, a unit's strength is multiplied by the percentage of remaining health to encourage attacks.
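As a clarification of this computation, here is a minimal sketch in C++. It is an interpretation of the description above rather than the framework's implementation; the unit and unit-type structures and the chosen attribute set are assumptions.

#include <algorithm>
#include <vector>

struct UnitType { double maxHealth = 0, attackDamage = 0, moveRange = 0; };
struct Unit { UnitType type; double health = 0, maxHealth = 1; int owner = 0; };

// Static strength of a unit type: each attribute divided by the maximum value of that
// attribute among all available unit types (an attribute the unit cannot use counts as 0).
double unitStrength(const UnitType& t, const std::vector<UnitType>& allTypes) {
    double maxH = 0, maxD = 0, maxM = 0;
    for (const UnitType& u : allTypes) {
        maxH = std::max(maxH, u.maxHealth);
        maxD = std::max(maxD, u.attackDamage);
        maxM = std::max(maxM, u.moveRange);
    }
    return (maxH > 0 ? t.maxHealth    / maxH : 0) +
           (maxD > 0 ? t.attackDamage / maxD : 0) +
           (maxM > 0 ? t.moveRange    / maxM : 0);
}

// State value: strength of the player's units minus strength of the opponent's units,
// with each unit's (static) strength scaled by its remaining health percentage.
double strengthDifference(const std::vector<Unit>& units,
                          const std::vector<UnitType>& allTypes, int player) {
    double value = 0;
    for (const Unit& u : units) {
        const double s = unitStrength(u.type, allTypes) * (u.health / u.maxHealth);
        value += (u.owner == player) ? s : -s;
    }
    return value;
}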
Rule-based Combat (RBC) Agent This agent focuses on combat-oriented games like Kings or Healers. Its strategy is to focus all attacks on a single enemy unit while keeping its own units out of danger. Every time the agent has to make a decision, it first targets an enemy unit. It then tests for each friendly unit whether it can attack an opponent, heal an ally, or move closer to the target. Once a valid action has been found, the agent executes it and repeats the process until no actions are left. The target is chosen based on an isolation score. A unit's allies contribute negatively to the isolation score, while its enemies contribute positively. The contribution is equal to the unit's strength divided by the number of turns it takes for it to reach the unit. To find a target, the agent searches for the enemy with the highest isolation score. When attacking or healing a unit, the agent prioritises units with high strength.
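To make the target-selection rule concrete, the following is a minimal sketch of the isolation score as we read it from the description above. It is not the agent's actual code: the unit representation, the strength value and the turns-to-reach estimate are placeholders.

#include <algorithm>
#include <cstdlib>
#include <vector>

struct Unit { int owner = 0; double strength = 1.0; int x = 0, y = 0; };

// Stand-in estimate: Chebyshev distance on the grid (a real agent would use pathfinding
// and the unit's movement range to count turns).
int turnsToReach(const Unit& from, const Unit& to) {
    return std::max(std::abs(from.x - to.x), std::abs(from.y - to.y));
}

// Isolation score of a candidate enemy target: the target's enemies (our units) add
// strength / turns-to-reach, while its allies subtract it. The RBC agent attacks the
// enemy with the highest score, i.e. the one closest to us and furthest from support.
double isolationScore(const Unit& target, const std::vector<Unit>& allUnits) {
    double score = 0.0;
    for (const Unit& u : allUnits) {
        if (&u == &target) continue;
        const double contribution = u.strength / std::max(1, turnsToReach(u, target));
        score += (u.owner == target.owner) ? -contribution : contribution;
    }
    return score;
}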
Rule-based Push (RBP) Agent The Push agent is highly specialised for games like Pushers. The agent's strategy is to push opponents in the direction that brings them closest to a death trap. For each unit, the agent computes the shortest paths from the unit to the tiles adjacent to the opponent's units. Each path then gets assigned a score equal to the length of the path, plus an estimate of how long it would take to kill the opponent from that tile. Starting from the path with the lowest score, the agent checks if following the path for one turn would result in a position that endangers the unit. If the path is not dangerous it is assigned to the unit; otherwise, the agent tries the next path. Once a unit has been assigned a path, it will either push the target opponent or follow the path. Once a unit has moved or pushed, the agent restarts the process until no unit can act safely any more.

One Step Look Ahead (OSLA) Agent The OSLA agent uses the game's forward model to predict the upcoming state for each of the available actions. Resulting states are rated according to the SDH heuristic function. A high positive (resp. negative) score is used in case the agent won (lost) the game after applying an action. Finally, the agent selects the action which yields the highest score.

Monte Carlo Tree Search (MCTS) Agent Over time, many variants of MCTS have been proposed for various problem domains (Browne et al. 2012). For the framework, we implemented a basic version of MCTS using the Upper Confidence Bounds (UCB) (Auer, Cesa-Bianchi, and Fischer 2002) selection criterion. The MCTS agent uses a tree node structure to facilitate a search through the game's state space. Each node stores an associated game state, a list of available actions, and a pointer to one child node per action. The tree is initialised by creating a root node using the provided game state. During each iteration of the search, the agent first selects a node, then expands it by another child node, then simulates a rollout, and ends by backpropagating its value along the path to the root node. A node is selected for expansion by going down the tree step-wise until a node which has not been fully expanded yet is found. At each step, the child node with the highest UCB value is selected. The new child node is generated by applying the associated action to the selected node's game state. During the tree policy we do not consider opponent turns; instead we skip them to avoid the non-determinism of their action selection. The new child node's value is determined by applying random actions until the end of the game or a predetermined depth. Its value is backpropagated through the visited nodes of the tree up to the root. The search ends once a maximum number of forward model calls has been reached. Finally, we return the root's child node with the highest visit count.
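The selection step relies on the standard UCB1 formula. The sketch below shows that step with an illustrative node structure (the framework's actual tree classes are not shown in the paper, so all names here are assumptions); the exploration constant C corresponds to the √2 used in the experiments of Section 5.

#include <cmath>
#include <limits>
#include <memory>
#include <vector>

struct Node {
    double totalValue = 0.0;                      // sum of backpropagated rollout values
    int    visits     = 0;                        // number of times this node was visited
    std::vector<std::unique_ptr<Node>> children;  // one child per already expanded action
};

// UCB1: exploit the average value, explore rarely visited children.
double ucb1(const Node& child, int parentVisits, double C) {
    if (child.visits == 0) return std::numeric_limits<double>::infinity();
    const double exploitation = child.totalValue / child.visits;
    const double exploration  = C * std::sqrt(std::log(static_cast<double>(parentVisits))
                                              / child.visits);
    return exploitation + exploration;
}

// Tree policy step: descend to the child with the highest UCB value.
Node* selectChild(Node& node, double C = std::sqrt(2.0)) {
    Node* best = nullptr;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (auto& child : node.children) {
        const double score = ucb1(*child, node.visits, C);
        if (score > bestScore) { bestScore = score; best = child.get(); }
    }
    return best;
}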
Rolling Horizon Evolutionary Algorithm (RHEA) Agent The Rolling Horizon Evolutionary Algorithm searches for an optimal action sequence of a fixed length (the horizon) (Perez-Liebana, Samothrakis, and others 2013). To do so, it first generates a pool of candidate action sequences, which is then continuously modified by an evolutionary algorithm. Each individual is created by step-wise selecting an action and applying it to the current game state. Afterwards, the individual's value is determined using a provided heuristic. Similarly to the MCTS agent, the RHEA agent skips the opponent's turn during rollouts, since it introduces too much non-determinism into the evaluation of an action sequence. At the beginning of each iteration, tournament selection is applied to select the best individuals among a random subset of individuals. The generated pool is modified by mutation and crossover operators. During mutation, we iterate over an individual's action list and randomly choose to replace an action with a random one. The remaining actions are checked for whether they would still be feasible according to the given game state and, if not, are replaced by a random feasible action. During crossover of two individuals, we randomly select which parent provides the next action. If the action is not applicable, it is replaced by a random feasible action. Resulting individuals are re-evaluated and added to the next population. RHEA keeps iterating until a maximal number of forward model calls has been reached. Thereafter, the first action of the best-rated individual is returned.
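The following sketch illustrates the mutation-and-repair step on one individual, as a minimal interpretation of the description above. Types and names are illustrative, and for simplicity a mutated gene is drawn directly from the currently feasible actions.

#include <random>
#include <vector>

struct Action { int id = 0; };
struct GameState {
    std::vector<Action> legal;                     // actions legal in this state (stand-in)
    std::vector<Action> getAvailableActions() const { return legal; }
    bool isFeasible(const Action& a) const {
        for (const Action& l : legal) if (l.id == a.id) return true;
        return false;
    }
};
struct ForwardModel { void advance(GameState&, const Action&) const {} }; // stand-in dynamics

// An individual is a fixed-length action sequence (the horizon). After each gene, the
// state copy is rolled forward so that later genes are checked against the right state.
void mutateAndRepair(std::vector<Action>& individual, GameState state,
                     const ForwardModel& fm, double mutationRate, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    for (Action& a : individual) {
        const std::vector<Action> legal = state.getAvailableActions();
        if (legal.empty()) break;
        std::uniform_int_distribution<std::size_t> pick(0, legal.size() - 1);
        if (coin(rng) < mutationRate || !state.isFeasible(a))
            a = legal[pick(rng)];                  // mutate, or repair an infeasible action
        fm.advance(state, a);                      // roll the state forward for the next gene
    }
}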
5 Experimental Showcase

We tested the performance of the sample agents by running a round-robin tournament for each of the three games. We ran 50 games per match-up between the rule-based, RHEA, MCTS, and OSLA agents. For these 50 games, we randomised 25 initial game states, each of which was played twice with the players alternating their starting positions. The search-based agents were configured to use a budget of 2000 forward model calls (number of times the state is rolled forward) per selected action. For the RHEA agent, we used a population size of 1, individuals of length 5, and a mutation rate of 0.1. The MCTS agent was configured to use a rollout length of 3 and an exploration constant of √2. Both SFP agents skip the opponent's turn and only optimise the player's action sequence. OSLA, MCTS and RHEA use the Strength Difference Heuristic to evaluate game states. Games are run for a maximum of 30 turns, ending in a tie if no winner has been declared when reaching this number.

Table 1 summarises our results, reporting each agent's win rate per opponent and across all games. Results show that the RBC agent is very proficient in playing the game modes Kings (avg. win rate = 0.92) and Healers (0.82). While the MCTS and RHEA agents were able to beat the OSLA agent, they were no match for the RBC agent. In contrast, the RBP agent performed quite well against OSLA (1.00) and RHEA (0.74) but lost against the MCTS (0.46) agent.

Agents   RBC    OSLA   MCTS   RHEA   Average
Kings
RBC      —      1.00   0.86   0.90   0.92
RHEA     0.10   0.98   0.60   —      0.56
MCTS     0.14   0.92   —      0.12   0.39
OSLA     0.00   —      0.02   0.00   0.01
Healers
RBC      —      0.98   0.82   0.66   0.82
RHEA     0.34   1.00   0.70   —      0.68
MCTS     0.16   0.94   —      0.26   0.45
OSLA     0.02   —      0.06   0.00   0.03
Pushers
RBP      —      1.00   0.46   0.74   0.73
MCTS     0.54   1.00   —      0.30   0.61
RHEA     0.26   0.94   0.40   —      0.53
OSLA     0.00   —      0.00   0.00   0.00

Table 1: Winning rate of the row player against the column agent. Players are sorted, per game, by overall winning average.

The good performance of both rule-based agents shows that there is much room for improvement in the performance of search-based agents. A great starting point for understanding their problems is to analyse the games' complexity. Figure 4 shows two plots with the average size of the action space over time and the number of actions executed per turn over 50 MCTS vs RHEA games in Kings. Both agents start with an average of 150 actions per move and execute between 5 and 6 moves per turn. The large fluctuation of the action-space size can be explained by the number of units that are still active in the agent's turn. After the unit count has been reduced, the size of both action spaces gets gradually reduced, although RHEA's is always a bit higher. On the other hand, the number of actions executed per turn, although it also decreases over the game, is higher in RHEA. This shows that RHEA's higher winning rate is correlated with a more precise action selection that maintains a larger action space through the game.

Figure 4: Logging: MCTS vs RHEA games in Kings.

6 Opportunities and Future Work

The goal of this framework is to allow research into the many different facets of Game AI research in strategical and tactical games, either turn-based or real-time. These include games that require a complex decision-making process, from multi-unit management to resource gathering, technology trees and long-term planning. Our aim is to provide a framework for i) search (showcased in this paper with SFP agents) and reinforcement learning agents; and ii) research in game and level generation, and automatic game tuning, which is made possible by the definition of rules, mechanics, units and actions via YAML files. The framework, implemented in C++, aims to provide a much required high execution speed and interfaces to different programming languages for the implementation of agents and generators.

The current state of STRATEGA is fully functional for tactical turn-based games and SFP agents, and provides logging capabilities to analyse game results, as shown in this paper. It has, however, been developed with a road-map in mind to incorporate extra games and logging features. Regarding the former, we plan to incorporate aspects of tactical role-playing games (object pick-ups, inventories, buffs/debuffs, etc.), technology and cultural trees, and resource and economy management, both for turn-based and real-time games. Regarding the logging features, the API will be enhanced so agents can log aspects of the internal representation of their decision-making process, following the example laid out in (Volz, Ashlock, and Colton 2015). This will provide a deeper insight into this task and also facilitate research into the explainability of the agent's decision-making process. Finally, the agent API and the highly customisable games allow tackling research on strategy games from a general game playing perspective, which is exemplified here by testing several agents in three different games implemented within the framework. Our intention is to propose this platform as a new competition benchmark in the near future.

Acknowledgements

This work is supported by UK EPSRC research grant EP/T008962/1.

References

Andersen, P.-A.; Goodwin, M.; and Granmo, O.-C. 2018. Deep RTS: A Game Environment for Deep Reinforcement Learning in Real-time Strategy Games. In 2018 IEEE Conference on Computational Intelligence and Games (CIG), 1–8.

Arnold, F.; Horvat, B.; and Sacks, A. 2004. Freeciv Learner: A Machine Learning Project Utilizing Genetic Algorithms. Georgia Institute of Technology, Atlanta.

Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2/3):235–256.

Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research 47(1):253–279.

Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; et al. 2016. OpenAI Gym.

Browne, C. B.; Powley, E.; Whitehouse, D.; et al. 2012. A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1):1–43.

Buro, M. 2003. Real-time Strategy Games: A New AI Research Challenge. In IJCAI, volume 2003, 1534–1535.

Firaxis. 1995–2020. Civilization.

Genesereth, M.; Love, N.; and Pell, B. 2005. General Game Playing: Overview of the AAAI Competition. AI Magazine 26(2):62–62.

Jones, J., and Goel, A. 2004. Hierarchical Judgement Composition: Revisiting the Structural Credit Assignment Problem. In Proceedings of the AAAI Workshop on Challenges in Game AI, San Jose, CA, USA, 67–71.

Justesen, N.; Uth, L. M.; Jakobsen, C.; et al. 2019. Blood Bowl: A New Board Game Challenge and Competition for AI. In 2019 IEEE Conference on Games, 1–8.

Justesen, N.; Mahlmann, T.; et al. 2017. Playing Multiaction Adversarial Games: Online Evolutionary Planning versus Tree Search. IEEE Transactions on Games 10(3):281–291.

Kowalski, J.; Mika, M.; Sutowicz, J.; and Szykuła, M. 2019. Regular Boardgames. Proceedings of the AAAI Conference on Artificial Intelligence 33:1699–1706.

Midjiwan AB. 2016. The Battle of Polytopia.

Ontañón, S.; Barriga, N. A.; Silva, C. R.; Moraes, R. O.; and Lelis, L. H. S. 2018. The First microRTS Artificial Intelligence Competition. AI Magazine 39(1):75–83.

Ontanón, S.; Synnaeve, G.; et al. 2013. A Survey of Real-time Strategy Game AI Research and Competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games 5(4):293–311.

Perez-Liebana, D.; Dockhorn, A.; Hurtado-Grueso, J.; and Jeurissen, D. 2020a. The Design of "Stratega": A General Strategy Games Framework. arXiv preprint arXiv:2009.05643.

Perez-Liebana, D.; Hsu, Y.-J.; Emmanouilidis, S.; Khaleque, B.; and Gaina, R. D. 2020b. Tribes: A New Turn-Based Strategy Game for AI Research. In 2020 AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 1–8.

Perez-Liebana, D.; Liu, J.; et al. 2019. General Video Game AI: A Multitrack Framework for Evaluating Agents, Games, and Content Generation Algorithms. IEEE Transactions on Games 11(3):195–214.

Perez-Liebana, D.; Lucas, S. M.; et al. 2019. General Video Game Artificial Intelligence, volume 3. Morgan & Claypool Publishers. https://gaigresearch.github.io/gvgaibook/.

Perez-Liebana, D.; Samothrakis, S.; et al. 2013. Rolling Horizon Evolution versus Tree Search for Navigation in Single-player Real-time Games. In Proceedings of GECCO, 351–358.

Piette, E.; Soemers, D. J.; Stephenson, M.; Sironi, C. F.; Winands, M. H.; and Browne, C. 2019. Ludii – The Ludemic General Game System. arXiv preprint arXiv:1905.05013.

Ponsen, M. J.; Lee-Urban, S.; Muñoz-Avila, H.; Aha, D. W.; and Molineaux, M. 2005. Stratagus: An Open-source Game Engine for Research in Real-time Strategy Games. Reasoning, Representation, and Learning in Computer Games 78.

Prochaska, C., et al. 1996. FreeCiv. http://www.freeciv.org/.

Schaul, T. 2013. A Video Game Description Language for Model-based or Interactive Learning. In 2013 IEEE Conference on Computational Intelligence in Games (CIG), 1–8.

Team, B. D. 2020. The Brood War API (BWAPI) 4.2.0. https://github.com/bwapi/bwapi.

Tian, Y.; Gong, Q.; Shang, W.; Wu, Y.; and Zitnick, C. L. 2017. ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games. In Advances in Neural Information Processing Systems, 2659–2669.

Vinyals, O.; Babuschkin, I.; Chung, J.; Mathieu, M.; Jaderberg, M.; et al. 2019. AlphaStar: Mastering the Real-time Strategy Game StarCraft II. DeepMind blog 2.

Volz, V.; Ashlock, D.; and Colton, S. 2015. 4.18 Gameplay Evaluation Measures. Dagstuhl Seminar on AI and CI in Games: AI-Driven Game Design 122.