<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>General Strategy Games Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Dockhorn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Hurtado-Grueso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominik Jeurissen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Perez-Liebana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>School of Electronic Engineering and Computer Science, Queen Mary University of London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Strategy games are complex environments often used in AI research to evaluate new algorithms. Despite the commonalities among most strategy games, research is often focused on a single game, which may lead to bias or overfitting to a particular environment. In this paper, we motivate and present STRATEGA - a general strategy games framework for playing n-player turn-based and real-time strategy games. The platform currently implements turn-based games, which can be configured via YAML-files. It exposes an API with access to a forward model to facilitate research on statistical forward planning agents. The framework and agents can log information during games for analysing and debugging algorithms. We also present some sample rule-based agents, as well as search-based agents like Monte Carlo Tree Search and Rolling Horizon Evolution, and quantitatively analyse their performance to demonstrate the use of the framework. Results, although purely illustrative, show the known problems that traditional search-based agents have when dealing with high branching factors in these games. STRATEGA can be downloaded at: https://github.com/GAIGResearch/Stratega</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Since Michael Buro motivated AI research in strategy
games
        <xref ref-type="bibr" rid="ref8">(Buro 2003)</xref>
        , multiple games and frameworks have
been proposed and used by investigators in the field. These
games, although different in themes, rules and goals, share
certain characteristics that make them interesting for Game
AI research. Most of the work done in this area pertains to
the sub-genre of real-time strategy games (and, in particular,
to Starcraft
<xref ref-type="bibr" rid="ref18">(Ontañón, Synnaeve, and others 2013)</xref>
        ), but it
is possible to find abundant research in other real-time and
turn-based strategy games in the literature (see Section 2).
      </p>
<p>The complexity of these problems is often addressed by
incorporating game domain knowledge into the agents,
either by providing the AI with programmed game-specific
information or by training it with game replays. Although the
contributions made this way can be significant, and in some cases
it is possible to transfer algorithms and architectures from
one game to another, we believe the time is right to introduce
general AI into this domain. The objective of this paper is to
present a new framework for general AI research in strategy
games, which tackles the fundamental problems of this type
of domains without focusing on an individual game at a time:
resource management, decision making under uncertainty,
spatial and temporal reasoning, competition and
collaboration in multiple-player settings, partial observability, large
action spaces and opponent modelling, among others.</p>
      <p>
        In a similar way to General Game Playing (GGP)
        <xref ref-type="bibr" rid="ref10">(Genesereth, Love, and Pell 2005)</xref>
        and General Video Game
Playing
        <xref ref-type="bibr" rid="ref22 ref23">(Perez-Liebana, Lucas, and others 2019)</xref>
        , this paper
proposes a general multi-agent, multi-action strategy games
framework for AI research. Section 3 presents the vision for
this framework, while Section 4 describes the current
implementation. Our main interest when proposing this framework
is to foster research into the complexity of the action decision
process without a dependency on a concrete strategy game.
The following list summarises the main characteristics of this
platform:
      </p>
      <p>Games and levels are defined via text files in YAML format.
These files include definitions for games (rules, duration,
winning conditions, terrain types and effects), units (skills,
actions per turn, movement and combat features) and their
actions (strength, range and targets).</p>
<p>STRATEGA incorporates a Forward Model (FM) that
permits rolling any game state forward by supplying an action
during the agent’s thinking time. The FM is used by the
Statistical Forward Planning (SFP) agents within the framework:
One Step Look Ahead, Monte Carlo Tree Search (MCTS)
and Rolling Horizon Evolutionary Algorithm (RHEA).</p>
      <p>A common API for all agents provides access to the
FM and a copy of the current game state, which supplies
information about the state of the game and its entities. This
copy of the state can be modified by the agent, so what-if
scenarios can be built for planning. This is particularly
relevant to tackle partial observability in RTS games.</p>
      <p>The framework includes functionality for profiling the
agent’s decision-making process, in particular regarding
FM usage. This facilitates analysis of the impact that methods
using these models (such as the ones included in the benchmark)
have on the execution of the game, by providing
the footprint in time and memory of FM usage.</p>
<p>The framework also offers functionality for game and agent logging, in order to
understand the complexity of the action-decision process
faced by the agents and easily analyse experimental results.
This information includes data at the game end (outcome
and duration), turn (score, leading player, state size, actions
executed) and action (action space) levels.</p>
      <p>The aim of this framework is not only to provide a general
benchmark for research on game playing AI performance on
strategy games, but also to shed some light on how decisions
are made and the complexity of the games. A common API
for games and agents makes it possible to build new scenarios and to
compare different AI approaches in them. In fact, the second
contribution of this paper is to showcase the use of the current
framework. To this end, Section 5 presents baseline results for
the agents and games already implemented in this benchmark.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>STRATEGA incorporates many features of well-known
strategy games and GGP frameworks, described in this section.
</p>
      <sec id="sec-2-1">
        <title>Strategy Games</title>
        <p>
Multi-action and multi-unit games have proliferated in the
games research literature over the last couple of decades,
ranging from situational and tactical environments
to fully-fledged strategy games. An example of the former
is Hero Academy
          <xref ref-type="bibr" rid="ref13 ref30">(Justesen, Mahlmann, and others 2017)</xref>
          ,
where the authors present a turn-based game where each
player controls a series of different units in a small board.
Each turn, players distribute 5 action points across these units
with the objective of destroying the opponent’s base. The
authors used this framework to introduce Online
Evolutionary Planning, outperforming tree search methods with more
efficient management of the game’s large branching factor.
        </p>
        <p>
          Later on,
          <xref ref-type="bibr" rid="ref12">(Justesen et al. 2019)</xref>
          introduced the Fantasy
Football AI (FFAI) framework and its accompanying Bot
Bowl competition. This is a fully-observable, stochastic,
turn-based game with a grid-based game board. Due to a large
number of actions per unit and the possibility to move each
unit several times per turn, the branching factor is enormous,
reportedly the largest in the literature of turn-based board
games. Its gym interface provides access to several
environments each offering a vector-based state observation. While
those environments differ in the size of the game-board, the
rules of the underlying game cannot be adjusted.
        </p>
        <p>
          Recently,
          <xref ref-type="bibr" rid="ref20 ref21 ref29">(Perez-Liebana et al. 2020b)</xref>
          provided an
open-source implementation of the multi-player turn-based strategy
and award-winning game The Battle of Polytopia
          <xref ref-type="bibr" rid="ref16">(Midjiwan
AB 2016)</xref>
. In this game, players must deal with resource
management and production, technology trees, terrain types and
partial observability, while controlling multiple units of different
types. The action space is very large, with averages of more
than 50 possible actions per move, and an estimated
branching factor per state of 10<sup>15</sup>. The framework includes support
for SFP agents, including MCTS and RHEA, which in
baseline experiments seem to be at a similar level to rule-based
agents, but inferior to a human level of play.
        </p>
        <p>
          The most complete turn-based strategy games framework
to date is arguably Freeciv
          <xref ref-type="bibr" rid="ref27">(Prochaska and others 1996)</xref>
          ,
inspired by Sid Meier’s Civilization series
          <xref ref-type="bibr" rid="ref9">(Firaxis 1995 2020)</xref>
          .
It incorporates most of the complexities and dynamics of
the original game, allowing interactions between
potentially hundreds of players. Due to its complexity, most
researchers have used it to tackle certain aspects of strategy
games, like level exploration and city placement
          <xref ref-type="bibr" rid="ref11">(Jones and
Goel 2004)</xref>
          <xref ref-type="bibr" rid="ref11 ref3">(Arnold, Horvat, and Sacks 2004)</xref>
          .
        </p>
        <p>
          Regarding real-time strategy games, microRTS
<xref ref-type="bibr" rid="ref17">(Ontañón et al. 2018)</xref>
          is a framework and competition developed to
foster research in this genre, which generally has a high entry
threshold due to the complexity of the game to be learned
(e.g. Starcraft using BWAPI
          <xref ref-type="bibr" rid="ref29">(Team 2020)</xref>
          ). In comparison
to other frameworks, players can issue commands at the
same time and each action takes a fixed time to complete.
The framework implements various unit and building types
that act on the player’s command. The framework supports
both fully and partially observable states. Recently,
AlphaStar
          <xref ref-type="bibr" rid="ref31">(Vinyals et al. 2019)</xref>
demonstrated the great proficiency of deep
supervised and reinforcement learning (RL) methods in
Starcraft II. Using a substantial amount of computational
resources, their system is able to beat professional human
players consistently by learning first from human replays and
then training multiple versions of their agent in the so-called
AlphaStar League. This example shows that even for complex
RTS games it is possible to develop agents of high proficiency,
although such agents so far remain limited to playing a single game.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>General Game Playing (GGP)</title>
        <p>
          As previously mentioned, the goal of STRATEGA is to support
research on general strategy game playing, which forms an
interesting sub-domain of general game playing. GGP has
already been supported by numerous frameworks that focus on
its different aspects. The GGP framework
          <xref ref-type="bibr" rid="ref10">(Genesereth, Love,
and Pell 2005)</xref>
          was introduced to study general board-games
and its game description language motivated the development
of the video game description language
          <xref ref-type="bibr" rid="ref28">(Schaul 2013)</xref>
          and
its accompanying General Video Game AI (GVGAI)
<xref ref-type="bibr" rid="ref22">(Perez-Liebana, Liu, and others 2019)</xref>
          framework. GVGAI focuses
on 2D tile-based arcade-like video games and supports a
small number of 2D physics-based games. In a similar
fashion, the Arcade Learning Environment (ALE) (Bellemare
et al. 2013) provides access to Atari 2600 games, offering
multiple ways in which game states can be perceived and is
tightly interconnected with Open AI Gym
          <xref ref-type="bibr" rid="ref6">(Brockman et al.
2016)</xref>
          .
        </p>
        <p>
          Different styles of defining games have been presented by
the Ludii
          <xref ref-type="bibr" rid="ref25">(Piette et al. 2019)</xref>
          and the Regular Boardgames
(RBG)
          <xref ref-type="bibr" rid="ref14">(Kowalski et al. 2019)</xref>
          frameworks. While the former
uses high-level game-related concepts (ludemes) for game
definitions, the latter uses regular expressions. Both permit
defining turn-based games, but currently seem to lack
methods for implementing real-time games. Finally, (Tian et al.
2017) propose the Extensive, Lightweight and Flexible (ELF)
platform that allows the execution of Atari, Board and
Realtime Strategy games. In particular, ELF incorporates
MiniRTS, a fast RTS game with a similar scope to microRTS.
Some initial attempts have been made to provide platforms
to host multiple RTS games. In one of the most recent
works,
          <xref ref-type="bibr" rid="ref1">(Andersen, Goodwin, and Granmo 2018)</xref>
identified
the need for an RTS framework that can adjust the
complexity of its games and presented Deep RTS. This platform
focused on providing a research benchmark for (deep)
Reinforcement Learning methods, supported games of different
complexity, ranging from low (such as those in microRTS) to
high (as Starcraft II). A similar but more flexible framework
is Stratagus
          <xref ref-type="bibr" rid="ref26">(Ponsen et al. 2005)</xref>
          , a platform that shares some
characteristics with our proposal. Different strategy games
can be configured via text files and LUA scripts can be used
to modify some game dynamics. Some statistics are also
gathered for all games, such as units killed and lost, and a
common API is provided for agents.
        </p>
        <p>Our general strategy games platform goes beyond these
proposals in a two-fold manner. First, from the perspective
of the agents, we provide forward model functionality to
enable the use of statistical forward planning agents.
Secondly, from the games perspective, our platform provides
higher customisation of the game mechanics, allowing the
specification of game goals, terrain features, unit and action
types, complemented with agent and game logging
functionality. Furthermore, the STRATEGA framework makes use of
higher level concepts to ease development and customisation
of strategy games. While GGP frameworks may be able to
produce strategy games of similar complexity, they can
require extensive effort to encode the games we are looking
at.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Platform for General Strategy Games</title>
      <p>
        STRATEGA currently implements n-player turn-based
strategy games, where games use a 2D-tile representation of level
and units
        <xref ref-type="bibr" rid="ref20 ref21">(Perez-Liebana et al. 2020a)</xref>
        . During a single turn,
a player can issue one or more actions to each unit. While
the standard game-play lets all players fight until all but one
has lost all its units, the game’s rules can be modified to
implement custom winning conditions. At the game start, every
player receives a set of units, which can be moved along the
grid and attack other units to reduce their health.
Furthermore, units can be assigned special abilities and differ in
their range, health points, damage and other variables.
      </p>
      <p>The framework is written in C++. It can run headless or
with a graphical user interface (GUI) that provides
information on the current game state, run-time statistics, and allows
human players to play the game. The GUI, the game engine
and game playing agents are separated in multiple threads
to maximise the framework’s efficiency and the developer’s
control over the agents. Figure 2 shows a screenshot of the
current state of the framework. Isometric assets are included
in the platform to depict different types of units and terrains,
which can also be assigned via YAML configuration files.
Figure 1 shows the overall structure of the framework.
</p>
      <sec id="sec-3-1">
        <title>Creating Games</title>
        <p>The definition of all game components such as units,
abilities, game modifiers, levels and tiles is done through
YAML-files. The excerpt of the Action YAML-file shown in Figure 3
illustrates the definition of an Attack action. Several properties
can be configured and even new properties can be added to
the framework. In the example, the attack action has a range
of 6 tiles, damage of 10 (that can affect friendly units) and
establishes if and how much score the attacker and attacked
players receive when the action is executed.</p>
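<p>As a rough illustration, an Attack entry of this kind might look as follows; the key names are our own sketch and do not necessarily match the framework’s exact schema:</p>

```yaml
# Hypothetical sketch of an Attack action definition.
# Key names are illustrative, not the framework's exact schema.
Attack:
  Range: 6              # tiles
  Damage: 10            # can also affect friendly units
  AffectsFriendly: true
  ScoreAttacker: 10     # score the attacking player receives
  ScoreAttacked: 0      # score the attacked player receives
```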
        <p>The unit definition in the Units YAML-file shown in
Figure 3 follows a similar pattern, allowing for hierarchically
structured unit types. In our example, the LongRangeUnit is
an extension of the BasicUnit type, inheriting the properties
from its base. The entry defines basic properties for the unit:
range of vision, movement, base attack damage, health and
also the Path to its graphical asset. The actions available for
this unit, as defined in the units YAML file, are indicated
under the Actions heading. Two more attributes indicate the
number of actions that the unit can execute during a turn
(NumberActionExecutePerTurn) and if they can be executed
more than once. Finally, CanBeMoreThanOne determines if
this unit can be instantiated multiple times or is unique.</p>
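<p>A unit entry with the inheritance described above might be sketched as follows; again, the key names are illustrative rather than the framework’s exact schema:</p>

```yaml
# Hypothetical sketch of hierarchically structured unit types.
BasicUnit:
  LineOfSightRange: 4
  MovementRange: 2
  AttackDamage: 10
  Health: 100
  Path: assets/units/basic.png
  Actions: [Move, Attack]
  NumberActionExecutePerTurn: 2
  CanRepeatActions: false
  CanBeMoreThanOne: true

LongRangeUnit:            # extends BasicUnit, inheriting its properties
  InheritsFrom: BasicUnit
  LineOfSightRange: 8
  AttackDamage: 5
```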
        <p>The Configuration YAML-file defines how a specific
instance of a game should be created and played. The game
rules section defines how many turns a game can have and
if players or agents have a limited time budget to execute
their turns. Game modifiers include (but are not limited to)
turning on/off the fog of war or changing winning conditions.
This eases the customisation by quickly modifying the game
without changing and compiling its code, allowing to reuse
the units and actions in different games. The Configuration
YAML-file also specifies the list of N ≥ 2 players and the
level to play in. This level is formed by the initial distribution
of the tiles and the definition of each terrain type. Figure 3
includes as an example the definition of a trap tile, which
indicates that it is walkable (units can enter the tile), does not
offer any cover, deals 50 points of damage to the unit as soon
as it enters the tile, but does not kill the unit automatically.
</p>
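<p>A configuration of this kind, including the trap tile described above, might be sketched as follows (key names are illustrative):</p>

```yaml
# Hypothetical sketch of a game configuration.
GameRules:
  MaxTurns: 30
  TurnTimeBudgetMs: 1000   # optional per-turn budget
GameModifiers:
  FogOfWar: true
  WinCondition: LastPlayerStanding
Players: [RuleBasedAgent, MCTSAgent]   # list of N >= 2 players
Level: levels/sample_level.txt
Tiles:
  Trap:
    Walkable: true      # units can enter the tile
    Cover: 0.0          # offers no cover
    EnterDamage: 50     # damage dealt when a unit enters
    KillsInstantly: false
```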
      </sec>
      <sec id="sec-3-2">
        <title>Agent interface</title>
        <p>We provide an API for agent development and offer access to
several sample agents (see Section 4.2). Agents must define a
constructor for initialising the player and a method to indicate
an action to execute in the current game tick. This method
receives a copy of the game state, which can be modified (i.e.
to incorporate game objects into the non-visible part of the
level due to fog of war) and rolled forward supplying actions.
This access to an FM facilitates the implementation of SFP
agents. Agents have also access to the properties of the game
state (positions of units, terrain tiles, game turn, score, etc.)
and a pathfinding feature to determine shortest paths.</p>
        <p>On each turn, the game requests an action from the player
to be executed on the game. The agent receives information
about all possible actions that can be executed in the current
game state, for all the units present in the board. The agents
can choose to return one of these actions, or a special action
that indicates that the agent does not want to execute any
more actions during the present turn. The game requests an
action from the player as long as i) the agent does not return
an EndTurn action; ii) there are still actions available for the
player; and iii) the turn budget, if specified in the YAML
configuration files, is not consumed.
</p>
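<p>The request loop described above can be sketched as follows. The framework itself is written in C++; this Python sketch uses illustrative names (play_turn, compute_action, get_available_actions) rather than the actual API:</p>

```python
# Minimal sketch of the per-turn action-request loop described above.
# All names are illustrative; the framework's actual C++ API may differ.
END_TURN = "EndTurn"

def play_turn(agent, state, budget=None):
    """Request actions from the agent until it returns EndTurn, no
    actions remain, or the (optional) turn budget is consumed."""
    executed = []
    while budget is None or budget > 0:
        actions = state.get_available_actions()
        if not actions:
            break                      # ii) no actions left for the player
        action = agent.compute_action(state, actions)
        if action == END_TURN:
            break                      # i) the agent ends its turn
        state.apply(action)            # execute the action on the game
        executed.append(action)
        if budget is not None:
            budget -= 1                # iii) consume the turn budget
    return executed
```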
      </sec>
      <sec id="sec-3-3">
        <title>Debugging and Logging</title>
        <p>One of the most important aspects of this framework is the
capability of analysing and logging game states and executions.
Figure 2 shows live debug information by means of
interactive floating windows. This information includes game data
(current turn, number of actions executed by a player in a
turn, frames per second, score and leading player), profiling
(the size, in bytes, of the game state; the time, in microseconds,
needed to copy the game state and to advance the forward model
by one action; and the time taken by agents to provide a move
to play) and action and unit information, which indicates
the size of the action space in the current game state and the
accumulated action space during the present turn. The
interface also allows us to obtain more information and execute
those actions from the list on the floating window, as well as
obtaining information from the units in the game.</p>
        <p>Once the game is over, a log file is also written in YAML
format, including per-turn information on decision-making
time, score, action space and actions executed, number of
units, player rankings and specific game information. The
framework includes simple scripts to analyse this data and
produce logging plots as the ones shown below in Figure 4.
</p>
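<p>A per-game log of this kind might look as follows; field names are illustrative, not the framework’s exact output:</p>

```yaml
# Hypothetical excerpt of a per-game YAML log.
Game:
  Winner: Player0
  Turns: 27
TurnData:
  - Turn: 1
    DecisionTimeMs: [45, 38]   # one entry per player
    Scores: [10, 0]
    ActionSpaceSize: 152
    ActionsExecuted: 6
    Units: [5, 5]
Rankings: [Player0, Player1]
```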
      </sec>
    </sec>
    <sec id="sec-4">
      <title>A Turn-based Strategy Framework</title>
<p>This section describes the implementation of the agents and the
three turn-based strategy games currently defined in the platform.</p>
      <sec id="sec-4-1">
        <title>Games: Kings, Healers and Pushers</title>
        <p>In Kings, players receive a king unit and a random set of
additional units. Their task is to keep their king alive at all
costs while trying to defeat the opponent’s king. Similar
to chess, losing other units does not determine the end of
the game but effectively reduces the flexibility of a player.
Four types of units are defined in this game mode: the archer,
the warrior, the healer, and the king. While the warrior moves slowly
and deals high damage, the archer moves quickly and has
long-range attacks, but its damage is reduced. In addition to its
movement speed, the archer can also see further than any
other unit in the game. The healer can restore other units’
health points. Finally, the king can only move one square
at a time but deals the highest damage. All units can move
and attack once in the same turn. The game is played on a
map with different types of terrains, each type provides a
different cover-percentage for reducing incoming damage.
Additionally, the map contains traps, which kill a unit upon
entering. The map is covered in fog-of-war, with each unit
revealing parts of the map based on its vision radius.</p>
        <p>In the game Healers, both players have access to warriors
and healers. The healer can move faster than the warrior but
cannot attack. In comparison to Kings, healers and warriors
have higher starting health points. The twist in this game is
that all units receive damage at the end of each turn. The
goal of the players is to keep their units alive, while they can
attack the opponent’s units. The last player with units left
wins. The map contains plains and mountains, this time with
no tile providing cover. Mountains act as non-walkable
obstacles and fog-of-war is disabled in this game.</p>
<p>The game Pushers is fundamentally different from the
other games. Only one unit type is available,
the Pusher. Pushers cannot attack other units but can push them
one tile back once per turn, to make the other player’s units
fall into the traps in the level. The winning condition
remains the same (survive the longest), but the game focuses
on tactical movement instead of aggressive unit actions.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Agents</title>
        <p>This section describes the different agents implemented in
the framework and a heuristic used to evaluate game states.
Strength Difference Heuristic (SDH) SDH estimates the
relative strength of a unit as a linear sum of the unit’s
attributes (maximum health, attack damage, or movement range),
each divided by the maximum value among all available unit types.
If a unit cannot execute an action, the corresponding attribute
is 0. Note that this heuristic does not change during a game:
dynamic attributes, such as a unit’s current health, are not
considered in the strength estimation.</p>
<p>To estimate the value of a state, we compute the difference
between the strength of the current player’s units and the
opponent’s units. Additionally, a unit’s strength is multiplied
by the percentage of its remaining health to encourage attacks.</p>
        <p>Rule-based Combat (RBC) Agent This agent focuses on
combat-oriented games like Kings or Healers. Its strategy
is to focus all attacks on a single enemy unit while keeping
its units out of danger. Every time the agent has to make a
decision, it first targets an enemy unit. It then tests for each
friendly unit if it can attack an opponent, heal an ally, or move
closer to the target. Once a valid action has been found, the
agent will execute it and repeat the process until no actions
are left. The target is chosen based on an isolation-score. A
unit’s allies contribute negatively to the isolation score, while
its enemies contribute positively. Each contribution is equal to
the contributing unit’s strength divided by the number of turns that
unit needs to reach the target. To find a target, the agent searches for an enemy
with the highest isolation score. When attacking or healing a
unit, the agent prioritises units with high-strength.</p>
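<p>Putting the pieces of SDH together, a minimal Python sketch (the framework itself is C++; the unit representation and attribute names are our own illustration):</p>

```python
# Sketch of the Strength Difference Heuristic (SDH) described above.
# Unit representation and attribute names are illustrative.
def unit_strength(unit, max_attrs):
    """Static strength: linear sum of attributes, each normalised by the
    maximum value of that attribute among all available unit types."""
    total = 0.0
    for attr, value in unit["attrs"].items():
        if max_attrs[attr] > 0:
            total += value / max_attrs[attr]
    return total

def sdh(state, player, max_attrs):
    """State value: strength of the player's units minus the opponent's,
    each unit weighted by its remaining-health percentage."""
    score = 0.0
    for unit in state["units"]:
        s = unit_strength(unit, max_attrs) * unit["health"] / unit["max_health"]
        score += s if unit["owner"] == player else -s
    return score
```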
      </sec>
      <sec id="sec-4-3">
<title>Rule-based Push (RBP) Agent</title>
        <p>The Push-Agent is highly specialised for games like Pushers. The agent’s strategy is to
push opponents into a direction that is closest to a death trap.
For each unit, the agent computes the shortest paths from the
unit to the adjacent tiles of the opponent’s units. Each path
then gets assigned a score equal to the length of the path, plus
an estimate of how long it would take to kill the opponent
from the tile. Starting from the path with the lowest score,
the agent checks if following the path for one turn would
result in a position that endangers the unit. If the path is not
dangerous it is assigned to the unit, otherwise, the agent will
try the next path. Once a unit was assigned a path, it will
either push the target opponent or follow the path. Once a
unit has moved or pushed, the agent will restart the process
until no unit can act safely any more.</p>
        <p>One Step Look Ahead (OSLA) Agent The OSLA agent
uses the game’s forward model to predict the upcoming state
for each of the available actions. Resulting states are rated
according to the SDH heuristic function. A high positive
(resp. negative) score will be used in case the agent won
(lost) the game after applying an action. Finally, the agent
selects the action which yields the highest score.</p>
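<p>OSLA as described above can be sketched as follows; the forward-model interface shown is an illustrative stand-in for the framework’s C++ API:</p>

```python
# Sketch of One Step Look Ahead: roll each available action forward once
# and pick the best-rated outcome. Interface names are illustrative.
WIN_SCORE, LOSS_SCORE = 1e9, -1e9

def osla(state, actions, forward_model, heuristic, player):
    best_action, best_score = None, float("-inf")
    for action in actions:
        nxt = forward_model.advance(state, action)  # copy + roll forward
        if nxt.winner == player:
            score = WIN_SCORE       # winning outcomes dominate
        elif nxt.winner is not None:
            score = LOSS_SCORE      # losing outcomes are avoided
        else:
            score = heuristic(nxt, player)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```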
      </sec>
      <sec id="sec-4-4">
<title>Monte Carlo Tree Search (MCTS) Agent</title>
        <p>
          Over time, many variants of MCTS have been proposed for various
problem domains
          <xref ref-type="bibr" rid="ref7">(Browne et al. 2012)</xref>
          . For the framework,
we implemented a basic version of MCTS using the Upper
Confidence Bounds (UCB)
          <xref ref-type="bibr" rid="ref4">(Auer, Cesa-Bianchi, and Fischer
2002)</xref>
          selection criterion. The MCTS agent uses a tree node
structure to facilitate a search through the game’s state space.
Each node stores an associated game state, a list of available
actions, and a pointer to one child node per action. The tree
is initialised by creating a root node using the provided game
state. During each iteration of the search, the agent first
selects a node, then expands it by another child node, further
simulates a rollout, and ends with backpropagating its value
on the path to the root node. A node is selected for expansion
by descending the tree step by step until a node that has
not yet been fully expanded is found. At each step, the
child node with the highest UCB value is selected. The new
child-node is generated by applying the associated action to
the selected node’s game state. During the tree policy we do
not consider opponent turns, instead we skip them to avoid
the non-determinism of their action selection. The new child
node’s value is determined by applying random actions until
the end of the game or a predetermined depth. Its value is
backpropagated through the visited nodes of the tree until the
root. The search ends in case a maximum number of forward
model calls has been reached. Finally, we return the root’s
child node with the highest visit count.
        </p>
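<p>The UCB selection used during the descent can be sketched as follows; the node representation is illustrative, and the default exploration constant √2 is the value used in the experiments later in the paper:</p>

```python
import math

# Sketch of UCB child selection during the MCTS tree policy.
# Node representation (dicts with "value"/"visits") is illustrative.
def ucb(node_value, node_visits, parent_visits, c=math.sqrt(2)):
    """Upper Confidence Bound: mean value plus an exploration bonus."""
    if node_visits == 0:
        return float("inf")          # unvisited children are tried first
    exploit = node_value / node_visits
    explore = c * math.sqrt(math.log(parent_visits) / node_visits)
    return exploit + explore

def select_child(children, parent_visits, c=math.sqrt(2)):
    """Pick the child with the highest UCB value."""
    return max(children,
               key=lambda ch: ucb(ch["value"], ch["visits"], parent_visits, c))
```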
      </sec>
      <sec id="sec-4-5">
        <title>Rolling Horizon Evolutionary Algorithm (RHEA) Agent</title>
        <p>
          The Rolling Horizon Evolutionary Algorithm searches for
an optimal action sequence with a fixed length (the
horizon)
          <xref ref-type="bibr" rid="ref24">(Perez-Liebana, Samothrakis, and others 2013)</xref>
          .
Therefore, it first generates a pool of candidate action sequences
which is then continuously modified by an evolutionary
algorithm. Each individual is created by step-wise selecting an
action and applying it to the current game state. Afterwards,
the individual’s value is determined using a provided
heuristic. Similarly, to the MCTS agent, the RHEA agent skips
the opponent’s turn during rollouts since they introduced
too much non-determinism in the evaluation of an action
sequence. At the beginning of each iteration, tournament
selection is applied to select the best individuals among a
random subset of individuals. The generated pool is modified
by mutation and crossover operators. During mutation, we
iterate over an individual’s action list and randomly choose
to replace an action with a random one. Remaining actions
are checked if they would still be feasible according to the
given game state and, if not, replaced by a random feasible
action. During crossover of two individuals, we randomly
select which parent provides the next action. If the action
is not applicable, it is replaced by a random feasible action.
Resulting individuals are reevaluated and added to the next
population. RHEA keeps iterating until a maximum number
of forward model calls has been reached. Thereafter, the first
action of the best-rated individual is returned.
        </p>
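<p>The mutation and crossover operators described above can be sketched as follows; the feasibility check here is a simplified stand-in for querying the forward model:</p>

```python
import random

# Sketch of RHEA's variation operators on action sequences.
# The feasibility check is illustrative; the framework would query
# the forward model instead of a fixed set.
def mutate(individual, feasible, mutation_rate=0.1, rng=random):
    """Randomly replace actions; repair any action no longer feasible."""
    result = []
    for action in individual:
        if rng.random() < mutation_rate or action not in feasible:
            result.append(rng.choice(sorted(feasible)))
        else:
            result.append(action)
    return result

def crossover(parent_a, parent_b, feasible, rng=random):
    """Pick each action uniformly from one parent, repairing infeasible ones."""
    child = []
    for a, b in zip(parent_a, parent_b):
        action = a if rng.random() < 0.5 else b
        if action not in feasible:
            action = rng.choice(sorted(feasible))
        child.append(action)
    return child
```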
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental Showcase</title>
      <p>We tested the performance of the sample agents by running
a round-robin tournament for each of the three games. We
ran 50 games per match-up between the rule-based, RHEA,
MCTS, and OSLA agents. For these 50 games, we
randomised 25 initial game states, each played twice
with the players’ starting positions swapped. The
search-based agents were configured to use a budget of 2000 forward
model calls (number of times the state is rolled forward) per
selected action. For the RHEA agent, we used a population
size of 1, individuals of length 5, and a mutation rate of 0.1.
The MCTS agent was configured to use a rollout length of
3 and an exploration constant of √2. Both SFP agents skip
the opponent’s turn and only optimise the player’s action
sequence. OSLA, MCTS and RHEA use the Strength
Difference Heuristic to evaluate game states. Games are run for a
maximum of 30 turns, ending in a tie if no winner has been
declared by that point.</p>
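        <p>
          The evaluation protocol can be sketched as a small harness. Here `play_game` is a hypothetical stand-in for running a single match (it is not part of the framework's API); the table it returns mirrors the per-opponent win-rate reporting used in the experiments.
        </p>
        <p>
```python
import itertools

def round_robin(agents, play_game, n_seeds=25):
    """Round-robin evaluation: every pairing plays each of n_seeds initial
    states twice, with starting positions swapped (50 games per match-up
    when n_seeds=25). `play_game(first, second, seed)` returns the winner's
    name, or None for a tie."""
    wins = {a: {b: 0 for b in agents if b != a} for a in agents}
    games = {a: {b: 0 for b in agents if b != a} for a in agents}
    for a, b in itertools.combinations(agents, 2):
        for seed in range(n_seeds):
            for first, second in ((a, b), (b, a)):  # mirrored positions
                winner = play_game(first, second, seed)
                games[a][b] += 1
                games[b][a] += 1
                if winner in (a, b):
                    wins[winner][b if winner == a else a] += 1
    # Win rate per opponent, one row per agent.
    return {a: {b: wins[a][b] / games[a][b] for b in wins[a]} for a in agents}
```
        </p>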
      <p>Table 1 summarises our results, reporting each agent’s win
rate per opponent and across all games. Results show that the
RBC agent is very proficient at playing the game modes
Kings (avg. win-rate = 0.92) and Healers (0.82). While the
MCTS and RHEA agents were able to beat the OSLA agent,
they were no match for the RBC agent. In contrast, the
RBP agent performed well against OSLA (1.00) and
RHEA (0.74) but lost against the MCTS (0.46) agent.</p>
      <p>The good performance of both rule-based agents shows
that there is much room for improvement in terms of the
performance of search-based agents. A great starting point
to understand their problems is to analyse the game’s
complexity. Figure 4 shows two plots with the average size of the
action-space over time and the number of actions executed
per turn of 50 MCTS vs RHEA games in Kings. Both agents
start with an average of 150 actions per move and execute
between 5 and 6 moves per turn. The large fluctuation of
the action-space size can be explained by the number of
units that are still active in the agent’s turn. After the unit
count has been reduced, the size of both action spaces gets
gradually reduced, although RHEA’s remains slightly higher.
On the other hand, the number of actions executed per turn,
although it also decreases as the game progresses, is higher for RHEA.
This suggests that RHEA’s higher win rate is correlated
with a more precise action selection that maintains a larger
action space throughout the game.</p>
    </sec>
    <sec id="sec-6">
      <title>Opportunities and Future Work</title>
<p>The goal of this framework is to enable research into the many
different facets of Game AI in strategic and
tactical games, either turn-based or real-time. These include
games that require a complex decision-making process, from
multi-unit management to resource gathering, technology
trees and long-term planning. Our aim is to provide a
framework for i) search (showcased in this paper with SFP agents)
and reinforcement learning agents; and ii) research in game
and level generation, and automatic game tuning, which is
made possible by the definition of rules, mechanics, units
and actions via YAML files. The framework, implemented in
C++, aims to provide the required high execution speed,
as well as interfaces for different programming languages for the
implementation of agents and generators.</p>
      <p>
        The current state of STRATEGA is fully functional for
tactical turn-based games and SFP agents, and provides logging
capabilities to analyse game results, as shown in this paper.
It has been, however, developed with a road-map in mind
to incorporate extra games and logging features. Regarding
the former, we plan to incorporate aspects of tactical
role-playing games (object pick-ups, inventories, buffs/debuffs,
etc.), technology and cultural trees, and resource and
economy management, both for turn-based and real-time games.
Regarding the logging features, the API will be enhanced
so agents can log aspects of the internal representation of
their decision-making process, following the example laid
out in
        <xref ref-type="bibr" rid="ref32">(Volz, Ashlock, and Colton 2015)</xref>
        . This will provide a
deeper insight into this task and also facilitate research into
the explainability of the agent’s decision-making process.
      </p>
<p>Finally, the agent API and the highly customisable games
make it possible to tackle research on strategy games from a general
game playing perspective, which is exemplified here by
testing several agents in three different games implemented
within the framework. Our intention is to propose this
platform as a new competition benchmark in the near future.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work is supported by UK EPSRC research grant
EP/T008962/1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Andersen</surname>
          </string-name>
          , P.-A.;
          <string-name>
            <surname>Goodwin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Granmo</surname>
            ,
            <given-names>O.-C.</given-names>
          </string-name>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>Deep RTS: a Game Environment for Deep Reinforcement Learning in Real-time Strategy Games</article-title>
          .
          <source>In 2018 IEEE conference on computational intelligence and games (CIG)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Arnold</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Horvat</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sacks</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Freeciv learner: a machine learning project utilizing genetic algorithms</article-title>
          . Georgia Institute of Technology, Atlanta.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cesa-Bianchi</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Finite-time analysis of the multiarmed bandit problem</article-title>
          .
          <source>Machine Learning</source>
          <volume>47</volume>
          (
          <issue>2</issue>
          /3):
          <fpage>235</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Bellemare</surname>
            ,
            <given-names>M. G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Naddaf</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; et al.
          <year>2013</year>
          .
          <article-title>The arcade learning environment: an evaluation platform for general agents</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          <volume>47</volume>
          (
          <issue>1</issue>
          ):
          <fpage>253</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Brockman</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cheung</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Pettersson</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schulman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; et al.
          <year>2016</year>
          .
          <article-title>Openai gym</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Browne</surname>
            ,
            <given-names>C. B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Powley</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Whitehouse</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; et al.
          <year>2012</year>
          .
          <article-title>A Survey of Monte Carlo Tree Search Methods</article-title>
          .
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Buro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>Real-time Strategy Games: A new AI Research Challenge</article-title>
          . In IJCAI, volume
          <year>2003</year>
          ,
          <fpage>1534</fpage>
          -
          <lpage>1535</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Firaxis.</surname>
          </string-name>
          <year>1995</year>
          -2020. Civilization.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Genesereth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Love</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Pell</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>General Game Playing: Overview of the AAAI Competition</article-title>
          .
          <source>AI</source>
          magazine
          <volume>26</volume>
          (
          <issue>2</issue>
          ):
          <fpage>62</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Goel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2004</year>
          .
          <article-title>Hierarchical judgement composition: Revisiting the structural credit assignment problem</article-title>
          .
          <source>In Proceedings of the AAAI Workshop on Challenges in Game AI</source>
          , San Jose, CA, USA,
          <fpage>67</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Justesen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Uth</surname>
            ,
            <given-names>L. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jakobsen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ; et al.
          <year>2019</year>
          .
          <article-title>Blood Bowl: A new Board Game Challenge and Competition for AI</article-title>
          .
          <source>In 2019 Conference on Games</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Justesen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mahlmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; et al.
          <year>2017</year>
          .
          <article-title>Playing multiaction adversarial games: Online evolutionary planning versus tree search</article-title>
          .
          <source>IEEE Transactions on Games</source>
          <volume>10</volume>
          (
          <issue>3</issue>
          ):
          <fpage>281</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Kowalski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mika</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sutowicz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Szykuła</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>Regular Boardgames</article-title>
          .
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>33</volume>
          :
          <fpage>1699</fpage>
          -
          <lpage>1706</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          Midjiwan AB
          .
          <year>2016</year>
          .
          <article-title>The Battle of Polytopia.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Ontañón</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barriga</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>C. R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Moraes</surname>
            ,
            <given-names>R. O.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Lelis</surname>
            ,
            <given-names>L. H. S.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>The first microRTS artificial intelligence competition</article-title>
          .
          <source>AI</source>
          Magazine
          <volume>39</volume>
          (
          <issue>1</issue>
          ):
          <fpage>75</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Ontañón</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Synnaeve</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ; et al.
          <year>2013</year>
          .
          <article-title>A survey of realtime strategy game AI research and competition in StarCraft.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>293</fpage>
          -
          <lpage>311</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Perez-Liebana</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dockhorn</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hurtado-Grueso</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Jeurissen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2020a</year>
          .
          <article-title>The Design Of “Stratega”: A General Strategy Games Framework</article-title>
          . arXiv preprint arXiv:2009.05643.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Perez-Liebana</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>Y.-J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Emmanouilidis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Khaleque</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Gaina</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          <year>2020b</year>
          .
          <article-title>Tribes: A New Turn-Based Strategy Game for AI Research</article-title>
          .
          <source>In 2020 AAAI Advancement for the Artificial Intelligence in Digital Entertainment</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Perez-Liebana</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; Liu,
          <string-name>
            <surname>J.</surname>
          </string-name>
          ; et al.
          <year>2019</year>
          .
          <article-title>General Video Game AI: A Multitrack Framework for Evaluating Agents, Games, and Content Generation Algorithms</article-title>
          .
          <source>IEEE Transactions on Games</source>
          <volume>11</volume>
          (
          <issue>3</issue>
          ):
          <fpage>195</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Perez-Liebana</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lucas</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ; et al.
          <year>2019</year>
          .
          <source>General Video Game Artificial Intelligence</source>
          , volume
          <volume>3</volume>
          . Morgan &amp; Claypool Publishers. https://gaigresearch.github.io/gvgaibook/.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Perez-Liebana</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Samothrakis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; et al.
          <year>2013</year>
          .
          <article-title>Rolling horizon evolution versus tree search for navigation in singleplayer real-time games</article-title>
          .
          <source>In Proceedings of GECCO</source>
          ,
          <fpage>351</fpage>
          -
          <lpage>358</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Piette</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Soemers</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Stephenson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sironi</surname>
            ,
            <given-names>C. F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Winands</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Browne</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>Ludii-The Ludemic General Game System</article-title>
          . arXiv preprint arXiv:1905.05013.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Ponsen</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lee-Urban</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Muñoz-Avila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Aha</surname>
            ,
            <given-names>D. W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Molineaux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>Stratagus: An open-source game engine for research in real-time strategy games</article-title>
          .
          <source>Reasoning</source>
          , Representation, and Learning in Computer Games 78.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Prochaska</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , et al.
          <year>1996</year>
          . FreeCiv. http://www.freeciv.org/.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Schaul</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>A video game description language for model-based or interactive learning</article-title>
          .
          <source>In 2013 IEEE Conference on Computational Inteligence in Games (CIG)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          BWAPI Development Team.
          <year>2020</year>
          .
          <article-title>The Brood War API (BWAPI) 4.2.0</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Tian</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; et al.
          <year>2017</year>
          .
          <article-title>ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          ,
          <fpage>2659</fpage>
          -
          <lpage>2669</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Babuschkin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mathieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jaderberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; et al.
          <year>2019</year>
          .
          <article-title>AlphaStar: Mastering the Real-time Strategy Game StarCraft II</article-title>
          . DeepMind Blog 2
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Volz</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ashlock</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Colton</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Gameplay Evaluation Measures</article-title>
          .
          <source>Dagstuhl Seminar on AI and CI in Games: AI-Driven Game Design</source>
          , 122.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>