=Paper=
{{Paper
|id=Vol-2862/paper30
|storemode=property
|title=STRATEGA: A General Strategy Games Framework
|pdfUrl=https://ceur-ws.org/Vol-2862/paper30.pdf
|volume=Vol-2862
|authors=Alexander Dockhorn,Jorge Hurtado-Grueso,Dominik Jeurissen,Diego Perez-Liebana
|dblpUrl=https://dblp.org/rec/conf/aiide/DockhornGJL20
}}
==STRATEGA: A General Strategy Games Framework==
STRATEGA - A General Strategy Games Framework
Alexander Dockhorn, Jorge Hurtado-Grueso, Dominik Jeurissen, Diego Perez-Liebana
School of Electronic Engineering and Computer Science
Queen Mary University of London, UK
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Strategy games are complex environments often used in AI research to evaluate new algorithms. Despite the commonalities of most strategy games, research is often focused on one game only, which may lead to bias or overfitting to a particular environment. In this paper, we motivate and present STRATEGA, a general strategy games framework for playing n-player turn-based and real-time strategy games. The platform currently implements turn-based games, which can be configured via YAML files. It exposes an API with access to a forward model to facilitate research on statistical forward planning agents. The framework and agents can log information during games for analysing and debugging algorithms. We also present some sample rule-based agents, as well as search-based agents like Monte Carlo Tree Search and Rolling Horizon Evolution, and quantitatively analyse their performance to demonstrate the use of the framework. Results, although purely illustrative, show the known problems that traditional search-based agents have when dealing with high branching factors in these games. STRATEGA can be downloaded at: https://github.com/GAIGResearch/Stratega

1 Introduction

Since Michael Buro motivated AI research in strategy games (Buro 2003), multiple games and frameworks have been proposed and used by investigators in the field. These games, although different in themes, rules and goals, share certain characteristics that make them interesting for Game AI research. Most of the work done in this area pertains to the sub-genre of real-time strategy games (and, in particular, to Starcraft (Ontanón, Synnaeve, and others 2013)), but it is possible to find abundant research in other real-time and turn-based strategy games in the literature (see Section 2). The complexity of these problems is often addressed by incorporating game domain knowledge into the agents, either by providing the AI with programmed game-specific information or by training it with game replays. Although the contributions made this way can be significant, and in some cases it is possible to transfer algorithms and architectures from one game to another, we believe the time is right to introduce general AI into this domain. The objective of this paper is to present a new framework for general AI research in strategy games, which tackles the fundamental problems of this type of domain without focusing on an individual game at a time: resource management, decision making under uncertainty, spatial and temporal reasoning, competition and collaboration in multi-player settings, partial observability, large action spaces and opponent modelling, among others.

In a similar way to General Game Playing (GGP) (Genesereth, Love, and Pell 2005) and General Video Game Playing (Perez-Liebana, Lucas, and others 2019), this paper proposes a general multi-agent, multi-action strategy games framework for AI research. Section 3 presents the vision for this framework, while Section 4 describes the current implementation. Our main interest when proposing this framework is to foster research into the complexity of the action decision process without a dependency on a concrete strategy game. The following list summarises the main characteristics of the platform:

• Games and levels are defined via text files in YAML format. These files include definitions for games (rules, duration, winning conditions, terrain types and effects), units (skills, actions per turn, movement and combat features) and their actions (strength, range and targets).

• STRATEGA incorporates a Forward Model (FM) that permits rolling any game state forward by supplying an action during the agent's thinking time. The FM is used by the Statistical Forward Planning (SFP) agents within the framework: One Step Look Ahead, Monte Carlo Tree Search (MCTS) and Rolling Horizon Evolutionary Algorithm (RHEA).

• A common API for all agents that provides access to the FM and a copy of the current game state, which supplies information about the state of the game and its entities. This copy of the state can be modified by the agent, so what-if scenarios can be built for planning. This is particularly relevant for tackling partial observability in RTS games.

• The framework includes functionality for profiling the agent's decision-making process, in particular regarding FM usage. This facilitates analysis of the impact, on the execution of the game, of methods that use these models
(such as the ones included in the benchmark) by providing the footprint in time and memory of FM usage.

• Functionality for game and agent logging, in order to understand the complexity of the action-decision process faced by the agents and easily analyse experimental results. This information includes data at the game-end (outcome and duration), turn (score, leading player, state size, actions executed) and action (action space) levels.

The aim of this framework is not only to provide a general benchmark for research on game-playing AI performance in strategy games, but also to shed some light on how decisions are made and on the complexity of the games. A common API for games and agents allows researchers to build new scenarios and to compare different AI approaches in them. In fact, the second contribution of this paper is to showcase the use of the current framework. To this end, Section 5 presents baseline results for the agents and games already implemented in this benchmark.

2 Related Work

STRATEGA incorporates many features of well-known strategy games and GGP frameworks, described in this section.

2.1 Strategy Games

There has been a relevant proliferation of multi-action and multi-unit games in the games research literature over the last couple of decades, ranging from situational and tactical environments to fully-fledged strategy games. An example of the former is HeroAIcademy (Justesen, Mahlmann, and others 2017), in which the authors present a turn-based game where each player controls a series of different units on a small board. Each turn, players distribute 5 action points across these units with the objective of destroying the opponent's base. The authors used this framework to introduce Online Evolutionary Planning, outperforming tree search methods with more efficient management of the game's large branching factor.

Later on, (Justesen et al. 2019) introduced the Fantasy Football AI (FFAI) framework and its accompanying Bot Bowl competition. This is a fully-observable, stochastic, turn-based game with a grid-based game board. Due to the large number of actions per unit and the possibility of moving each unit several times per turn, the branching factor is enormous, reportedly the largest in the literature of turn-based board games. Its gym interface provides access to several environments, each offering a vector-based state observation. While those environments differ in the size of the game board, the rules of the underlying game cannot be adjusted.

Recently, (Perez-Liebana et al. 2020b) provided an open-source implementation of the multi-player, turn-based strategy and award-winning game The Battle of Polytopia (Midjiwan AB 2016). In this game, players need to deal with resource management and production, technology trees, terrain types and partial observability, and control multiple units of different types. The action space is very large, with averages of more than 50 possible actions per move and an estimated branching factor per state of 10^15. The framework includes support for SFP agents, including MCTS and RHEA, which in baseline experiments seem to be at a similar level to rule-based agents, but inferior to a human level of play.

The most complete turn-based strategy games framework to date is arguably Freeciv (Prochaska and others 1996), inspired by Sid Meier's Civilization series (Firaxis 1995–2020). It incorporates most of the complexities and dynamics of the original game, allowing interactions between potentially hundreds of players. Due to its complexity, most researchers have used it to tackle certain aspects of strategy games, like level exploration and city placement (Jones and Goel 2004; Arnold, Horvat, and Sacks 2004).

Regarding real-time strategy games, microRTS (Ontañón et al. 2018) is a framework and competition developed to foster research in this genre, which generally has a high entry threshold due to the complexity of the game to be learned (e.g. Starcraft using BWAPI (Team 2020)). In comparison to other frameworks, players can issue commands at the same time and each action takes a fixed time to complete. The framework implements various unit and building types that act on the player's command, and it supports both fully and partially observable states. Recently, AlphaStar (Vinyals et al. 2019) showed the great proficiency of deep supervised and reinforcement learning (RL) methods in Starcraft II. With the use of a significant amount of computational resources, their system is able to beat professional human players consistently by learning first from human replays and then training multiple versions of their agent in the so-called AlphaStar League. This example shows that even for complex RTS games it is possible to develop agents of high proficiency, although such agents so far remain limited to playing a single game.

2.2 General Game Playing (GGP)

As previously mentioned, the goal of STRATEGA is to support research on general strategy game playing, which forms an interesting sub-domain of general game playing. GGP has already been supported by numerous frameworks that focus on its different aspects. The GGP framework (Genesereth, Love, and Pell 2005) was introduced to study general board games, and its game description language motivated the development of the video game description language (Schaul 2013) and its accompanying General Video Game AI (GVGAI) (Perez-Liebana, Liu, and others 2019) framework. GVGAI focuses on 2D tile-based, arcade-like video games and supports a small number of 2D physics-based games. In a similar fashion, the Arcade Learning Environment (ALE) (Bellemare et al. 2013) provides access to Atari 2600 games, offering multiple ways in which game states can be perceived, and is tightly interconnected with OpenAI Gym (Brockman et al. 2016).

Different styles of defining games have been presented by the Ludii (Piette et al. 2019) and the Regular Boardgames (RBG) (Kowalski et al. 2019) frameworks. While the former uses high-level game-related concepts (ludemes) for game definitions, the latter uses regular expressions. Both permit defining turn-based games, but currently seem to lack methods for implementing real-time games. Finally, (Tian et al. 2017) propose the Extensive, Lightweight and Flexible (ELF) platform, which allows the execution of Atari, board and real-time strategy games. In particular, ELF incorporates MiniRTS, a fast RTS game with a similar scope to microRTS.
2.3 General Strategy Games

Some initial attempts have been made to provide platforms that host multiple RTS games. In one of the most recent works, (Andersen, Goodwin, and Granmo 2018) signified the need for an RTS framework that can adjust the complexity of its games and presented Deep RTS. This platform focused on providing a research benchmark for (deep) Reinforcement Learning methods and supported games of different complexity, ranging from low (such as those in microRTS) to high (as in Starcraft II). A similar but more flexible framework is Stratagus (Ponsen et al. 2005), a platform that shares some characteristics with our proposal. Different strategy games can be configured via text files, and LUA scripts can be used to modify some game dynamics. Some statistics are also gathered for all games, such as units killed and lost, and a common API is provided for agents.

Our general strategy games platform goes beyond these proposals in a two-fold manner. First, from the perspective of the agents, we provide forward model functionality to enable the use of statistical forward planning agents. Secondly, from the games perspective, our platform provides higher customisation of the game mechanics, allowing the specification of game goals, terrain features, unit and action types, complemented with agent and game logging functionality. Furthermore, the STRATEGA framework makes use of higher-level concepts to ease the development and customisation of strategy games. While GGP frameworks may be able to produce strategy games of similar complexity, they can require extensive effort to encode the games we are looking at.

3 Platform for General Strategy Games

STRATEGA currently implements n-player turn-based strategy games, where games use a 2D-tile representation of the level and units (Perez-Liebana et al. 2020a). During a single turn, a player can issue one or more actions to each unit. While the standard game-play lets all players fight until all but one have lost all their units, the game's rules can be modified to implement custom winning conditions. At the game start, every player receives a set of units, which can be moved along the grid and attack other units to reduce their health. Furthermore, units can be assigned special abilities and differ in their range, health points, damage and other variables.

The framework is written in C++. It can run headless or with a graphical user interface (GUI) that provides information on the current game state and run-time statistics, and allows human players to play the game. The GUI, the game engine and the game-playing agents are separated into multiple threads to maximise the framework's efficiency and the developer's control over the agents. Figure 2 shows a screenshot of the current state of the framework. Isometric assets are included in the platform to depict different types of units and terrains, which can also be assigned via YAML configuration files. Figure 1 shows the overall structure of the framework.

Figure 1: Overall structure of the framework.

Figure 2: Exemplary game state of STRATEGA and its GUI.

3.1 Creating Games

The definition of all game components such as units, abilities, game modifiers, levels and tiles is done through YAML files. The excerpt of the Actions YAML file shown in Figure 3 shows the definition of an Attack action. Several properties can be configured and even new properties can be added to the framework. In the example, the attack action has a range of 6 tiles, damage of 10 (that can affect friendly units) and establishes if and how much score the attacker and attacked players receive when the action is executed.

The unit definition in the Units YAML file shown in Figure 3 follows a similar pattern, allowing for hierarchically structured unit types. In our example, the LongRangeUnit is an extension of the BasicUnit type, inheriting the properties from its base. The entry defines basic properties for the unit: range of vision, movement, base attack damage, health and also the Path to its graphical asset. The actions available for this unit, as defined in the units YAML file, are indicated under the Actions heading. Two more attributes indicate the number of actions that the unit can execute during a turn (NumberActionExecutePerTurn) and whether they can be executed more than once. Finally, CanBeMoreThanOne determines if this unit can be instantiated multiple times or is unique.

The Configuration YAML file defines how a specific instance of a game should be created and played. The game rules section defines how many turns a game can have and whether players or agents have a limited time budget to execute their turns. Game modifiers include (but are not limited to) turning on/off the fog of war or changing winning conditions. This eases customisation by quickly modifying the game without changing and recompiling its code, allowing the units and actions to be reused in different games. The Configuration YAML file also specifies the list of N ≥ 2 players and the level to play in. This level is formed by the initial distribution of the tiles and the definition of each terrain type. Figure 3
includes as an example the definition of a trap tile, which indicates that it is walkable (units can enter the tile), does not offer any cover, deals 50 points of damage to the unit as soon as it enters the tile, but does not kill the unit automatically.

Actions:
  - Attack:
      Value: 10
      Range: 6
      GiveRewardAttacker: true
      AttackerReward: 2
      GiveRewardAttacked: false
      AttackedReward: -1
      CanExecutedToFriends: true

Units:
  - BasicUnit:
      NumberActionExecutePerTurn: 1
      CanRepeatSameAction: false
      Types:
        - LongRangeUnit:
            RangeVision: 6
            RangeMovement: 4
            RangeAction: 5
            AttackDamage: 70
            Health: 100
            Path: LongRange.png
            Actions:
              - Move
              - Attack
            CanBeMoreThanOne: false

Game Configuration:
  Game Rules:
    TimeForEachTurn: 10
    NumberOfMaxRounds: 100
  Players:
    - MCTS Player
    - RHEA Player
  Level: >
    XXXXXXXXXXXXX
    XX.........XX
    X.....T.....X
    XX.........XX
    XXXXXXXXXXXXX
  LevelNodes:
    - Trap:
        Character: T
        Walkable: true
        Cover: false
        HealthEffect: true
        EffectEntry: true
        KillUnit: false

Figure 3: Excerpts of YAML files that define the game. From top to bottom: actions, units, and general rules.

3.2 Agent interface

We provide an API for agent development and offer access to several sample agents (see Section 4.2). Agents must define a constructor for initialising the player and a method to indicate an action to execute in the current game tick. This method receives a copy of the game state, which can be modified (i.e. to incorporate game objects into the non-visible part of the level due to fog of war) and rolled forward by supplying actions. This access to an FM facilitates the implementation of SFP agents. Agents also have access to the properties of the game state (positions of units, terrain tiles, game turn, score, etc.) and a pathfinding feature to determine shortest paths.

On each turn, the game requests an action from the player to be executed in the game. The agent receives information about all possible actions that can be executed in the current game state, for all the units present on the board. The agent can choose to return one of these actions, or a special action that indicates that it does not want to execute any more actions during the present turn. The game requests an action from the player as long as i) the agent does not return an EndTurn action; ii) there are still actions available for the player; and iii) the turn budget, if specified in the YAML configuration files, is not consumed.
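To illustrate this interface, the following is a minimal C++ sketch of an agent. It is not taken from the framework's source: the type and method names (GameState, ForwardModel, computeAction, etc.) are assumptions chosen for readability, and the real API may differ. The sketch effectively mirrors the One Step Look Ahead agent described in Section 4.2: it evaluates each available action by rolling a state copy forward and returns the best one.

#include <limits>
#include <vector>

// Illustrative stand-ins for the framework's types (hypothetical, not the real API).
struct Action { int id = 0; };
struct GameState {
    std::vector<Action> actions;   // actions currently available to the player
    double score = 0.0;
    std::vector<Action> getAvailableActions() const { return actions; }
};
struct ForwardModel {
    // Rolls a state copy forward by one action (stand-in dynamics).
    void advance(GameState& state, const Action& a) const { state.score += a.id; }
};
double evaluate(const GameState& s) { return s.score; } // e.g. the SDH of Section 4.2

class OneStepAgent {
public:
    OneStepAgent() = default;  // constructor initialises the player

    // Called whenever the game requests an action; the game keeps asking until an
    // EndTurn action is returned, no actions remain, or the turn budget is consumed.
    Action computeAction(GameState stateCopy, const ForwardModel& fm) {
        Action best{};
        double bestValue = -std::numeric_limits<double>::infinity();
        for (const Action& a : stateCopy.getAvailableActions()) {
            GameState whatIf = stateCopy;  // modifiable copy: build a what-if scenario
            fm.advance(whatIf, a);
            const double value = evaluate(whatIf);
            if (value > bestValue) { bestValue = value; best = a; }
        }
        return best;
    }
};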
3.3 Debugging and Logging

One of the most important aspects of this framework is the capability of analysing and logging game states and executions. Figure 2 shows live debug information by means of interactive floating windows. This information includes game data (current turn, number of actions executed by a player in a turn, frames per second, score and leading player), profiling data (the size, in bytes, of the game state; the time, in microseconds, needed to copy the game state and to advance the forward model by one action; and the time taken by agents to provide a move to play) and action and unit information, which indicates the size of the action space in the current game state and the accumulated action space during the present turn. The interface also allows us to obtain more information and execute those actions from the list in the floating window, as well as to obtain information about the units in the game.

Once the game is over, a log file is also written in YAML format, including per-turn information on decision-making time, score, action space and actions executed, number of units, player rankings and game-specific information. The framework includes simple scripts to analyse this data and produce logging plots such as the ones shown below in Figure 4.

4 A Turn-based Strategy Framework

This section describes the implementation of the agents and the three turn-based strategy games currently defined in the platform.

4.1 Games: Kings, Healers and Pushers

In Kings, players receive a king unit and a random set of additional units. Their task is to keep their king alive at all costs while trying to defeat the opponent's king. Similar to chess, losing other units does not determine the end of the game but effectively reduces the flexibility of a player. Four types of units are defined in this game mode: archer, warrior, healer, and the king. While the warrior moves slowly and deals high damage, the archer moves quickly and has long-range attacks, but its damage is reduced. In addition to its movement speed, the archer can also see further than any other unit in the game. The healer can restore other units' health points. Lastly, the king can only move one square at a time but deals the highest damage. All units can move and attack once in the same turn. The game is played on a map with different types of terrain, each type providing a different cover percentage for reducing incoming damage. Additionally, the map contains traps, which kill a unit upon entering. The map is covered in fog-of-war, with each unit revealing parts of the map based on its vision radius.

In the game Healers, both players have access to warriors and healers. The healer can move faster than the warrior but cannot attack. In comparison to Kings, healers and warriors have higher starting health points. The twist in this game is that all units receive damage at the end of each turn. The goal of the players is to keep their units alive, while they can attack the opponent's units. The last player with units left wins. The map contains plains and mountains, this time with no tile providing cover. Mountains act as non-walkable obstacles and fog-of-war is disabled in this game.

The game Pushers is fundamentally different from the way the other games are played. Only one unit type is available, the Pusher. Pushers cannot attack other units but can push them one tile back once per turn, with the aim of making the other player's units fall into the traps in the level. The winning condition remains the same (survive the longest), but the game focuses on tactical movement instead of aggressive unit actions.

4.2 Agents

This section describes the different agents implemented in the framework and a heuristic used to evaluate game states.

Strength Difference Heuristic (SDH) SDH is a heuristic to estimate the relative strength of a unit, computed as a linear sum of the unit's attributes (maximum health, attack damage, or movement range), each divided by the maximum value among all available unit types. If a unit cannot execute an action, the corresponding attribute is 0. Note that this heuristic will not change during a game: dynamic attributes like a unit's current health are not considered in the strength estimation. To estimate the value of a state, we compute the difference between the strength of the current player's units and the opponent's units. Additionally, a unit's strength is multiplied by the percentage of remaining health to encourage attacks.
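As a clarification of this computation, here is a minimal sketch in C++. It is an interpretation of the description above rather than the framework's implementation; the unit and unit-type structures and the chosen attribute set are assumptions.

#include <algorithm>
#include <vector>

struct UnitType { double maxHealth = 0, attackDamage = 0, moveRange = 0; };
struct Unit { UnitType type; double health = 0, maxHealth = 1; int owner = 0; };

// Static strength of a unit type: each attribute divided by the maximum value of that
// attribute among all available unit types (an attribute the unit cannot use counts as 0).
double unitStrength(const UnitType& t, const std::vector<UnitType>& allTypes) {
    double maxH = 0, maxD = 0, maxM = 0;
    for (const UnitType& u : allTypes) {
        maxH = std::max(maxH, u.maxHealth);
        maxD = std::max(maxD, u.attackDamage);
        maxM = std::max(maxM, u.moveRange);
    }
    return (maxH > 0 ? t.maxHealth    / maxH : 0) +
           (maxD > 0 ? t.attackDamage / maxD : 0) +
           (maxM > 0 ? t.moveRange    / maxM : 0);
}

// State value: strength of the player's units minus strength of the opponent's units,
// with each unit's (static) strength scaled by its remaining health percentage.
double strengthDifference(const std::vector<Unit>& units,
                          const std::vector<UnitType>& allTypes, int player) {
    double value = 0;
    for (const Unit& u : units) {
        const double s = unitStrength(u.type, allTypes) * (u.health / u.maxHealth);
        value += (u.owner == player) ? s : -s;
    }
    return value;
}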
Rule-based Combat (RBC) Agent This agent focuses on combat-oriented games like Kings or Healers. Its strategy is to focus all attacks on a single enemy unit while keeping its own units out of danger. Every time the agent has to make a decision, it first targets an enemy unit. It then tests for each friendly unit whether it can attack an opponent, heal an ally, or move closer to the target. Once a valid action has been found, the agent executes it and repeats the process until no actions are left. The target is chosen based on an isolation score. A unit's allies contribute negatively to the isolation score, while its enemies contribute positively. The contribution is equal to the unit's strength divided by the number of turns it takes for it to reach the unit. To find a target, the agent searches for the enemy with the highest isolation score. When attacking or healing a unit, the agent prioritises units with high strength.
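To make the target-selection rule concrete, the following is a minimal sketch of the isolation score as we read it from the description above. It is not the agent's actual code: the unit representation, the strength value and the turns-to-reach estimate are placeholders.

#include <algorithm>
#include <cstdlib>
#include <vector>

struct Unit { int owner = 0; double strength = 1.0; int x = 0, y = 0; };

// Stand-in estimate: Chebyshev distance on the grid (a real agent would use pathfinding
// and the unit's movement range to count turns).
int turnsToReach(const Unit& from, const Unit& to) {
    return std::max(std::abs(from.x - to.x), std::abs(from.y - to.y));
}

// Isolation score of a candidate enemy target: the target's enemies (our units) add
// strength / turns-to-reach, while its allies subtract it. The RBC agent attacks the
// enemy with the highest score, i.e. the one closest to us and furthest from support.
double isolationScore(const Unit& target, const std::vector<Unit>& allUnits) {
    double score = 0.0;
    for (const Unit& u : allUnits) {
        if (&u == &target) continue;
        const double contribution = u.strength / std::max(1, turnsToReach(u, target));
        score += (u.owner == target.owner) ? -contribution : contribution;
    }
    return score;
}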
Rule-based Push (RBP) Agent The Push agent is highly specialised for games like Pushers. The agent's strategy is to push opponents in the direction that brings them closest to a death trap. For each unit, the agent computes the shortest paths from the unit to the tiles adjacent to the opponent's units. Each path then gets assigned a score equal to the length of the path, plus an estimate of how long it would take to kill the opponent from that tile. Starting from the path with the lowest score, the agent checks if following the path for one turn would result in a position that endangers the unit. If the path is not dangerous it is assigned to the unit; otherwise, the agent tries the next path. Once a unit has been assigned a path, it will either push the target opponent or follow the path. Once a unit has moved or pushed, the agent restarts the process until no unit can act safely any more.

One Step Look Ahead (OSLA) Agent The OSLA agent uses the game's forward model to predict the upcoming state for each of the available actions. Resulting states are rated according to the SDH heuristic function. A high positive (resp. negative) score is used in case the agent won (lost) the game after applying an action. Finally, the agent selects the action which yields the highest score.

Monte Carlo Tree Search (MCTS) Agent Over time, many variants of MCTS have been proposed for various problem domains (Browne et al. 2012). For the framework, we implemented a basic version of MCTS using the Upper Confidence Bounds (UCB) (Auer, Cesa-Bianchi, and Fischer 2002) selection criterion. The MCTS agent uses a tree node structure to facilitate a search through the game's state space. Each node stores an associated game state, a list of available actions, and a pointer to one child node per action. The tree is initialised by creating a root node using the provided game state. During each iteration of the search, the agent first selects a node, then expands it by another child node, then simulates a rollout, and ends by backpropagating its value along the path to the root node. A node is selected for expansion by going down the tree step-wise until a node which has not been fully expanded yet is found. At each step, the child node with the highest UCB value is selected. The new child node is generated by applying the associated action to the selected node's game state. During the tree policy we do not consider opponent turns; instead we skip them to avoid the non-determinism of their action selection. The new child node's value is determined by applying random actions until the end of the game or a predetermined depth. Its value is backpropagated through the visited nodes of the tree up to the root. The search ends once a maximum number of forward model calls has been reached. Finally, we return the root's child node with the highest visit count.
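The selection step relies on the standard UCB1 formula. The sketch below shows that step with an illustrative node structure (the framework's actual tree classes are not shown in the paper, so all names here are assumptions); the exploration constant C corresponds to the √2 used in the experiments of Section 5.

#include <cmath>
#include <limits>
#include <memory>
#include <vector>

struct Node {
    double totalValue = 0.0;                      // sum of backpropagated rollout values
    int    visits     = 0;                        // number of times this node was visited
    std::vector<std::unique_ptr<Node>> children;  // one child per already expanded action
};

// UCB1: exploit the average value, explore rarely visited children.
double ucb1(const Node& child, int parentVisits, double C) {
    if (child.visits == 0) return std::numeric_limits<double>::infinity();
    const double exploitation = child.totalValue / child.visits;
    const double exploration  = C * std::sqrt(std::log(static_cast<double>(parentVisits))
                                              / child.visits);
    return exploitation + exploration;
}

// Tree policy step: descend to the child with the highest UCB value.
Node* selectChild(Node& node, double C = std::sqrt(2.0)) {
    Node* best = nullptr;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (auto& child : node.children) {
        const double score = ucb1(*child, node.visits, C);
        if (score > bestScore) { bestScore = score; best = child.get(); }
    }
    return best;
}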
Rolling Horizon Evolutionary Algorithm (RHEA) Agent The Rolling Horizon Evolutionary Algorithm searches for an optimal action sequence of a fixed length (the horizon) (Perez-Liebana, Samothrakis, and others 2013). To do so, it first generates a pool of candidate action sequences, which is then continuously modified by an evolutionary algorithm. Each individual is created by step-wise selecting an action and applying it to the current game state. Afterwards, the individual's value is determined using a provided heuristic. Similarly to the MCTS agent, the RHEA agent skips the opponent's turn during rollouts, since it introduces too much non-determinism into the evaluation of an action sequence. At the beginning of each iteration, tournament selection is applied to select the best individuals among a random subset of individuals. The generated pool is modified by mutation and crossover operators. During mutation, we iterate over an individual's action list and randomly choose to replace an action with a random one. The remaining actions are checked for whether they would still be feasible according to the given game state and, if not, are replaced by a random feasible action. During crossover of two individuals, we randomly select which parent provides the next action. If the action is not applicable, it is replaced by a random feasible action. Resulting individuals are re-evaluated and added to the next population. RHEA keeps iterating until a maximal number of forward model calls has been reached. Thereafter, the first action of the best-rated individual is returned.
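The following sketch illustrates the mutation-and-repair step on one individual, as a minimal interpretation of the description above. Types and names are illustrative, and for simplicity a mutated gene is drawn directly from the currently feasible actions.

#include <random>
#include <vector>

struct Action { int id = 0; };
struct GameState {
    std::vector<Action> legal;                     // actions legal in this state (stand-in)
    std::vector<Action> getAvailableActions() const { return legal; }
    bool isFeasible(const Action& a) const {
        for (const Action& l : legal) if (l.id == a.id) return true;
        return false;
    }
};
struct ForwardModel { void advance(GameState&, const Action&) const {} }; // stand-in dynamics

// An individual is a fixed-length action sequence (the horizon). After each gene, the
// state copy is rolled forward so that later genes are checked against the right state.
void mutateAndRepair(std::vector<Action>& individual, GameState state,
                     const ForwardModel& fm, double mutationRate, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    for (Action& a : individual) {
        const std::vector<Action> legal = state.getAvailableActions();
        if (legal.empty()) break;
        std::uniform_int_distribution<std::size_t> pick(0, legal.size() - 1);
        if (coin(rng) < mutationRate || !state.isFeasible(a))
            a = legal[pick(rng)];                  // mutate, or repair an infeasible action
        fm.advance(state, a);                      // roll the state forward for the next gene
    }
}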
5 Experimental Showcase

We tested the performance of the sample agents by running a round-robin tournament for each of the three games. We ran 50 games per match-up between the rule-based, RHEA, MCTS, and OSLA agents. For these 50 games, we randomised 25 initial game states, each of which was played twice with the players alternating their starting positions. The search-based agents were configured to use a budget of 2000 forward model calls (number of times the state is rolled forward) per selected action. For the RHEA agent, we used a population size of 1, individuals of length 5, and a mutation rate of 0.1. The MCTS agent was configured to use a rollout length of 3 and an exploration constant of √2. Both SFP agents skip the opponent's turn and only optimise the player's action sequence. OSLA, MCTS and RHEA use the Strength Difference Heuristic to evaluate game states. Games are run for a maximum of 30 turns, ending in a tie if no winner has been declared when reaching this number.

Table 1 summarises our results, reporting each agent's win rate per opponent and across all games. Results show that the RBC agent is very proficient in playing the game modes Kings (avg. win rate = 0.92) and Healers (0.82). While the MCTS and RHEA agents were able to beat the OSLA agent, they were no match for the RBC agent. In contrast, the RBP agent performed quite well against OSLA (1.00) and RHEA (0.74) but lost against the MCTS (0.46) agent.

Agents   RBC    OSLA   MCTS   RHEA   Average
Kings
RBC      —      1.00   0.86   0.90   0.92
RHEA     0.10   0.98   0.60   —      0.56
MCTS     0.14   0.92   —      0.12   0.39
OSLA     0.00   —      0.02   0.00   0.01
Healers
RBC      —      0.98   0.82   0.66   0.82
RHEA     0.34   1.00   0.70   —      0.68
MCTS     0.16   0.94   —      0.26   0.45
OSLA     0.02   —      0.06   0.00   0.03
Pushers
RBP      —      1.00   0.46   0.74   0.73
MCTS     0.54   1.00   —      0.30   0.61
RHEA     0.26   0.94   0.40   —      0.53
OSLA     0.00   —      0.00   0.00   0.00

Table 1: Winning rate of the row player against the column agent. Players are sorted, per game, by overall winning average.

The good performance of both rule-based agents shows that there is much room for improvement in the performance of search-based agents. A great starting point for understanding their problems is to analyse the games' complexity. Figure 4 shows two plots with the average size of the action space over time and the number of actions executed per turn over 50 MCTS vs RHEA games in Kings. Both agents start with an average of 150 actions per move and execute between 5 and 6 moves per turn. The large fluctuation of the action-space size can be explained by the number of units that are still active in the agent's turn. After the unit count has been reduced, the size of both action spaces gets gradually reduced, although RHEA's is always a bit higher. On the other hand, the number of actions executed per turn, although it also decreases over the game, is higher in RHEA. This shows that RHEA's higher winning rate is correlated with a more precise action selection that maintains a larger action space through the game.

Figure 4: Logging: MCTS vs RHEA games in Kings.

6 Opportunities and Future Work

The goal of this framework is to allow research into the many different facets of Game AI research in strategical and tactical games, either turn-based or real-time. These include games that require a complex decision-making process, from multi-unit management to resource gathering, technology trees and long-term planning. Our aim is to provide a framework for i) search (showcased in this paper with SFP agents) and reinforcement learning agents; and ii) research in game and level generation, and automatic game tuning, which is made possible by the definition of rules, mechanics, units and actions via YAML files. The framework, implemented in C++, aims to provide a much required high execution speed and interfaces to different programming languages for the implementation of agents and generators.

The current state of STRATEGA is fully functional for tactical turn-based games and SFP agents, and provides logging capabilities to analyse game results, as shown in this paper. It has, however, been developed with a road-map in mind to incorporate extra games and logging features. Regarding the former, we plan to incorporate aspects of tactical role-playing games (object pick-ups, inventories, buffs/debuffs, etc.), technology and cultural trees, and resource and economy management, both for turn-based and real-time games. Regarding the logging features, the API will be enhanced so agents can log aspects of the internal representation of their decision-making process, following the example laid out in (Volz, Ashlock, and Colton 2015). This will provide a deeper insight into this task and also facilitate research into the explainability of the agent's decision-making process. Finally, the agent API and the highly customisable games allow tackling research on strategy games from a general game playing perspective, which is exemplified here by testing several agents in three different games implemented within the framework. Our intention is to propose this platform as a new competition benchmark in the near future.

Acknowledgements

This work is supported by UK EPSRC research grant EP/T008962/1.

References

Andersen, P.-A.; Goodwin, M.; and Granmo, O.-C. 2018. Deep RTS: A Game Environment for Deep Reinforcement Learning in Real-time Strategy Games. In 2018 IEEE Conference on Computational Intelligence and Games (CIG), 1–8.

Arnold, F.; Horvat, B.; and Sacks, A. 2004. Freeciv Learner: A Machine Learning Project Utilizing Genetic Algorithms. Georgia Institute of Technology, Atlanta.

Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2/3):235–256.

Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research 47(1):253–279.

Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; et al. 2016. OpenAI Gym.

Browne, C. B.; Powley, E.; Whitehouse, D.; et al. 2012. A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1):1–43.

Buro, M. 2003. Real-time Strategy Games: A New AI Research Challenge. In IJCAI, volume 2003, 1534–1535.

Firaxis. 1995–2020. Civilization.

Genesereth, M.; Love, N.; and Pell, B. 2005. General Game Playing: Overview of the AAAI Competition. AI Magazine 26(2):62–62.

Jones, J., and Goel, A. 2004. Hierarchical Judgement Composition: Revisiting the Structural Credit Assignment Problem. In Proceedings of the AAAI Workshop on Challenges in Game AI, San Jose, CA, USA, 67–71.

Justesen, N.; Uth, L. M.; Jakobsen, C.; et al. 2019. Blood Bowl: A New Board Game Challenge and Competition for AI. In 2019 IEEE Conference on Games, 1–8.

Justesen, N.; Mahlmann, T.; et al. 2017. Playing Multiaction Adversarial Games: Online Evolutionary Planning versus Tree Search. IEEE Transactions on Games 10(3):281–291.

Kowalski, J.; Mika, M.; Sutowicz, J.; and Szykuła, M. 2019. Regular Boardgames. Proceedings of the AAAI Conference on Artificial Intelligence 33:1699–1706.

Midjiwan AB. 2016. The Battle of Polytopia.

Ontañón, S.; Barriga, N. A.; Silva, C. R.; Moraes, R. O.; and Lelis, L. H. S. 2018. The First microRTS Artificial Intelligence Competition. AI Magazine 39(1):75–83.

Ontanón, S.; Synnaeve, G.; et al. 2013. A Survey of Real-time Strategy Game AI Research and Competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games 5(4):293–311.

Perez-Liebana, D.; Dockhorn, A.; Hurtado-Grueso, J.; and Jeurissen, D. 2020a. The Design of "Stratega": A General Strategy Games Framework. arXiv preprint arXiv:2009.05643.

Perez-Liebana, D.; Hsu, Y.-J.; Emmanouilidis, S.; Khaleque, B.; and Gaina, R. D. 2020b. Tribes: A New Turn-Based Strategy Game for AI Research. In 2020 AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 1–8.

Perez-Liebana, D.; Liu, J.; et al. 2019. General Video Game AI: A Multitrack Framework for Evaluating Agents, Games, and Content Generation Algorithms. IEEE Transactions on Games 11(3):195–214.

Perez-Liebana, D.; Lucas, S. M.; et al. 2019. General Video Game Artificial Intelligence, volume 3. Morgan & Claypool Publishers. https://gaigresearch.github.io/gvgaibook/.

Perez-Liebana, D.; Samothrakis, S.; et al. 2013. Rolling Horizon Evolution versus Tree Search for Navigation in Single-player Real-time Games. In Proceedings of GECCO, 351–358.

Piette, E.; Soemers, D. J.; Stephenson, M.; Sironi, C. F.; Winands, M. H.; and Browne, C. 2019. Ludii – The Ludemic General Game System. arXiv preprint arXiv:1905.05013.

Ponsen, M. J.; Lee-Urban, S.; Muñoz-Avila, H.; Aha, D. W.; and Molineaux, M. 2005. Stratagus: An Open-source Game Engine for Research in Real-time Strategy Games. Reasoning, Representation, and Learning in Computer Games 78.

Prochaska, C., et al. 1996. FreeCiv. http://www.freeciv.org/.

Schaul, T. 2013. A Video Game Description Language for Model-based or Interactive Learning. In 2013 IEEE Conference on Computational Intelligence in Games (CIG), 1–8.

Team, B. D. 2020. The Brood War API (BWAPI) 4.2.0. https://github.com/bwapi/bwapi.

Tian, Y.; Gong, Q.; Shang, W.; Wu, Y.; and Zitnick, C. L. 2017. ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games. In Advances in Neural Information Processing Systems, 2659–2669.

Vinyals, O.; Babuschkin, I.; Chung, J.; Mathieu, M.; Jaderberg, M.; et al. 2019. AlphaStar: Mastering the Real-time Strategy Game StarCraft II. DeepMind blog 2.

Volz, V.; Ashlock, D.; and Colton, S. 2015. 4.18 Gameplay Evaluation Measures. Dagstuhl Seminar on AI and CI in Games: AI-Driven Game Design 122.