Strategic Features for General Games

Cameron Browne, Dennis J. N. J. Soemers, and Eric Piette
Department of Data Science and Knowledge Engineering (DKE)
Maastricht University, Bouillonstraat 8-10, 6211 LH Maastricht, The Netherlands

Abstract

This short paper describes an ongoing research project that requires the automated self-play learning and evaluation of a large number of board games in digital form. We describe the approach we are taking to determine relevant features for biasing MCTS playouts for arbitrary games played on arbitrary geometries. Benefits of our approach include efficient implementation, the potential to transfer learnt knowledge to new contexts, and the potential to explain strategic knowledge embedded in features in human-comprehensible terms.

Introduction

The Digital Ludeme Project[1] is a five-year research project, recently launched at Maastricht University, that aims to model the world's traditional strategy games in a single, playable digital database. This database will be used to find relationships between games and their component ludemes,[2] in order to develop a model of the evolution of games throughout recorded human history and to chart their spread across cultures worldwide. This paper describes a new approach to defining strategic features for general games that we are implementing in order to achieve the project's aims.

Statement of the Problem

While there exists substantial archaeological evidence for ancient games (i.e. those played before 500AD), the rule sets explaining how these games were actually played are typically lost; our modern understanding of ancient and early games is based primarily on modern reconstructions that have proved to be less than reliable (Murray 1952).

A key challenge in this project will be to evaluate reconstructions of ancient rule sets and to improve them where possible. The general game system we are developing for this project, called LUDII, must therefore be able to:

1. Model the full range of traditional strategy games.
2. Play them at a competent human level.
3. Evaluate rule set reconstructions for: a) historical authenticity, and b) quality of play.
4. Improve rule set reconstructions where needed.

We aim to model the 1,000 most influential traditional games throughout history, each of which may have multiple interpretations, each of which may require hundreds of variant rule sets to be tested. This is therefore not just a mathematical/computational challenge but also a logistical one, as LUDII may need to be able to learn to play and evaluate up to a million candidate rule sets over the project's five years.

This paper outlines work-in-progress for achieving this ambitious goal, through the self-play learning of simple, lightweight features that embody strategic knowledge about the games being modelled. The features we propose are geometry-independent, facilitating their transfer to new contexts, and represent basic strategies that may be understood and explained in human-comprehensible terms.

[1] Funded by a €2m ERC Consolidator Grant (http://ludeme.eu).
[2] Units of game-related information (Borvo 1977).

Geometric Piece Features

The LUDII general game system employs Monte Carlo tree search (MCTS) (Browne et al. 2012) as its core method for AI move planning, which has proven to be a superior approach for general games in the absence of domain-specific knowledge (Finnsson and Björnsson 2010). While MCTS performance can vary significantly if strictly random playouts are used (Browne 2013), the reliability of Monte Carlo simulations can be improved by biasing playouts with domain-relevant features.

There is a significant amount of prior research that describes the use of local features or patterns for game-playing agents tailored towards a specific game. Some of the most common approaches for generating features are:

• Manually constructing features using expert knowledge. This was, for example, done by Gelly et al. (2006) for the game of Go, and by Sturtevant and White (2007) for the game of Hearts.
• Exhaustively generating all possible patterns up to a certain restriction (often limited to, e.g., areas of 3×3 cells). This was, for example, done by Gelly and Silver (2007) for the game of Go, and by Lorentz and Zosa IV (2017) for the game of Breakthrough. Sturtevant and White (2007) also exhaustively generated combinations (for instance pairs, triples, etc.) of the manually selected features for the game of Hearts as described above.
• Supervised learning based on databases of human expert games. This has been particularly common in the game of Go (Enderton 1991; Stoutamire 1991; van der Werf et al. 2003; Bouzy and Chaslot 2005; Stern, Herbrich, and Graepel 2006; Araki et al. 2007; Coulom 2007).

Other approaches include Gradual Focus (Skowronski, Björnsson, and Winands 2009), a genetic programming approach to find useful patterns (Hoock and Teytaud 2010), and the use of deep convolutional neural networks for learning relevant features through self-play (Silver et al. 2016).

[Figure 1: Bridge completion is a beneficial pattern in Hex.]

The features we are interested in here are geometric piece patterns of arbitrary topology. For example, Figure 1 shows a pattern of pieces that represents a beneficial move for connection games played on the hexagonal grid. If Black has just played move 1 to intrude into the virtual connection of two White pieces (left), then White should reply immediately with move 2 to complete that connection. Biasing MCTS playouts to play such moves with higher probability was found to dramatically improve AI playing strength for connection games such as Hex and Y (Raiko and Peltonen 2008).

Feature Definition

Each feature x defines:

1. A pattern p_x which specifies one or more elements that must be present or absent in certain locations in game states for feature x to hold.
2. An action a_x which may be encouraged or discouraged in game states where x holds.

For most board games, actions can be sufficiently specified using one or two integers: a location to move "to", and in some games a location to move "from". For example, in games such as Hex and Go a single board location is sufficient to uniquely identify any action in any particular game state. In games such as Draughts or Chess, the current location of the piece to be moved is also required.

Syntax

Each pattern consists of a set of elements. We define the following element types for each potential feature element, relative to the current mover:

• "-": Off-board location (for locating edges and corners).
• ".": Empty board location.
• "o": Friendly piece (i.e. of the mover's colour).
• "x": Enemy piece (i.e. not of the mover's colour).
• "Pn": Piece belonging to player n.
• "In": Piece with index n in the game definition.[3]

We also define "!" (i.e. "not") versions of these element types. For example, "!P2" means that the specified location does not contain a piece belonging to Player 2. Locations can simultaneously have multiple qualifiers, e.g. a location qualified with both "x" and "!I3" would match an enemy piece at that location unless it has index 3 in the game definition.

[3] The game definition maintains a unique index for each type of equipment it involves, including containers (boards, piles, hands, decks, etc.) and components (pieces, tiles, dice, cards, etc.).

Topology

Rather than tying feature topology to any particular board or grid type, feature elements are described by their relative locations on the underlying graph of the game board (or more accurately its dual, which defines orthogonal cell adjacencies). Starting with a given anchor location and direction, the relative location of each element is defined as a walk through the underlying board graph. We assume that the orthogonal neighbours of each board location are listed in consecutive clockwise order, including null placeholders for off-board steps.

Each walk is defined as a sequence of adjacent steps through the underlying board graph, where:

• 0 denotes a step forwards in the current direction.
• −1/a denotes a counterclockwise turn in a cell with a sides (i.e. −1/3 is one turn in a triangle), then a step forwards.
• +2/a denotes two clockwise turns in a cell with a sides (i.e. +2/4 is two turns in a square), then a step forwards, etc.

Figure 2 shows the relative steps through cells with different numbers of sides. Fractional turns are rounded to the nearest sensible fraction for any given cell (e.g., a turn of 1/4 in a triangle becomes equal to a turn of 1/3). The basic mechanism is similar in principle to the use of turtle commands in computer graphics.

[Figure 2: Steps through various cells based on CW turns.]

Note that cells with an odd number of sides, such as the triangular and pentagonal cells shown in Figure 2, do not have a forwards (0) direction. Such cases may be handled by instantiating two pattern instances per ambiguous step: one with 0 replaced by −1/a and one with 0 replaced by +1/a for that step in a cell with a sides.

For example, a knight move may be described using this notation as {0, 0, 1/4}, as shown in Figure 3 (left). This description can be mapped directly to other grids of arbitrary topology to produce plausible results, such as the semi-regular 3.4.6.4 tiling (right). Note that the turn of 1/4 is ambiguous in the semi-regular grid, and is therefore split into two possible patterns.

[Figure 3: Relative knight move {0, 0, 1/4} on the square grid (left) and two equivalent moves on the semi-regular 3.4.6.4 grid (right).]

[Figure 4: Reactive (bottom) and proactive (top) features.]
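The turn arithmetic described above can be sketched in a few lines of code. The following is a minimal illustration in Java (the paper's implementation language), not actual LUDII code; the class and method names are hypothetical, and it assumes a square grid whose orthogonal neighbours are listed in clockwise order (N, E, S, W). Each fractional turn is rounded to a whole number of rotations for the current cell, then followed by a step forwards, in the manner of turtle commands.

```java
// Minimal illustration of resolving a relative walk such as {0, 0, 1/4}
// on a square grid (hypothetical sketch, not actual LUDII code).
public class WalkDemo {
    // Orthogonal directions in clockwise order: N, E, S, W.
    static final int[][] DIRS = { {0, 1}, {1, 0}, {0, -1}, {-1, 0} };

    /** Applies a walk of fractional turns from (x, y) facing DIRS[dir];
     *  returns the final {x, y} cell. */
    static int[] resolve(int x, int y, int dir, double[] turns) {
        int sides = DIRS.length; // 4 for a square grid
        for (double t : turns) {
            // Round the fractional turn to a whole rotation for this cell,
            // e.g. +0.25 is one clockwise turn in a square.
            int rot = (int) Math.round(t * sides);
            dir = Math.floorMod(dir + rot, sides);
            x += DIRS[dir][0]; // every step ends with a move forwards
            y += DIRS[dir][1];
        }
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        // Knight move {0, 0, 1/4}: forwards, forwards, quarter-turn CW, forwards.
        int[] to = resolve(0, 0, 0, new double[] { 0, 0, 0.25 });
        System.out.println(to[0] + "," + to[1]); // prints "1,2"
    }
}
```

Mapping the same walk to a grid with a different number of sides only requires swapping the neighbour table, which is what makes the notation geometry-independent.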
Local vs Global Patterns

Patterns may be defined as either:

• Relative: Apply to all valid anchor locations across the board (taking null locations into account).
• Absolute: Apply only to the specified anchor location.

For convenience, the number of valid rotations and reflections for each pattern may also be specified, so that all possible instances of a given pattern may be generated across the board from a single description.

Reactive vs Proactive Patterns

We distinguish between reactive patterns that trigger actions directly in response to the previous player's last move,[4] and proactive patterns that can trigger actions anywhere across the board regardless of the previous player's last move. This distinction is similar to the distinction between "responsive" and "non-responsive" patterns used in (Silver et al. 2016).

For example, the bottom row of Figure 4 shows the Hex pattern described earlier in Figure 1 (left) and a reactive pattern that describes this situation (middle). The top row shows a proactive pattern that encourages Hex players to play two disjoint steps away from existing friendly pieces. For clarity in the diagrams, friendly pieces are denoted as white discs and enemy pieces as black discs (the last move made is dotted). White dots indicate empty cells, edges between elements indicate cell adjacency, and green "+" symbols indicate the preferred action triggered by each pattern. The rightmost column of Figure 4 shows how these patterns map easily from the hexagonal grid to the square grid.

[4] There may be more than one opponent.

Implementation

The LUDII system and associated feature mechanisms are implemented in Java 8. Internal game states are defined using a custom BitSet class, called a ChunkSet, that compresses the required state information into the minimum number of bits, based on each game's definition.

For example, if a game involves no more equipment than a board and uniform pieces in N colours, then the game state is described by a ChunkSet subdivided into chunks of B bits per board cell, where B is the lowest power of 2 that provides enough bits to represent every possible state per cell (including state 0 for the empty cells).[5]

For each game, we seek to derive an optimal feature set F, which is the set of features that outperforms all other known feature sets for that game on average, within a given computational budget.

[5] Chunk sizes are set to the lowest power of 2 to avoid issues with chunks straddling consecutive long member variables.

Feature Instantiation

As each game is loaded, an instance of every possible valid rotation, reflection and translation is pre-generated once for all component features f_x of that game's optimal feature set F, including combinations of valid element types within each feature. Each feature definition can therefore generate hundreds of instances.

Each instance is defined using the same custom BitSet class as the game state. Hence, each feature instance can be matched to the given game state efficiently using bitwise parallel operations, such that only a few bitwise operations need to be applied per instance match test, regardless of the feature's complexity.
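The bitwise matching idea can be sketched as follows. This is a minimal illustration only, not the actual ChunkSet implementation: it assumes (hypothetically) that each pre-generated instance is reduced to a mask selecting the chunks it constrains and a target holding the bit values those chunks must take, so that a match test is one AND and one comparison per word of state.

```java
// Minimal sketch of bitwise feature-instance matching (hypothetical,
// not the actual ChunkSet implementation). The game state is packed
// into long words; an instance stores a mask of the chunks it
// constrains and the target values those chunks must hold.
public class InstanceMatch {
    static boolean matches(long[] state, long[] mask, long[] target) {
        for (int i = 0; i < state.length; i++) {
            // One AND and one compare per word, regardless of how
            // complex the original pattern was.
            if ((state[i] & mask[i]) != target[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 2-bit chunks per cell: 00 = empty, 01 = white, 10 = black.
        long state  = 0b10_01_00L; // cells 0..2 hold: empty, white, black
        long mask   = 0b00_11_11L; // instance constrains cells 0 and 1 only
        long target = 0b00_01_00L; // requires cell 0 empty, cell 1 white
        System.out.println(matches(new long[]{ state },
                                   new long[]{ mask },
                                   new long[]{ target })); // prints "true"
    }
}
```

Because unconstrained cells are masked out, the same instance data tests "must be present" and "must be absent" elements in a single pass over the state words.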
Feature Application

Feature sets are applied to bias MCTS playouts as follows:

1. For each player move, all legal moves are generated and initialised with equal probability of being selected.
2. Each reactive feature instance corresponding to the previous player's last move is then checked for a match to the current game state; if a match is detected, then the feature's weight is added to the probability of the feature's action being selected (note that feature weights can be negative for detrimental moves).
3. Each proactive feature instance is then applied across the board and checked for a match to the current game state; if a match is detected, then the feature's weight is added to the probability of the feature's action being selected.
4. The move to be made is then selected randomly according to the resulting biased distribution.

Features can similarly be used in other parts of MCTS, such as its selection phase. Note that reactive features are much more efficient to apply than proactive ones, as only those patterns relevant to the previous player's last move need be tested. We therefore seek to generate reactive features, and to replace proactive features with their reactive equivalents, where possible.

Feature Generation

Manually generating features using expert knowledge, or extracting them through supervised learning from human expert games, are not suitable solutions for the current task, in which a wide variety of unfamiliar games must be supported. Exhaustive generation of features can be done more easily, but has clear drawbacks due to the massive volume of features it can generate.

One advantage of describing games by their component ludemes is that these may be exploited to generate a plausible set of candidate feature patterns based on the specified rules and equipment. Such patterns may not be suitable features in and of themselves, but represent "minimum" patterns which must be present in every feature – for example, every Hex or Go feature must contain an empty "to" action location – allowing significant reductions in the number of possible patterns to be generated. For many games, this kind of knowledge can be extracted automatically from ludemes.

The subsequent fine-tuning of features to obtain the optimal feature set F for a given game is still a work-in-progress. Suffice it to say that frequent pattern mining (Aggarwal and Han 2014) approaches to iteratively building features from large numbers of randomly generated self-play games have not proved effective. Learning feature sets against a random opponent can have disastrous – and sometimes amusing – results, in the unexpected behaviours that can thwart the random player. There does not seem to be any escape from evaluating feature sets using more time-consuming but realistic tests against intelligent AI opponents, with MCTS as the baseline yardstick.

Features and Strategies

An attractive aspect of the proposed approach is that learnt features have the potential to encode simple strategies for the games to which they apply. For example, the features shown on the top row of Figure 5 encourage the player to make lines of four pieces of their colour, while the features shown on the bottom row encourage the player to not make lines of three pieces of their colour (the red "–" symbols indicate negative weights that discourage such actions).

[Figure 5: Patterns for a "line of 4" strategy (top) and a "not line of 3" strategy (bottom).]

The features shown in Figure 6 encourage the player to form groups of three pieces of their colour, by encouraging singleton pieces to grow in any direction, then discouraging growth beyond group size 3. Similarly, the features shown in Figure 7 encourage the player to form long, thin groups of their pieces, by encouraging singleton pieces to grow in any direction, then encouraging friendly pairs to extend at the ends but discouraging growth at the common points adjacent to both pieces (which would create shorter, thicker groups).

[Figure 6: Patterns for a "form groups of 3" strategy.]

[Figure 7: Patterns for a "form long thin groups" strategy.]

Feature Explanation

An additional benefit of the ludemic model for games is that features applicable to a given game may be readily transferred as candidate features for related games defined by similar ludemes. Any feature that has proved relevant for a given context is a good starting point for similar contexts.

Further, we can potentially invert the causal relationship between ludemes and features to reverse engineer comprehensible explanations for the learnt strategies that optimal feature sets represent, by determining which ludemes are responsible for given features. This relationship between a game's ludemes and its derived features is further strengthened by the fact that initial candidate feature sets are derived directly from the game's ludemic description during the feature generation process.

Each ludeme in the LUDII Ludeme Library corresponds to a Java class with an appropriate name, providing a source of convenient plain English captions for learnt concepts. And even if the ludeme names do not provide sufficient explanation in themselves, they provide hints for further post-processing steps to find geometric relationships within the feature patterns. It is plausible that simple descriptions such as "make lines of 4" or "make groups of 3" may be derived from such feature sets. Implementing such mechanisms for explainable AI in the context of strategy learning for general games will be a key focus of future work.

Conclusion

Lightweight features based on geometric piece patterns have a number of advantages for our work on the Digital Ludeme Project. They improve MCTS playing strength, map readily to other geometries, encode simple strategies, can be associated with the underlying ludemic descriptions of games, and have the potential to help explain learnt strategies in human-comprehensible terms. We will continue developing and testing this approach as the LUDII general game system matures to support an increasing range of game types, and investigating the automated extraction and explanation of relevant strategies from such learnt features.

Acknowledgements

This research is part of the European Research Council-funded Digital Ludeme Project (ERC Consolidator Grant #771292) being run by Cameron Browne. We would also like to acknowledge the RIKEN Institute's Advanced Intelligence Project (AIP), especially Kazuki Yoshizoe, for their generous support of prior research that led to this work.

References

Aggarwal, C. C., and Han, J., eds. 2014. Frequent Pattern Mining. Springer.

Araki, N.; Yoshida, K.; Tsuruoka, Y.; and Tsujii, J. 2007. Move prediction in Go with the maximum entropy method. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Games, 189–195. IEEE.

Borvo, A. 1977. Anatomie d'un jeu de cartes: L'Aluette ou le Jeu de Vache. Nantes: Librairie Nantaise Yves Vachon.

Bouzy, B., and Chaslot, G. 2005. Bayesian generation and integration of K-nearest-neighbor patterns for 19x19 Go. In Kendall, G., and Lucas, S., eds., Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Games, 176–181. IEEE.

Browne, C.; Powley, E.; Whitehouse, D.; Lucas, S.; Cowling, P. I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; and Colton, S. 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1):1–49.

Browne, C. 2013. A problem case for UCT. IEEE Transactions on Computational Intelligence and AI in Games 5(1):69–74.

Coulom, R. 2007. Computing "ELO ratings" of move patterns in the game of Go. ICGA Journal 30(4):198–208.

Enderton, H. D. 1991. The Golem Go program. Technical Report CMU-CS-92-101, School of Computer Science, Carnegie-Mellon University.

Finnsson, H., and Björnsson, Y. 2010. Learning simulation control in general game-playing agents. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 954–959. AAAI Press.

Gelly, S., and Silver, D. 2007. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning, 273–280.

Gelly, S.; Wang, Y.; Munos, R.; and Teytaud, O. 2006. Modification of UCT with patterns in Monte-Carlo Go. Technical Report RR-6062, INRIA, Paris.

Hoock, J.-P., and Teytaud, O. 2010. Bandit-based genetic programming. In Esparcia-Alcázar, A. I.; Ekárt, A.; Silva, S.; Dignum, S.; and Uyar, A. Ş., eds., European Conference on Genetic Programming, volume 6021 of Lecture Notes in Computer Science, 268–277. Springer.

Lorentz, R. J., and Zosa IV, T. E. 2017. Machine learning in the game of Breakthrough. In Winands, M. H. M.; van den Herik, H.; and Kosters, W. A., eds., Advances in Computer Games, volume 10664 of Lecture Notes in Computer Science, 140–150. Springer.

Murray, H. 1952. A History of Board-Games Other Than Chess. Clarendon Press.

Raiko, T., and Peltonen, J. 2008. Application of UCT search to the connection games of Hex, Y, *Star, and Renkula! In Proceedings of the Finnish Artificial Intelligence Conference, 89–93.

Silver, D.; Huang, A.; Maddison, C.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.

Skowronski, P.; Björnsson, Y.; and Winands, M. H. M. 2009. Automated discovery of search-extension features. In van den Herik, H. J., and Spronck, P., eds., Advances in Computer Games, volume 6048 of Lecture Notes in Computer Science. Springer.

Stern, D.; Herbrich, R.; and Graepel, T. 2006. Bayesian pattern ranking for move prediction in the game of Go. In Cohen, W. W., and Moore, A., eds., Proceedings of the 23rd International Conference on Machine Learning, 873–880.

Stoutamire, D. 1991. Machine learning, game play, and Go. Technical Report TR 91-128, Center for Automation and Intelligent Systems Research, Case Western Reserve University.

Sturtevant, N. R., and White, A. M. 2007. Feature construction for reinforcement learning in Hearts. In van den Herik, H. J.; Ciancarini, P.; and Donkers, H. H. L. M., eds., Computers and Games, volume 4630 of Lecture Notes in Computer Science, 122–134. Springer.

van der Werf, E.; Uiterwijk, J. W. H. M.; Postma, E.; and van den Herik, J. 2003. Local move prediction in Go. In Schaeffer, J.; Müller, M.; and Björnsson, Y., eds., Computers and Games, volume 2883 of Lecture Notes in Computer Science. Springer.