Strategic Features for General Games

Cameron Browne, Dennis J. N. J. Soemers, and Eric Piette
Department of Data Science and Knowledge Engineering (DKE)
Maastricht University, Bouillonstraat 8-10, 6211 LH Maastricht, The Netherlands

Abstract

This short paper describes an ongoing research project that requires the automated self-play learning and evaluation of a large number of board games in digital form. We describe the approach we are taking to determine relevant features for biasing MCTS playouts for arbitrary games played on arbitrary geometries. Benefits of our approach include efficient implementation, the potential to transfer learnt knowledge to new contexts, and the potential to explain strategic knowledge embedded in features in human-comprehensible terms.

Introduction

The Digital Ludeme Project[1] is a five-year research project, recently launched at Maastricht University, that aims to model the world's traditional strategy games in a single, playable digital database. This database will be used to find relationships between games and their component ludemes,[2] in order to develop a model of the evolution of games throughout recorded human history and to chart their spread across cultures worldwide. This paper describes a new approach to defining strategic features for general games that we are implementing in order to achieve the project's aims.

Statement of the Problem

While there exists substantial archaeological evidence for ancient games (i.e. those played before 500AD), the rule sets explaining how these games were actually played are typically lost; our modern understanding of ancient and early games is based primarily on modern reconstructions that have proved to be less than reliable (Murray 1952).

A key challenge in this project will be to evaluate reconstructions of ancient rule sets and to improve them where possible. The general game system we are developing for this project, called LUDII, must therefore be able to:

1. Model the full range of traditional strategy games.
2. Play them at a competent human level.
3. Evaluate rule set reconstructions for: a) historical authenticity, and b) quality of play.
4. Improve rule set reconstructions where needed.

We aim to model the 1,000 most influential traditional games throughout history, each of which may have multiple interpretations, each of which may require hundreds of variant rule sets to be tested. This is therefore not just a mathematical/computational challenge but also a logistical one, as LUDII may need to be able to learn to play and evaluate up to a million candidate rule sets over the project's five years.

This paper outlines work-in-progress for achieving this ambitious goal, through the self-play learning of simple, lightweight features that embody strategic knowledge about the games being modelled. The features we propose are geometry-independent, facilitating their transfer to new contexts, and represent basic strategies that may be understood and explained in human-comprehensible terms.

[1] Funded by a €2m ERC Consolidator Grant (http://ludeme.eu).
[2] Units of game-related information (Borvo 1977).

Geometric Piece Features

The LUDII general game system employs Monte Carlo tree search (MCTS) (Browne et al. 2012) as its core method for AI move planning, which has proven to be a superior approach for general games in the absence of domain-specific knowledge (Finnsson and Björnsson 2010). While MCTS performance can vary significantly if strictly random playouts are used (Browne 2013), the reliability of Monte Carlo simulations can be improved by biasing playouts with domain-relevant features.

There is a significant amount of prior research that describes the use of local features or patterns for game-playing agents tailored towards a specific game. Some of the most common approaches for generating features are:

• Manually constructing features using expert knowledge. This was, for example, done by Gelly et al. (2006) for the game of Go, and by Sturtevant and White (2007) for the game of Hearts.
• Exhaustively generating all possible patterns up to a certain restriction (often limited to, e.g., areas of 3×3 cells). This was, for example, done by Gelly and Silver (2007) for the game of Go, and by Lorentz and Zosa IV (2017) for the game of Breakthrough. Sturtevant and White (2007) also exhaustively generated combinations (for instance pairs, triples, etc.) of the manually selected features for the game of Hearts as described above.
• Supervised learning based on databases of human expert games. This has been particularly common in the game of Go (Enderton 1991; Stoutamire 1991; van der Werf et al. 2003; Bouzy and Chaslot 2005; Stern, Herbrich, and Graepel 2006; Araki et al. 2007; Coulom 2007).

Other approaches include Gradual Focus (Skowronski, Björnsson, and Winands 2009), a genetic programming approach to find useful patterns (Hoock and Teytaud 2010), and the use of deep convolutional neural networks for learning relevant features through self-play (Silver et al. 2016).

[Figure 1: Bridge completion is a beneficial pattern in Hex.]

The features we are interested in here are geometric piece patterns of arbitrary topology. For example, Figure 1 shows a pattern of pieces that represents a beneficial move for connection games played on the hexagonal grid. If Black has just played move 1 to intrude into the virtual connection of two White pieces (left), then White should reply immediately with move 2 to complete that connection. Biasing MCTS playouts to play such moves with higher probability was found to dramatically improve AI playing strength for connection games such as Hex and Y (Raiko and Peltonen 2008).

Feature Definition

Each feature x defines:

1. A pattern p_x which specifies one or more elements that must be present or absent in certain locations in game states for feature x to hold.
2. An action a_x which may be encouraged or discouraged in game states where x holds.

For most board games, actions can be sufficiently specified using one or two integers: a location to move "to", and in some games a location to move "from". For example, in games such as Hex and Go a single board location is sufficient to uniquely identify any action in any particular game state. In games such as Draughts or Chess, the current location of the piece to be moved is also required.

Syntax

Each pattern consists of a set of elements. We define the following element types for each potential feature element, relative to the current mover:

• "-": Off-board location (for locating edges and corners).
• ".": Empty board location.
• "o": Friendly piece (i.e. of the mover's colour).
• "x": Enemy piece (i.e. not of the mover's colour).
• "Pn": Piece belonging to player n.
• "In": Piece with index n in the game definition.[3]

We also define "!" (i.e. "not") versions of these element types. For example, "!P2" means that the specified location does not contain a piece belonging to Player 2. Locations can simultaneously have multiple qualifiers, e.g. a location qualified with both "x" and "!I3" would match an enemy piece at that location unless it has index 3 in the game definition.

[3] The game definition maintains a unique index for each type of equipment it involves, including containers (boards, piles, hands, decks, etc.) and components (pieces, tiles, dice, cards, etc.).

Topology

Rather than tying feature topology to any particular board or grid type, feature elements are described by their relative locations on the underlying graph of the game board (or more accurately its dual, which defines orthogonal cell adjacencies). Starting with a given anchor location and direction, the relative location of each element is defined as a walk through the underlying board graph. We assume that the orthogonal neighbours of each board location are listed in consecutive clockwise order, including null placeholders for off-board steps.

Each walk is defined as a sequence of adjacent steps through the underlying board graph, where:

• 0 denotes a step forwards in the current direction.
• −1/a denotes a counterclockwise turn in a cell with a sides (i.e. −1/3 is one turn in a triangle), then a step forwards.
• +2/a denotes two clockwise turns in a cell with a sides (i.e. +2/4 is two turns in a square), then a step forwards, etc.

Figure 2 shows the relative steps through cells with different numbers of sides. Fractional turns are rounded to the nearest sensible fraction for any given cell (e.g., a turn of 1/4 in a triangle becomes equal to a turn of 1/3). The basic mechanism is similar in principle to the use of turtle commands in computer graphics.

[Figure 2: Steps through various cells based on CW turns.]

Note that cells with an odd number of sides, such as the triangular and pentagonal cells shown in Figure 2, do not have a forwards (0) direction. Such cases may be handled by instantiating two pattern instances per ambiguous step: one with 0 replaced by −1/a and one with 0 replaced by +1/a for that step in a cell with a sides.

For example, a knight move may be described using this notation as {0, 0, 1/4}, as shown in Figure 3 (left). This description can be mapped directly to other grids of arbitrary topology to produce plausible results, such as the semi-regular 3.4.6.4 tiling (right). Note that the turn of 1/4 is ambiguous in the semi-regular grid, and is therefore split into two possible patterns.

[Figure 3: Relative knight move {0, 0, 1/4} on the square grid (left) and two equivalent moves on the semi-regular 3.4.6.4 grid (right).]

[Figure 4: Reactive (bottom) and proactive (top) features.]
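The turn arithmetic described above can be sketched in a few lines of code. The following is a minimal illustration in Java (the paper's implementation language), not actual LUDII code; the class and method names are hypothetical, and it assumes a square grid whose orthogonal neighbours are listed in clockwise order (N, E, S, W). Each fractional turn is rounded to a whole number of rotations for the current cell, then followed by a step forwards, in the manner of turtle commands.

```java
// Minimal illustration of resolving a relative walk such as {0, 0, 1/4}
// on a square grid (hypothetical sketch, not actual LUDII code).
public class WalkDemo {
    // Orthogonal directions in clockwise order: N, E, S, W.
    static final int[][] DIRS = { {0, 1}, {1, 0}, {0, -1}, {-1, 0} };

    /** Applies a walk of fractional turns from (x, y) facing DIRS[dir];
     *  returns the final {x, y} cell. */
    static int[] resolve(int x, int y, int dir, double[] turns) {
        int sides = DIRS.length; // 4 for a square grid
        for (double t : turns) {
            // Round the fractional turn to a whole rotation for this cell,
            // e.g. +0.25 is one clockwise turn in a square.
            int rot = (int) Math.round(t * sides);
            dir = Math.floorMod(dir + rot, sides);
            x += DIRS[dir][0]; // every step ends with a move forwards
            y += DIRS[dir][1];
        }
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        // Knight move {0, 0, 1/4}: forwards, forwards, quarter-turn CW, forwards.
        int[] to = resolve(0, 0, 0, new double[] { 0, 0, 0.25 });
        System.out.println(to[0] + "," + to[1]); // prints "1,2"
    }
}
```

Mapping the same walk to a grid with a different number of sides only requires swapping the neighbour table, which is what makes the notation geometry-independent.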
Local vs Global Patterns

Patterns may be defined as either:

• Relative: Apply to all valid anchor locations across the board (taking null locations into account).
• Absolute: Apply only to the specified anchor location.

For convenience, the number of valid rotations and reflections for each pattern may also be specified, so that all possible instances of a given pattern may be generated across the board from a single description.

Reactive vs Proactive Patterns

We distinguish between reactive patterns that trigger actions directly in response to the previous player's last move,[4] and proactive patterns that can trigger actions anywhere across the board regardless of the previous player's last move. This distinction is similar to the distinction between "responsive" and "non-responsive" patterns used in (Silver et al. 2016).

For example, the bottom row of Figure 4 shows the Hex pattern described earlier in Figure 1 (left) and a reactive pattern that describes this situation (middle). The top row shows a proactive pattern that encourages Hex players to play two disjoint steps away from existing friendly pieces. For clarity in the diagrams, friendly pieces are denoted as white discs and enemy pieces as black discs (the last move made is dotted). White dots indicate empty cells, edges between elements indicate cell adjacency, and green "+" symbols indicate the preferred action triggered by each pattern. The rightmost column of Figure 4 shows how these patterns map easily from the hexagonal grid to the square grid.

[4] There may be more than one opponent.

Implementation

The LUDII system and associated feature mechanisms are implemented in Java 8. Internal game states are defined using a custom BitSet class, called a ChunkSet, that compresses the required state information into the minimum number of bits, based on each game's definition.

For example, if a game involves no more equipment than a board and uniform pieces in N colours, then the game state is described by a ChunkSet subdivided into chunks of B bits per board cell, where B is the lowest power of 2 that provides enough bits to represent every possible state per cell (including state 0 for the empty cells).[5]

For each game, we seek to derive an optimal feature set F, which is the set of features that outperforms all other known feature sets for that game on average, within a given computational budget.

[5] Chunk sizes are set to the lowest power of 2 to avoid issues with chunks straddling consecutive long member variables.

Feature Instantiation

As each game is loaded, an instance of every possible valid rotation, reflection and translation is pre-generated once for all component features f_x of that game's optimal feature set F, including combinations of valid element types within each feature. Each feature definition can therefore generate hundreds of instances.

Each instance is defined using the same custom BitSet class as the game state. Hence, each feature instance can be matched to the given game state efficiently using bitwise parallel operations, such that only a few bitwise operations need to be applied per instance match test, regardless of the feature's complexity.
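The bitwise matching idea can be sketched as follows. This is a minimal illustration only, not the actual ChunkSet implementation: it assumes (hypothetically) that each pre-generated instance is reduced to a mask selecting the chunks it constrains and a target holding the bit values those chunks must take, so that a match test is one AND and one comparison per word of state.

```java
// Minimal sketch of bitwise feature-instance matching (hypothetical,
// not the actual ChunkSet implementation). The game state is packed
// into long words; an instance stores a mask of the chunks it
// constrains and the target values those chunks must hold.
public class InstanceMatch {
    static boolean matches(long[] state, long[] mask, long[] target) {
        for (int i = 0; i < state.length; i++) {
            // One AND and one compare per word, regardless of how
            // complex the original pattern was.
            if ((state[i] & mask[i]) != target[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 2-bit chunks per cell: 00 = empty, 01 = white, 10 = black.
        long state  = 0b10_01_00L; // cells 0..2 hold: empty, white, black
        long mask   = 0b00_11_11L; // instance constrains cells 0 and 1 only
        long target = 0b00_01_00L; // requires cell 0 empty, cell 1 white
        System.out.println(matches(new long[]{ state },
                                   new long[]{ mask },
                                   new long[]{ target })); // prints "true"
    }
}
```

Because unconstrained cells are masked out, the same instance data tests "must be present" and "must be absent" elements in a single pass over the state words.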
Feature Application

Feature sets are applied to bias MCTS playouts as follows:

1. For each player move, all legal moves are generated and initialised with equal probability of being selected.
2. Each reactive feature instance corresponding to the previous player's last move is then checked for a match to the current game state; if a match is detected, then the feature's weight is added to the probability of the feature's action being selected (note that feature weights can be negative for detrimental moves).
3. Each proactive feature instance is then applied across the board and checked for a match to the current game state; if a match is detected, then the feature's weight is added to the probability of the feature's action being selected.
4. The move to be made is then selected randomly according to the resulting biased distribution.

Features can similarly be used in other parts of MCTS, such as its selection phase. Note that reactive features are much more efficient to apply than proactive ones, as only those patterns relevant to the previous player's last move need be tested. We therefore seek to generate reactive features, and to replace proactive features with their reactive equivalents, where possible.

Feature Generation

Manually generating features using expert knowledge, or extracting them through supervised learning from human expert games, are not suitable solutions for the current task, in which a wide variety of unfamiliar games must be supported. Exhaustive generation of features can be done more easily, but has clear drawbacks due to the massive volume of features it can generate.

One advantage of describing games by their component ludemes is that these may be exploited to generate a plausible set of candidate feature patterns based on the specified rules and equipment. Such patterns may not be suitable features in and of themselves, but represent "minimum" patterns which must be present in every feature – for example, every Hex or Go feature must contain an empty "to" action location – allowing significant reductions in the number of possible patterns to be generated. For many games, this kind of knowledge can be extracted automatically from ludemes.

The subsequent fine-tuning of features to obtain the optimal feature set F for a given game is still a work-in-progress. Suffice it to say that frequent pattern mining (Aggarwal and Han 2014) approaches to iteratively building features from large numbers of randomly generated self-play games have not proved effective. Learning feature sets against a random opponent can have disastrous – and sometimes amusing – results, in the unexpected behaviours that can thwart the random player. There does not seem to be any escape from evaluating feature sets using more time-consuming but realistic tests against intelligent AI opponents, with MCTS as the baseline yardstick.

Features and Strategies

An attractive aspect of the proposed approach is that learnt features have the potential to encode simple strategies for the games to which they apply. For example, the features shown on the top row of Figure 5 encourage the player to make lines of four pieces of their colour, while the features shown on the bottom row encourage the player to not make lines of three pieces of their colour (the red "–" symbols indicate negative weights that discourage such actions).

[Figure 5: Patterns for a "line of 4" strategy (top) and a "not line of 3" strategy (bottom).]

The features shown in Figure 6 encourage the player to form groups of three pieces of their colour, by encouraging singleton pieces to grow in any direction, then discouraging growth beyond group size 3. Similarly, the features shown in Figure 7 encourage the player to form long, thin groups of their pieces, by encouraging singleton pieces to grow in any direction, then encouraging friendly pairs to extend at the ends but discouraging growth at the common points adjacent to both pieces (which would create shorter, thicker groups).

[Figure 6: Patterns for a "form groups of 3" strategy.]

[Figure 7: Patterns for a "form long thin groups" strategy.]

Feature Explanation

An additional benefit of the ludemic model for games is that features applicable to a given game may be readily transferred as candidate features for related games defined by similar ludemes. Any feature that has proved relevant for a given context is a good starting point for similar contexts.

Further, we can potentially invert the causal relationship between ludemes and features to reverse engineer comprehensible explanations for the learnt strategies that optimal feature sets represent, by determining which ludemes are responsible for given features. This relationship between a game's ludemes and its derived features is further strengthened by the fact that initial candidate feature sets are derived directly from the game's ludemic description during the feature generation process.

Each ludeme in the LUDII Ludeme Library corresponds to a Java class with an appropriate name, providing a source of convenient plain English captions for learnt concepts. And even if the ludeme names do not provide sufficient explanation in themselves, they provide hints for further post-processing steps to find geometric relationships within the feature patterns. It is plausible that simple descriptions such as "make lines of 4" or "make groups of 3" may be derived from such feature sets. Implementing such mechanisms for explainable AI in the context of strategy learning for general games will be a key focus of future work.

Conclusion

Lightweight features based on geometric piece patterns have a number of advantages for our work on the Digital Ludeme Project. They improve MCTS playing strength, map readily to other geometries, encode simple strategies, can be associated with the underlying ludemic descriptions of games, and have the potential to help explain learnt strategies in human-comprehensible terms. We will continue developing and testing this approach as the LUDII general game system matures to support an increasing range of game types, and investigating the automated extraction and explanation of relevant strategies from such learnt features.

Acknowledgements

This research is part of the European Research Council-funded Digital Ludeme Project (ERC Consolidator Grant #771292) being run by Cameron Browne. We would also like to acknowledge the RIKEN Institute's Advanced Intelligence Project (AIP), especially Kazuki Yoshizoe, for their generous support of prior research that led to this work.

References

Aggarwal, C. C., and Han, J., eds. 2014. Frequent Pattern Mining. Springer.

Araki, N.; Yoshida, K.; Tsuruoka, Y.; and Tsujii, J. 2007. Move prediction in Go with the maximum entropy method. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Games, 189–195. IEEE.

Borvo, A. 1977. Anatomie d'un jeu de cartes: L'Aluette ou le Jeu de Vache. Nantes: Librairie Nantaise Yves Vachon.

Bouzy, B., and Chaslot, G. 2005. Bayesian generation and integration of K-nearest-neighbor patterns for 19x19 Go. In Kendall, G., and Lucas, S., eds., Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Games, 176–181. IEEE.

Browne, C.; Powley, E.; Whitehouse, D.; Lucas, S.; Cowling, P. I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; and Colton, S. 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1):1–49.

Browne, C. 2013. A problem case for UCT. IEEE Transactions on Computational Intelligence and AI in Games 5(1):69–74.

Coulom, R. 2007. Computing "ELO ratings" of move patterns in the game of Go. ICGA Journal 30(4):198–208.

Enderton, H. D. 1991. The Golem Go program. Technical Report CMU-CS-92-101, School of Computer Science, Carnegie-Mellon University.

Finnsson, H., and Björnsson, Y. 2010. Learning simulation control in general game-playing agents. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 954–959. AAAI Press.

Gelly, S., and Silver, D. 2007. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning, 273–280.

Gelly, S.; Wang, Y.; Munos, R.; and Teytaud, O. 2006. Modification of UCT with patterns in Monte-Carlo Go. Technical Report RR-6062, INRIA, Paris.

Hoock, J.-P., and Teytaud, O. 2010. Bandit-based genetic programming. In Esparcia-Alcázar, A. I.; Ekárt, A.; Silva, S.; Dignum, S.; and Uyar, A. Ş., eds., European Conference on Genetic Programming, volume 6021 of Lecture Notes in Computer Science, 268–277. Springer.

Lorentz, R. J., and Zosa IV, T. E. 2017. Machine learning in the game of Breakthrough. In Winands, M. H. M.; van den Herik, H.; and Kosters, W. A., eds., Advances in Computer Games, volume 10664 of Lecture Notes in Computer Science, 140–150. Springer.

Murray, H. 1952. A History of Board-Games Other Than Chess. Clarendon Press.

Raiko, T., and Peltonen, J. 2008. Application of UCT search to the connection games of Hex, Y, *Star, and Renkula! In Proceedings of the Finnish Artificial Intelligence Conference, 89–93.

Silver, D.; Huang, A.; Maddison, C.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.

Skowronski, P.; Björnsson, Y.; and Winands, M. H. M. 2009. Automated discovery of search-extension features. In van den Herik, H. J., and Spronck, P., eds., Advances in Computer Games, volume 6048 of Lecture Notes in Computer Science. Springer.

Stern, D.; Herbrich, R.; and Graepel, T. 2006. Bayesian pattern ranking for move prediction in the game of Go. In Cohen, W. W., and Moore, A., eds., Proceedings of the 23rd International Conference on Machine Learning, 873–880.

Stoutamire, D. 1991. Machine learning, game play, and Go. Technical Report TR 91-128, Center for Automation and Intelligent Systems Research, Case Western Reserve University.

Sturtevant, N. R., and White, A. M. 2007. Feature construction for reinforcement learning in Hearts. In van den Herik, H. J.; Ciancarini, P.; and Donkers, H. H. L. M., eds., Computers and Games, volume 4630 of Lecture Notes in Computer Science, 122–134. Springer.

van der Werf, E.; Uiterwijk, J. W. H. M.; Postma, E.; and van den Herik, J. 2003. Local move prediction in Go. In Schaeffer, J.; Müller, M.; and Björnsson, Y., eds., Computers and Games, volume 2883 of Lecture Notes in Computer Science. Springer.