   A Neuroevolution Approach to Imitating
 Human-Like Play in Ms. Pac-Man Video Game

    Maximiliano Miranda, Antonio A. Sánchez-Ruiz, and Federico Peinado

         Departamento de Ingenierı́a del Software e Inteligencia Artificial
                       Universidad Complutense de Madrid
          c/ Profesor José Garcı́a Santesmases 9, 28040 Madrid (Spain)
        m.miranda@ucm.es - antsanch@ucm.es - email@federicopeinado.com
                            http://www.narratech.com



      Abstract. Simulating human behaviour when playing computer games
      has been recently proposed as a challenge for researchers on Artificial
      Intelligence. As part of our exploration of approaches that perform well
      both in terms of the instrumental similarity measure and the phenomeno-
      logical evaluation by human spectators, we are developing virtual players
      using Neuroevolution. This is a form of Machine Learning that employs
      Evolutionary Algorithms to train Artificial Neural Networks by consider-
      ing the weights of these networks as chromosomes for the application of
      genetic algorithms. Results strongly depend on the fitness function that
      is used, which tries to characterise the human-likeness of a machine-
      controlled player. Designing this function is complex and it must be
      implemented for specific games. In this work we use the classic game
      Ms. Pac-Man as a scenario for comparing two different methodologies.
      The first one uses raw data extracted directly from human traces, i.e.
      the set of movements executed by a real player and their corresponding
      time stamps. The second methodology adds more elaborate game-level
      parameters such as the final score, the average distance to the closest ghost,
      and the number of changes in the player’s route. We assess the impor-
      tance of these features for imitating human-like play, aiming to obtain
      findings that would be useful for many other games.

      Keywords: Virtual Video Game Player, Human Behaviour Imitation,
      Artificial Neural Networks, Evolutionary Computing, Genetic Algorithms,
      Machine Learning, Artificial Intelligence


1   Introduction

Researchers in Artificial Intelligence (AI) are always looking for problems that
are challenging but feasible at the same time, in order to advance their mission of
computationally recreating intelligence. Imitating video game players has recently
been considered an interesting challenge for the AI research community. Examples
of this trend are the competitions on the development of believable video game
characters that have taken place in recent years [5, 12].
    In the Digital Game Industry there is a widespread assumption that wherever
there is a machine-controlled character, the game experience will benefit from
that character being controlled by the computer in a less robotic and more human-
like way. For this reason, player modelling in video games has become an increasingly
important field of study, not only for academics but for professional developers
as well [13].
    Human-like bots, as they are also called, can be used not only to confront
the human player, but also to collaborate with them or to demonstrate how to
succeed in a particular game level to help players who get stuck. It seems clear
that these demonstrations will be more useful if the bot imitates a human style
of playing. In fact, this last application has shown great potential: there has
been very significant growth in the consumption of gaming videos in recent years,
and 95% of people who play video games also watch gaming videos
online [3]. There is great interest in watching how other people play, whether to
learn how a particular game works (how to improve some skills, how to solve
a particular puzzle, etc.), for mere entertainment, or as business and mass
spectacle (e.g. the so-called electronic sports).
    Another possible application of human-like bots is in the testing phase of
the video game production process. These bots could be used to check that
the game levels have the right difficulty, that none of them are “broken” by
design, or even to find completely new ways of solving a puzzle. In the case of
procedurally generated levels, a human-like bot could help us select which
ones are best suited to be played by real people.
    The goal of this work is to develop a human-like virtual player that, as will
be explained later, has been developed using the Neuroevolution approach, i.e.
evolving an artificial neural network so that it learns which moves to choose
when automatically playing an action video game. This approach has proved
successful for training bots in classic games such as Super Mario Bros [8]. In our case,
we use another classic arcade game, Ms. Pac-Man, through a Java version
specifically designed for the development of bots that we have tested in previous
experiments [7].
    The rest of the paper is organised as follows. The next section reviews other
works related to the simulation of human players. In Section 3 the features of
our testbed and problem domain, Ms. Pac-Man vs Ghosts, are explained in
detail. Section 4 outlines our modelling of the game state. Section 5 explains
how we obtain the training set and which features we select for the
machine learning process. Our neuroevolutive algorithm is presented in Section 6,
and the results of our experiments are presented and discussed in Section 7.
Finally, Section 8 summarises our conclusions about this first attempt at using
the approach to create player bots for Ms. Pac-Man.


2   Related Work

As a sub-field of Computer Science especially related to AI, Machine Learning
aims to give computers the ability to learn by themselves, rather than following the
instructions of a static program. It includes the study and construction of algo-
rithms that can make predictions on data, building a computational model that
relates inputs to predictions or decisions in the form of outputs.
    Neuroevolution (NE) is a form of machine learning that employs an Evo-
lutionary Algorithm (EA) to train an Artificial Neural Network (ANN) [9] by
considering the weights of the ANN as the chromosomes of the EA.
    Regarding the imitation of human behaviour in video games, there is plenty
of work that could be categorised into direct and indirect imitation [11].
    In direct imitation, developers use supervised learning to associate the
actions the human player takes with the game state in which they are taken.
This imitation approach seems to be the most popular, and its main
problem is generalisation to previously unseen situations, which leads to
controllers that perform worse than the player whose traces were used in the
supervised learning. The main cause is that this training aims to repeat the
decisions of the player given the same (or similar) state of the game, rather
than to play “well” in terms of game goals.
    Indirect imitation instead measures certain properties of the player's be-
haviour; then, using some form of reinforcement learning, the algorithm optimises
a fitness function that measures the human-likeness of a controller's behaviour.
    NE seems to perform well both in terms of the instrumental similarity mea-
sure and the phenomenological evaluation by human spectators when imitating
human playing style in games like Super Mario Bros [8]. However, in order to
use NE to imitate human behaviour, we need to design a fitness function to
characterise the human likeness of the automatic game controller. These fitness
functions are domain dependent and not easy to define. In this work we take
the first steps towards choosing an appropriate set of features for this fitness
function.
    In games with a limited number of player outputs and a discrete set of
actions, controllers that implement indirect imitation, like NE and dynamic
scripting, perform better than those that implement direct imitation, both in
convincing human observers and in raw score [8].


3   Ms. Pac-Man vs. Ghosts

Pac-Man is an arcade video game produced by Namco and created by Toru
Iwatani and Shigeo Funaki in 1980. Since its launch it has been considered an
icon, not only of the video game industry, but of twentieth-century popular
culture [4]. In this game, the player has direct control over Pac-Man (a small
yellow character), pointing the direction it will follow in the next time-step (up,
down, left or right). The level is a simple maze full of white pills that Pac-Man
eats to gain points. There are four ghosts with different behaviours trying to
capture Pac-Man, causing it to lose one of its lives. Pac-Man initially has three
lives and the game ends when the player loses all of them. In the maze there
are also four special pills, bigger than the normal ones, which make the ghosts
“edible” during a short period of time. Every time Pac-Man eats one of the
ghosts during this period, the player is rewarded with bonus points.

                 Fig. 1. A screenshot of Ms. Pac-Man vs. Ghosts

    Two years after the release of the original Pac-Man video game, Midway
Manufacturing Corporation (distributor of the original version of Pac-Man in
the USA) produced the next version of the game: Ms. Pac-Man. This version
introduced a female protagonist, Ms. Pac-Man, and some changes in
the gameplay: the game is slightly faster than the original and, in contrast
with the prequel, the ghosts do not have a deterministic behaviour, so their
paths through the maze are not predefined [6].
    Ms. Pac-Man vs Ghosts (Figure 1) is a new implementation of the game that
provides a Java API for easily developing bots to control both the protagonist and
the antagonists of the game, allowing full access to the game logic and internal
structures. In fact, there have been several academic competitions in recent
years (for example at the CIG conference) to build bots with the obvious
goal of maximising the score. These bots are able to obtain very high scores but
their behaviour is usually not very human. For example, they can compute
optimal routes taking into account the speed of the ghosts and pass very close
to them, while human players tend to keep more distance and avoid potentially
dangerous situations.
    The decision to use Ms. Pac-Man as our testbed game is mainly due to three
factors. Firstly, the game presents a discrete state space and the number of actions
required of the player is quite limited (continuous movement in only four possible
directions). Secondly, we assume that this game is sufficiently different from already
explored platform video games such as Super Mario Bros, since the movement of the
characters is not limited to one axis (plus jumping) but takes place in a two-dimensional
maze, the levels are designed with a labyrinthine structure of linear paths (in
which only one character “fits” at a time), and there are persistent enemies
that chase the player with a non-deterministic behaviour. Finally, as previously
mentioned, we have already worked with this game before [7] and we are
familiar with the implementation of this type of bot.

       Fig. 2. Any visible tile of the maze internally contains up to 7 nodes.



4   Game State Representation

The API of Ms. Pac-Man vs. Ghosts represents the state of the game as a graph
in which each node corresponds to a passable region of the maze (a square of
2 x 2 pixels). Each node can be connected to up to four other nodes, one in
each direction (north, east, south and west), and can contain a small pill, a big
pill, one or more ghosts, Ms. Pac-Man herself or nothing at all (i.e. the node
represents an empty portion of the maze). Note that the graph only contains
passable regions, so the walls of the maze are never represented, i.e. there are
no edges connecting passable nodes to “wall nodes”. The full graph representing
the state of the game contains 1293 nodes.
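
    As an illustration, the following minimal Java sketch shows what such a node might look like; the names below are our own stand-ins, not the actual classes of the Ms. Pac-Man vs. Ghosts API:

    /** Our own minimal sketch of a maze-graph node (not the actual API classes). */
    class MazeNode {
        final int x, y;                       // pixel coordinates of this 2x2 region
        final MazeNode[] neighbours = new MazeNode[4]; // N, E, S, W; null = wall
        boolean hasSmallPill, hasBigPill;     // static contents of the node
        // Ghost and Ms. Pac-Man positions are dynamic and tracked elsewhere.

        MazeNode(int x, int y) { this.x = x; this.y = y; }

        /** True if more than two passable neighbours meet here (a junction). */
        boolean isJunction() {
            int n = 0;
            for (MazeNode m : neighbours) if (m != null) n++;
            return n > 2;
        }
    }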
    Although this graph representation is useful for computing routes for the char-
acters using the A* algorithm, it provides an excessive level of detail. In
order to facilitate learning, we decided to simplify the game state using a tile-
based representation in which each tile corresponds to a screen region of 8 x
8 pixels. One tile can contain either 4 nodes along one side (upwards or side to
side) or 7 nodes (see Figure 2): 3 along the base, 3 upwards and 1 shared in the
corner. Pills are always located at the centre of a tile (corresponding to the
bottom-left corner node, the purple one in Figure 2) and separated by 4
nodes, so that Ms. Pac-Man moves one node every game cycle and advances one
tile every four moves in the same direction. This way the whole
maze can be represented as a grid of 26 x 29 tiles (see Figure 3) where some
tiles represent walls and others are passable. Each tile, in turn, can contain
a pill, a wall, one or more ghosts, Ms. Pac-Man or an empty space. We think
that this simplified state representation not only reduces the number of different
states to consider to a great extent, but is also closer to the human perception of
the maze.
    In order to reduce the number of possible states even further, in every game
cycle we only consider a small window of tiles around the player (see Figure 4).
This approach has proven successful in previous work with artificial neural
networks in Pac-Man [2] and is based on the intuition that the content of
nearby tiles is more important and therefore has more influence on the player's
decisions. In our experiments we use a 7 x 7 tile window centered on Ms. Pac-
Man.

             Fig. 3. The whole maze modelled as a grid of 26 x 29 tiles.

    We record three binary parameters for each tile in the window: (1) whether
the tile is accessible (not a wall), (2) whether there is a pill on it (big or small does
not matter) and (3) whether there is a non-edible ghost in the tile (edible ghosts are
currently ignored because they are not seen as a danger). Thus, we need 7 x 7 x 3
binary parameters to represent the current game state. Note that even with
this simplified representation the number of potentially different states is
quite large.
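
    To make the encoding concrete, here is a hedged Java sketch that builds the 7 x 7 x 3 input vector; GameState and its accessors (isWall, hasPill, hasNonEdibleGhost, etc.) are hypothetical helpers standing in for queries against the real API:

    /** Hypothetical read-only view of the tile grid around Ms. Pac-Man. */
    interface GameState {
        int pacmanTileX(); int pacmanTileY();
        boolean isInsideMaze(int tx, int ty);
        boolean isWall(int tx, int ty);
        boolean hasPill(int tx, int ty);            // big or small, both count
        boolean hasNonEdibleGhost(int tx, int ty);  // edible ghosts are ignored
    }

    class WindowEncoder {
        /** Builds the 7 x 7 x 3 binary vector; tiles outside the maze count as walls. */
        static double[] encode(GameState g) {
            final int HALF = 3;                      // 7 x 7 window
            double[] input = new double[7 * 7 * 3];  // 147 inputs, see Section 6
            int cx = g.pacmanTileX(), cy = g.pacmanTileY();
            int i = 0;
            for (int dy = -HALF; dy <= HALF; dy++)
                for (int dx = -HALF; dx <= HALF; dx++) {
                    int tx = cx + dx, ty = cy + dy;
                    boolean in = g.isInsideMaze(tx, ty);
                    input[i++] = (in && !g.isWall(tx, ty)) ? 1 : 0;           // accessible
                    input[i++] = (in && g.hasPill(tx, ty)) ? 1 : 0;           // pill
                    input[i++] = (in && g.hasNonEdibleGhost(tx, ty)) ? 1 : 0; // danger
                }
            return input;
        }
    }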


5    Training Set and Feature Selection

In order to train our bot to play like a human player, we asked a particular
player to play 100 different games. Each game trace contains an exhaustive
representation of the game state at every cycle, so that the games can be reproduced
at any time (a normal game takes around 1200 cycles and there are 254 different
parameters to consider in each cycle).
 Fig. 4. The state representation is a 7 x 7 tile window centered on Ms. Pac-Man.



    Game states in consecutive cycles are very similar, so we decided to include
only some of them in our training set. In particular, we collect a new game state
each time the player changes tile or changes direction. That way, from the
100 game traces we extract 34,000 game states represented as 7 x 7 tile windows,
and for each state we store the direction in which the player was moving.
    This way, we can use any supervised machine learning algorithm to
learn the “right” direction given a game state. This approach is known as a direct
imitation strategy and, as we already explained, has the problem of generalising
the decision-making knowledge to deal with new, unseen situations.
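
    A short sketch of this sampling rule follows; Frame and Example are hypothetical stand-ins for our trace format, not the exact code we used:

    import java.util.ArrayList;
    import java.util.List;

    /** One recorded game cycle; a hypothetical stand-in for the trace format. */
    interface Frame {
        int tileX(); int tileY(); int direction();
        double[] windowFeatures();   // the 7 x 7 x 3 encoding of this cycle
        int score(); int ghostsEaten();
        double distanceToClosestNonEdibleGhost();
    }

    /** A training example: window features plus the direction the human chose. */
    record Example(double[] features, int direction) {}

    class TraceSampler {
        /** Keep a cycle only when the player changes tile or direction. */
        static List<Example> sample(List<Frame> trace) {
            List<Example> out = new ArrayList<>();
            int lastX = Integer.MIN_VALUE, lastY = Integer.MIN_VALUE, lastDir = -1;
            for (Frame f : trace) {
                if (f.tileX() != lastX || f.tileY() != lastY || f.direction() != lastDir) {
                    out.add(new Example(f.windowFeatures(), f.direction()));
                    lastX = f.tileX(); lastY = f.tileY(); lastDir = f.direction();
                }
            }
            return out;
        }
    }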
    To complement the previous training set, we also extracted some other pa-
rameters at the game level to characterise the style of the player at a higher
level of abstraction. These parameters try to measure the behaviour of the player
looking at whole games instead of particular moves (a sketch of how they can be
computed follows the list):

 – Game duration: until the first level is cleared or the player loses three lives.
 – Average distance to the closest non-edible ghost: it measures the risk the
   player is willing to take.
 – Average score per second : it measures the number of pills collected and
   ghosts eaten.
 – Number of direction changes per second : it characterises different playing styles.
 – Number of ghosts eaten by Ms. Pac-Man in the game: it measures the player's
   aggressiveness.
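
    The sketch announced above shows how each parameter could be derived from a recorded trace; it reuses the hypothetical Frame interface, whose accessors for score, ghosts eaten and ghost distance are assumptions, not the real trace API:

    import java.util.List;

    /** The five high-level behaviour parameters for one whole game. */
    record HighLevelParams(double duration, double avgGhostDistance,
                           double scorePerSecond, double turnsPerSecond,
                           int ghostsEaten) {}

    class ParamExtractor {
        static HighLevelParams compute(List<Frame> trace, double cyclesPerSecond) {
            double seconds = trace.size() / cyclesPerSecond;    // game duration
            double distSum = 0; int turns = 0, lastDir = -1;
            for (Frame f : trace) {
                distSum += f.distanceToClosestNonEdibleGhost(); // hypothetical accessor
                if (lastDir != -1 && f.direction() != lastDir) turns++;
                lastDir = f.direction();
            }
            Frame last = trace.get(trace.size() - 1);
            return new HighLevelParams(seconds,
                    distSum / trace.size(),
                    last.score() / seconds,      // hypothetical accessor
                    turns / seconds,
                    last.ghostsEaten());         // hypothetical accessor
        }
    }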

    Different players will be characterised by different values of these parameters.
For example, if the second parameter is very low one could say that the player
is playing recklessly, because during the game he does not keep much
distance from the ghosts. And if this parameter is low and the fifth parameter
(number of ghosts eaten) is quite high, one could say that the player is quite
aggressive.
    These high-level parameters will let us implement a mixed strategy in which
part of the fitness function depends on direct imitation and the other part depends
on reproducing playing styles using an indirect approach.
6   Neuroevolutive Algorithm

Our bot is based on an ANN implemented using Neuroph [1], an open source
Java neural network framework. A multi-layer perceptron with a single hidden
layer and a sigmoid transfer function was used because this configuration has
proven successful previously [2].
    The input of the network is based on the game-state window previously
described: the input layer is formed by 7 x 7 x 3 = 147 neurons (7 x 7 tiles,
and 3 parameters per tile), there is one hidden layer with 7 x 7 = 49 neurons, and the
output layer has 4 neurons, one for each of the directions Ms. Pac-Man can
move in. The direction of the bot is chosen as the one corresponding to the
output neuron with the greatest value. The total number of weights in the
network (including bias weights) is 7452.
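
    A minimal sketch of this configuration with Neuroph follows; to the best of our knowledge MultiLayerPerceptron, setInput, calculate and getOutput are real Neuroph API members, but treat the snippet as an illustration rather than the exact code used (the 0..3 direction encoding is our assumption):

    import org.neuroph.nnet.MultiLayerPerceptron;
    import org.neuroph.util.TransferFunctionType;

    class PacmanNet {
        // 147 inputs (7 x 7 tiles x 3 flags), 49 hidden neurons, 4 outputs.
        final MultiLayerPerceptron net =
                new MultiLayerPerceptron(TransferFunctionType.SIGMOID, 147, 49, 4);

        /** Direction index (assumed encoding 0..3) of the strongest output neuron. */
        int chooseDirection(double[] windowFeatures) {
            net.setInput(windowFeatures);
            net.calculate();
            double[] out = net.getOutput();
            int best = 0;
            for (int d = 1; d < out.length; d++)
                if (out[d] > out[best]) best = d;
            return best;
        }
    }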
    We used three different approaches to train the ANN: back-propagation, neu-
roevolution with a fitness function based on direct imitation, and neuroevolution
with a fitness function extended with game-level parameters. In the first two
approaches we used the training set extracted from the 100 games played by the
human player. The precision of the network was computed as the ratio between
the number of times the ANN chose the same direction as the human player
and the total number of game states in the training set.
    The evolutionary algorithm used in this work for the last two approaches is a
genetic algorithm (GA) developed in Java to operate directly with the Ms. Pac-
Man vs Ghosts API. This algorithm encodes the chromosome of each individual
as floating-point genes, so every chromosome in the population represents a
different ANN, since its genes correspond to the weights of that ANN. At the
beginning the GA initialises a population of 200 individuals, all of them created
with random genes. In each generation, we use the usual genetic operators
(selection, crossover, mutation and replacement) to produce new individuals,
evaluate them using a fitness function and discard the worst. This process is
repeated for 500 generations.
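
    A sketch of this generational scheme follows; the four operators are sketched after the list below, and in a real implementation fitness values would be cached rather than recomputed on every comparison:

    import java.util.*;
    import java.util.function.Function;

    class GeneticTrainer {
        static final int POP = 200, GENERATIONS = 500, GENES = 7452;

        /** Evolves a population of weight vectors; returns the fittest chromosome. */
        static double[] evolve(Function<double[], Double> fitness, Random rnd) {
            List<double[]> pop = new ArrayList<>();
            for (int i = 0; i < POP; i++) {
                double[] c = new double[GENES];
                for (int g = 0; g < GENES; g++) c[g] = rnd.nextGaussian(); // random init
                pop.add(c);
            }
            for (int gen = 0; gen < GENERATIONS; gen++) {
                double[] p1 = tournament(pop, fitness, rnd);
                double[] p2 = tournament(pop, fitness, rnd);
                for (double[] child : crossover(p1, p2, rnd)) {
                    mutate(child, rnd);
                    replaceWorst(pop, child, fitness);  // substitution of the worst
                }
            }
            return Collections.max(pop, Comparator.comparing(fitness::apply));
        }
        // tournament, crossover, mutate and replaceWorst are sketched below.
    }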
    The operators used in the experiments of this work were (sketched in code
after the list):

 – Selection by tournament: 10 individuals of the population are selected at ran-
   dom. Among them, the one with the best fitness value is selected.
 – Plain crossover : 2 descendants are generated. For every position i in the chro-
   mosome, the value of gene i of each descendant is chosen randomly within
   the range defined by the parents' genes at that same position.
 – Mutation by interchange: 1% of the genes are selected at random and their
   values are swapped in pairs.
 – Substitution of the worst: the descendants replace the individuals with the worst
   fitness in the population.
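
    Hedged sketches of these four operators, written as the remaining static methods of the GeneticTrainer class above:

    /** Tournament selection: the best of 10 randomly drawn individuals. */
    static double[] tournament(List<double[]> pop,
                               Function<double[], Double> fit, Random rnd) {
        double[] best = null;
        for (int i = 0; i < 10; i++) {
            double[] cand = pop.get(rnd.nextInt(pop.size()));
            if (best == null || fit.apply(cand) > fit.apply(best)) best = cand;
        }
        return best;
    }

    /** Plain crossover: each child gene drawn uniformly between the parents' genes. */
    static double[][] crossover(double[] p1, double[] p2, Random rnd) {
        double[][] kids = { new double[p1.length], new double[p1.length] };
        for (int i = 0; i < p1.length; i++) {
            double lo = Math.min(p1[i], p2[i]), hi = Math.max(p1[i], p2[i]);
            kids[0][i] = lo + rnd.nextDouble() * (hi - lo);
            kids[1][i] = lo + rnd.nextDouble() * (hi - lo);
        }
        return kids;
    }

    /** Mutation by interchange: about 1% of the genes are swapped in random pairs. */
    static void mutate(double[] c, Random rnd) {
        int swaps = Math.max(1, c.length / 200);   // each swap touches 2 genes
        for (int s = 0; s < swaps; s++) {
            int i = rnd.nextInt(c.length), j = rnd.nextInt(c.length);
            double tmp = c[i]; c[i] = c[j]; c[j] = tmp;
        }
    }

    /** The child replaces the individual with the worst fitness in the population. */
    static void replaceWorst(List<double[]> pop, double[] child,
                             Function<double[], Double> fit) {
        int worst = 0;
        for (int i = 1; i < pop.size(); i++)
            if (fit.apply(pop.get(i)) < fit.apply(pop.get(worst))) worst = i;
        pop.set(worst, child);
    }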

    We use two different fitness functions to evaluate the human-likeness of the
bot (the combined version is sketched after the list):
    Fig. 5. Evolution of the NE algorithm using cross-validation with the ANN.



 – The first one transforms each individual into an equivalent ANN and counts
   the number of times the ANN produces the same direction as the human in
   each game state from the training set (precision value P1 ).
 – The second fitness function, besides computing P1 , also plays 50 new games,
   measuring the values of the high-level parameters described in the previous
   section, and computes a second precision value based on the mean squared
   error between the values obtained by the bot and the values obtained by the
   human player (precision value P2 ). The final fitness value of the individual
   is computed as a linear combination of the two precision values, fitness =
   w1 · P1 + w2 · P2. Initially in our experiments we assigned the same importance
   to both of them (w1 = w2 = 0.5). Note that using different weights we can give
   more importance to the direct imitation approach or to the indirect one, and
   in the future we plan to study how that affects the behaviour of the bot.
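
    The sketch of the combined fitness announced above; decode and playGamesAndAverage are hypothetical helpers (decode could load the 7452 genes into the network, e.g. via Neuroph's setWeights), and the normalisation of the mean squared error into a [0, 1] precision value is our assumption:

    import java.util.List;

    class CombinedFitness {
        /** fitness = w1 * P1 + w2 * P2, with w1 = w2 = 0.5 in our experiments. */
        static double evaluate(double[] chromosome, List<Example> trainingSet,
                               HighLevelParams human, double w1, double w2) {
            PacmanNet net = decode(chromosome);    // hypothetical: loads the weights
            // P1: fraction of training states where the ANN matches the human move.
            int hits = 0;
            for (Example e : trainingSet)
                if (net.chooseDirection(e.features()) == e.direction()) hits++;
            double p1 = (double) hits / trainingSet.size();
            // P2: play 50 fresh games and compare averaged high-level parameters.
            HighLevelParams bot = playGamesAndAverage(net, 50); // hypothetical helper
            double mse = meanSquaredError(bot, human);          // over the 5 params
            double p2 = 1.0 / (1.0 + mse);                      // assumed normalisation
            return w1 * p1 + w2 * p2;
        }
        // decode, playGamesAndAverage and meanSquaredError omitted for brevity.
    }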


7   Results and Discussion

The results of our experiments are summarised in Figures 5, 6, 7 and 8.
   The precision of the ANN trained with back-propagation, using cross-validation
with the training set of 100 human games over 100 iterations, is approximately 50%.
Using NE, the precision of the system after 500 generations is 60% (see Figure 5),
representing a 20% relative improvement over back-propagation.
   Using only the high-level parameters obtained by the bot in the fitness
function produced very poor results: 32% precision (see Figure 6).
   When we combine NE with the high-level parameters (HL Params) obtained by
a virtual player (50% ANN precision + 50% quadratic error of the HL Params obtained
by the bot), precision is close to 40%. Figure 7 shows the evolution of the
NE algorithm with the final values (after adding the two fitness components), and
Figure 8 shows the evolution with the different fitness values disaggregated (as
can be seen, the ANN's cross-validation precision is considerably better than the
fitness obtained by the bot with the HL Params).
 Fig. 6. Evolution of the NE algorithm using the HL Params obtained by the bot.




  Fig. 7. Evolution (min., max. and average precision) of the NE algorithm using
  cross-validation with the ANN and the HL Params.

  Fig. 8. Breakdown of the different fitness values in the evolution of the NE
  algorithm.


    At this preliminary stage in the system's development, the ANN virtual player is
an incompetent player. It easily enters nonsensical movement cycles, losing the game
very quickly, and it frequently chooses non-accessible tiles, which shows there is
plenty of room for improvement.
    Although NE improves the ANN back-propagation training results by
20%, we think that before going further with the NE approach it is important to
improve our ANN-only version: 50% precision is a low value given that only 4
directions are available to the virtual player.
    As part of our experiments we tried to increase the complexity of the ANN's
topology by adding more neurons to the hidden layer, but this did not produce
better results; rather the opposite, precision was significantly worse. Despite this,
we think the topology of the network should be reconsidered, because it is very
simple right now.
    The results achieved using only the HL Params obtained by the bot suggest that
the controller is not able to play in a minimally acceptable way. As said, this
is because it easily enters cyclical states and is unable to explore the level
without getting stuck soon, so until this controller is more competent there
is little point in measuring these parameters. In addition, the accuracy obtained
when using both fitness components (40%) is due in part to the fitness obtained
by the bot.
    The training set used for these experiments is also not rich enough for
learning (34,000 game states are traced in our 100 game sessions, but only 600
are different), so we plan to enrich it.


8   Conclusions

As part of our research on simulating human play in video games, we are working
on training virtual players using Neuroevolution. In this work we use the classic
arcade game Ms. Pac-Man as a testbed to compare two different methods. The first
uses raw data extracted directly from human traces, while the second also considers
more elaborate and abstract parameters such as the final score, the average distance
to the closest ghost or the number of times the player turns and changes
route.
    Our findings suggest that it is necessary to establish a reasonable performance
level for automatic controllers before comparing virtual and human players from
an external point of view. A significant level of competence in the basics of the
game and its strategy seems to be a prerequisite for starting any training for
human-like imitation.
    It also seems necessary to extend the information about the game
state, as our 7 x 7 window representation does not take into account any
information outside its tiles. Thus the network does not know whether there is a
significant number of pills more than 3 tiles away from Ms. Pac-Man's position.
A solution would be to extend the size of the window (significantly increasing
the number of inputs to the ANN) or to add some extra information to the state
(for example, the number of pills and ghosts in each quadrant outside the
window).
    We assume that an improved game state representation will bring significant
changes in the accuracy of the ANN, but a more complex network configuration
will also have a big impact. So rethinking the topology of the ANN will be
an important point to consider.
    Once the ANN obtains better results, we will improve the bot by extending
its logic to play in a more competent way, and then measure the HL Params
used in the fitness function of the NE algorithm.
    Another line of study we want to address is using NEAT (NeuroEvolution of
Augmenting Topologies) to make the NE evolve not only the weights of the ANN
but also its structure [10]. We will probably also explore other areas of
machine learning, such as Reinforcement Learning, a promising technique
different from standard supervised learning.


Acknowledgements

Antonio A. Sánchez-Ruiz would like to thank the support of the Spanish Ministry
of Economy and Competitiveness under grant TIN2014-55006-R.
References
 1. Apache Software Foundation: Neuroph (Java neural network framework),
    http://neuroph.sourceforge.net/
 2. Gallagher, M., Ledwich, M.: Evolving pac-man players: Can we learn from raw
    input? In: Proceedings of the 2007 IEEE Symposium on Computational Intelligence
    and Games, CIG 2007, Honolulu, Hawaii, USA, 1-5 April, 2007. pp. 282–287 (2007)
 3. Getomer, M.J., Okimoto, B.J.: Gamers on youtube: Evolving video consumption
    (July 2013), https://www.thinkwithgoogle.com/articles/youtube-marketing-to-gamers.html
 4. Goldberg, H.: All Your Base are Belong to Us: How 50 Years of Videogames Con-
    quered Pop Culture. Three Rivers Press (2011)
 5. Hingston, P.: A new design for a turing test for bots. In: Proceedings of the 2010
    IEEE Conference on Computational Intelligence and Games, CIG 2010, Copen-
    hagen, Denmark, 18-21 August, 2010. pp. 345–350 (2010)
 6. Kent, S.: The Ultimate History of Video Games: From Pong to Pokémon and
    Beyond : the Story Behind the Craze that Touched Our Lives and Changed the
    World. Three Rivers Press (2001)
 7. Miranda, M., Peinado, F.: Improving the performance of a computer-controlled
    player in a maze chase game using evolutionary programming on a finite-state
    machine. In: Proceedings 2o Congreso de la Sociedad Española para las Ciencias del
    Videojuego, Barcelona, Spain, June 24, 2015. pp. 13–23 (2015), http://ceur-ws.
    org/Vol-1394/paper_2.pdf
 8. Ortega, J., Shaker, N., Togelius, J., Yannakakis, G.N.: Imitating human playing
    styles in super mario bros. Entertainment Computing 4(2), 93 – 104 (2013)
 9. Schaffer, J.D., Whitley, D., Eshelman, L.J.: Combinations of genetic algorithms and
    neural networks: A survey of the state of the art. In: Whitley, D., Schaffer, J. (eds.)
    Proceedings of the International Workshop on Combinations of Genetic Algorithms
    and Neural Networks (COGANN-92). pp. 1–37. IEEE Computer Society Press
    (1992)
10. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting
    topologies. Evolutionary computation 7(1), 99,127 (6 2002)
11. Togelius, J., Nardi, R.D., Lucas, S.M.: Towards automatic personalised content cre-
    ation for racing games. In: 2007 IEEE Symposium on Computational Intelligence
    and Games. pp. 252–259 (2007)
12. Togelius, J., Yannakakis, G.N., Shaker, N., Karakovskiy, S.: Assessing believability.
    In: Hingston, P. (ed.) Believable Bots, pp. 219–233. Springer (2012)
13. Yannakakis, G.N., Maragoudakis, M.: Player modeling impact on player’s enter-
    tainment in computer games. In: User Modeling 2005, 10th International Confer-
    ence, UM 2005, Edinburgh, Scotland, UK, July 24-29, 2005, Proceedings. pp. 74–78
    (2005)