=Paper=
{{Paper
|id=None
|storemode=property
|title=Using a Genetic Algorithm to Evolve a D* Search Heuristic
|pdfUrl=https://ceur-ws.org/Vol-710/paper31.pdf
|volume=Vol-710
|dblpUrl=https://dblp.org/rec/conf/maics/GieseS11
}}
==Using a Genetic Algorithm to Evolve a D* Search Heuristic==
Andrew Giese and Jennifer Seitzer
University of Dayton
gieseanw@gmail.com, seitzer@udayton.edu
Abstract

Evolutionary computation (EC) is the sub-discipline of artificial intelligence that iteratively derives solutions using techniques from genetics. In this work, we present a genetic algorithm that evolves a heuristic static evaluation function (SEF) to be used in a real-time search navigation scheme of an autonomous agent. This coupling of algorithmic techniques (GAs with real-time search by autonomous agents) makes for interesting formalistic and implementation challenges. Genetic evolution implies the need for a fitness function to guide a convergence in the solution being created. Thus, as part of this work, we present a fitness function that dictates the efficacy of a generated static evaluation function. In this work, we present algorithmic and formalistic designs, implementation details, and performance results of this multi-layered software endeavor.

Introduction

The A* algorithm is a greedy best-first search algorithm that is used to calculate an optimal path between two points, or nodes, on a graph (Hart et al. 1968). The algorithm can be adapted to run in real time by restarting execution as new environment information becomes available, called a Dynamic A* (D*) search. The A* search uses a static evaluation function (SEF) that uses heuristics to find a path of state transformations from a start state to a goal state. The SEF assesses the merit of each child state as it is generated by assigning it a numeric value based on information about that state. The score allows the A* to direct its search by prioritizing the expansion of child nodes that could potentially expand into a goal state while neglecting child nodes that are less likely to lead to a goal.

In a real-time environment, information about the actual goal state is unavailable and unobtainable for any given iteration of the search for an agent (because it is out of range of the agent's sensors). Therefore, the SEF must direct the search to the most appropriate state that anticipates system information as it becomes available. In our work, the SEF seeks to maximize some aspects of an agent's state while minimizing others. By evolving a weight on each "aspect-variable", we are able to create offline a highly effective SEF that can predict obstacles and challenges that occur in real time during the execution of D*. In this paper we present the offline pursuit of using a genetic algorithm as a mechanism to evolve an optimal SEF to be used in the real-time execution of A*.

Additionally, we use the simulator that will eventually benefit from this optimized SEF to provide feedback in the evolution process. That is, the simulator serves as the fitness function for the evolving SEF.

This work is novel in that it combines techniques of evolutionary computation using genetic algorithms with the use and refinement of a heuristic for the D* algorithm. There are many applications of genetic algorithms in diverse domains such as bioinformatics (Hill 2005), gaming (Lucas 2006), music composition (Weale 2004), and circuit optimization (Zhang 2006). Additionally, work in D* has been studied and developed in theory (Ramalingam 1996) as well as in specific applications such as robotics (Koenig 2005). We are using the all-inclusive examination that genetic algorithms afford us to find the perfect (or near-perfect) heuristic function for a derivative of the very traditional AI search, A*.

The A* and D* Algorithms

In this work, the evolution of a static evaluation function using a genetic algorithm is applied to an autonomous agent operating in an environment provided by Infinite Mario Bros., an open-source, faithful recreation of Nintendo's Super Mario World. The agent (Mario) uses a real-time execution of an A* search, called D*, to direct its movement through the environment to ultimately reach the goal (the end of the level). Mario may use information about what is currently visible onscreen, but beyond that nothing is known, making a calculation of an actual path to the goal impossible. Therefore, the SEF of the D* must direct Mario towards states that are on the path to the goal.

Mario has a total of thirteen distinct action combinations that allow him to negotiate the environment. These are move left, move right, duck, and two others (jump and speed) that can be used in combination with the other actions and each other. Jump allows movement along the y-axis, and can be used in combination with right, left, and duck. Speed allows for faster movement left or right, and higher jumps. This means that from any state, there could be up to thirteen child nodes. Since the agent must operate at 24 frames per second, the agent is allotted approximately 40 milliseconds to perceive its current state, decide what to do next, and return a chosen action. With up to thirteen child nodes from any node in the search tree, any algorithm that decides what Mario is going to do next must do so quickly and efficiently. A "brute force" approach that
analyzes all possible children was infeasible given available computing machinery, and therefore a dynamic D* search is more appropriate.

The SEF of a D* search uses information about a state to direct the search efficiently. For Mario, much information is immediately available from each percept provided by the environment. This information includes the position of Mario, the amount of damage Mario has taken, the positions and types of enemies onscreen, and the positions and types of obstacles onscreen. Other information can be tracked over time, like number of kills, X velocity, Y velocity, coins collected, time remaining, etc. The task of our system was to discover what effects, if any, the values of these variables should have on the valuation performed by the SEF of a node in the search graph. A high value for a variable might proportionally increase the cost to transition to that state, or conversely could proportionally decrease the transition cost.

The System

In 2009 and 2010 Julian Togelius of the ICE-GIC held a competition for entrants to create an autonomous agent (bot) that would best play Markus Persson's Infinite Mario Bros. "Best" in this sense means the distance a bot could travel within a given level and time limit. If two bots finished a level they were awarded equal scores, but if neither finished, the bot that travelled furthest was deemed better. In both iterations of the competition, the same bot was victorious. This bot was written by Robin Baumgarten (Baumgarten). Robin's bot used a D* search coupled with an accurate method for expanding child nodes, and a human-generated static evaluation function for the D*. Our system is a heavily modified version of Robin's, with the majority of the A* rewritten for legibility and efficiency while the means to produce child nodes was mostly preserved.

Every 24 frames, the environment provides the agent with a percept that includes the locations and types of all sprites on the screen, including Mario. The agent must return an action to the environment that the environment then effects upon Mario. For each percept received, the agent runs an A* search for 39 ms or until the agent has planned out to 15 levels of the search tree. The agent keeps an internal representation of the world, and tracks Mario's x and y velocities among other things not provided by each percept.

After ensuring that its internal representation is consistent with the environment-provided one, the agent begins an A* search from Mario's current position and velocity. Children are generated by considering which actions are available to Mario at any node. That is, a child state reflects where Mario would be and how fast he would be moving if he performed action A from node M (Figure 1). A child state also informs the search of whether Mario would take damage, die, kill an enemy, collect a coin, etc. upon performing action A from node M. A static evaluation function provides weights on Mario's X position, X velocity, Y position, Y velocity, Mario's damage taken, whether Mario is carrying a shell, Mario's X position multiplied by X velocity, and Mario's Y position multiplied by Y velocity. These weights are values between -1 and 1. After multiplying the weights by their associated state variables, the sum of products forms the final SEF score for that node.

Figure 1

This SEF score is an estimation of the amount of work required to reach a goal state from the current node, and as such nodes with lower SEF scores are preferable. As an A* algorithm dictates, the level of the search tree at which the node was discovered is also added to the score. This is the "greedy" part of an A* search where not just a solution is desired, but the best solution. For Mario, the cost to transition from one state to another is uniform; all neighboring states have the same arc cost to travel to a neighbor. Adding the sum of arc costs into the SEF score for a node is a means by which the "work" required to reach a node in the graph is represented, so that if the same node is reached by two separate paths, the shortest path is favored. Since the D* search operates in a partially observable world, an admissible heuristic is difficult to discern, hence the motivation for a Genetic Algorithm to search for the optimal weights to apply in the SEF.

During the D* search, child nodes are generated from the current state of the agent. Generated children are placed on an open list sorted from lowest to highest score. The child with the lowest score is taken from that list, and its children are generated. This process repeats until the agent has searched for 39 ms or has searched 14 states (an empirical number), at which point it returns the action that leads to the most optimal path for the currently available information.

The values for the weights used in the agent's SEF mentioned above are deemed to be "unknown" to the system, and are provided via parameters supplied by an external entity, in this case a Genetic Algorithm.
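As an illustration, the weighted scoring and open-list selection just described can be sketched in a few lines. This is a minimal sketch, not the competition code: the real agent runs against the Infinite Mario framework, while here `MarioState` is a hypothetical subset of the state variables and `expand` is a caller-supplied stand-in for the engine's child-generation step.

```python
import heapq
import time
from dataclasses import dataclass

@dataclass
class MarioState:
    # Hypothetical subset of the per-node information described above.
    x: float
    y: float
    x_vel: float
    y_vel: float
    damage: int = 0
    has_shell: bool = False

def sef_score(weights, s):
    """Weighted sum of the eight state variables; each weight lies in
    [-1, 1]. Lower scores are preferred, so a negative weight on a
    variable rewards states that maximize that variable."""
    features = [s.x, s.x_vel, s.y, s.y_vel, float(s.damage),
                float(s.has_shell), s.x * s.x_vel, s.y * s.y_vel]
    return sum(w * f for w, f in zip(weights, features))

def dstar_plan(start, weights, expand, budget_s=0.039, max_expansions=14):
    """Greedy best-first expansion with the two cutoffs described
    above: a 39 ms time budget and a bounded number of expansions.
    `expand(state)` yields (action, child_state) pairs; the function
    returns the first action on the path to the best node seen."""
    deadline = time.monotonic() + budget_s
    counter = 0  # unique tie-breaker so states themselves are never compared
    # Heap entries: (sef + depth, tie, depth, first_action, state);
    # adding the depth models the uniform arc costs (the g-term of A*).
    open_list = [(sef_score(weights, start), counter, 0, None, start)]
    best = open_list[0]
    expansions = 0
    while open_list and expansions < max_expansions \
            and time.monotonic() < deadline:
        _, _, depth, first_action, state = heapq.heappop(open_list)
        expansions += 1
        for action, child in expand(state):
            counter += 1
            entry = (sef_score(weights, child) + depth + 1, counter,
                     depth + 1,
                     action if first_action is None else first_action,
                     child)
            heapq.heappush(open_list, entry)
            if entry[0] < best[0]:
                best = entry
    return best[3]
```

In the actual system the expansion step comes from the game engine, and the eight weights are supplied by the Genetic Algorithm described next.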
The genetic algorithm is implemented as defined in (Russell and Norvig 2003). The chromosome being evolved is an array of 8 floating-point values, each between -1.0 and 1.0. The mutation rate was 1%.

Each generation of chromosomes was tested for fitness by running a simulation on a training level where the agent used the chromosome's genes as the weights on state-variables evaluated by the SEF in a D* search. The fitness of the chromosome was a summation of Mario's distance travelled and, if he completed the level, also the remaining time Mario had to complete the level. A higher fitness score indicates a better, or more fit, chromosome. This is in contrast to the static evaluation function, where a lower score indicates a more ideal state.

The test level that each bot was scored on had a variety of characteristics. The most important of these is that the level was short. As each chromosome needed to be used in an actual bot, a single fitness test could last upwards of a minute even if the bot could finish the level successfully. A short level guaranteed that if a bot was going to finish a level, it could do so without much time spent. The second characteristic of the level was an imposed time limit. This time limit places an upper bound on the possible time a bot could spend in a level. Slow bots, bots that stood still, or bots that got stuck therefore all required a maximum of N seconds to evaluate.

An ideal level must also contain challenges and obstacles that a full level will have on its maximum difficulty. These challenges include portions with a high volume of enemies, some of which cannot be destroyed by landing on their heads; portions with Bullet Bill towers of varying heights; gaps of varying width; pipes with Piranha Plants leaping out of them; and portions with mixtures of these scenarios.

Optimization to D* and GA

The D* search still performed sub-optimally given computing hardware, so the search tree needed to be pared down. Paring the tree followed a simple formula: if two child nodes generated the same score from the SEF, the first child node was kept and the other discarded. In a further endeavor to pare the tree, the maximum degree of a node was reduced from 13 to 11 by discounting nodes reachable through the action of ducking by the agent. In an effort to avoid a bias in the reproduction phase of the genetic algorithm, a generated and tested chromosome was only added to the population if either its fitness score was unique or, failing that, the genes on the chromosome were unique among the chromosomes with the same fitness. If this precaution was not taken, a glut of identical chromosomes with the same score could skew the parental selection process unfairly.

The Experiment

The experiment was conducted across two iterations of the Genetic Algorithm. For the initial one, a starting population of 10 chromosomes instantiated with random values was created. A total of 800 generations were iterated over, with five children produced per generation. The test level had a time limit of 36 in-game seconds (~26 seconds in real time) and a length of 300 blocks (~4800 pixels). The level's "seed" used by the level generation engine was 4, and the difficulty was set to 15. The program execution lasted over 20 hours.

After this initial iteration completed, five of the top-scoring agents were used as the starting population for the second iteration of the Genetic Algorithm. The level length was increased three-fold to 900 blocks (~14400 pixels), the time limit was set to 100 in-game seconds, the "seed" to 65, and the difficulty retained at 15. 320 generations were evaluated, again with five children produced per generation. As this test level's length and time limit were much larger than in the first iteration of the GA, the execution time grew to about 30 hours.

Results

The technique of using a GA to evolve the SEF of a D* search allowed a system to generate an effective SEF in the absence of a priori knowledge about what makes one agent state more desirable than another. The results of this experiment demonstrate little to no direct correlation between individual weights and bot scores, implying that a trial-and-error search by a human would be difficult and time-consuming.

For the initial iteration of the GA, over three thousand unique bots were evaluated over the course of 800 generations. 285 of those tied for the top score of 3955. An interesting note is that the first bot to score this amount was produced during the 11th generation of the GA.

The weights used in the bot's SEF that the GA iterated on varied greatly. Figures 2 and 3 show typical scatter plots for the values of weights over the course of the GA's execution.
The scores that bots received likewise reached a local
maximum early (generation 11), and were unable to
improve thereafter (Figure 5).
Figure 2

Figure 3

The weights on state-variables may appear to quickly converge to a few values and remain there. However, over time the amount of variance for any weight does not decline linearly. Figure 4 shows a plot of the amount of variation for each weight grouped by 50 generations. No decline of the standard deviation among populations of weight values is present.

Figure 4

Figure 5

Upon examining the possible correlation between weight values and bot scores, a similar quandary is encountered where bots received top scores regardless of the weight values (Figures 6 and 7), save for the case of the weight on X Position multiplied by X Velocity (Figure 8). In the case of the value of the weight on the agent's X Position multiplied by the agent's X Velocity, a negative weight positively correlates with a higher score, and every single positive weight has a score of 0.0 or less (a negative score indicates the agent travelled backwards).

Figure 6

Figure 7

Figure 8

For the second iteration of the GA, similar results to the first iteration were obtained. However, only 3 bots out of a population of over a thousand shared the top score. Figure 9 presents the distribution of scores that bots received during the course of execution. Since the initial population of this iteration comprised top-scoring bots of the first iteration, it is understandable that so many bots scored so well so early; however, a clear ceiling to the scores is visible, indicating the algorithm likely could not escape a local maximum.

Figure 9

Figures 10 and 11 almost perfectly mirror Figures 6 and 7 in their distribution of scores for weight values on X Position and Y Position. Figure 12 likewise mirrors the data in Figure 8 that indicates negative weights on the agent's X Position multiplied by the agent's X Velocity correlate with higher scores.

Figure 10

Figure 11

Figure 12

Although the weight values produced by the Genetic Algorithm for the top bots were distributed across the gamut of possible values, the end result was in fact bots that performed roughly as well as Robin Baumgarten's bot that won the ICE-GIC competition two years in a row. Table 1 displays a comparison of bot scores between Robin's bot (AStarAgent) and our bot (AStarAgentEvolved) for a variety of levels whose difficulties were set to 15.

Table 1
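Taken together, the preceding sections describe an offline loop: evolve chromosomes of eight weights in [-1, 1] with a 1% mutation rate, produce five children per generation, score each chromosome by simulating a bot, and guard the population against duplicate chromosomes. The loop can be condensed into a short sketch. This is an illustrative reconstruction, not the authors' code: the `fitness` callable stands in for the Infinite Mario simulation, and the parent-selection details are simplified relative to (Russell and Norvig 2003).

```python
import random

GENES = 8            # one weight per SEF state-variable
MUTATION_RATE = 0.01
CHILDREN_PER_GEN = 5

def random_chromosome(rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(GENES)]

def crossover(a, b, rng):
    # Single-point crossover of two parent chromosomes.
    point = rng.randrange(1, GENES)
    return a[:point] + b[point:]

def mutate(c, rng):
    # Each gene has a 1% chance of being replaced by a fresh value.
    return [rng.uniform(-1.0, 1.0) if rng.random() < MUTATION_RATE else g
            for g in c]

def admit(population, child, fitness):
    """Add a child only if its fitness is unique or, failing that, its
    genes differ from every chromosome sharing that fitness (the
    precaution from the Optimization section)."""
    score = fitness(child)
    same = [c for c in population if fitness(c) == score]
    if all(c != child for c in same):
        population.append(child)

def evolve(fitness, generations, pop_size=10, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Simplified parent selection: breed within the fitter half.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, pop_size // 2)]
        for _ in range(CHILDREN_PER_GEN):
            a, b = rng.sample(parents, 2)
            admit(population, mutate(crossover(a, b, rng), rng), fitness)
        population = sorted(population, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)
```

In the experiments, `fitness` was Mario's distance travelled plus, on level completion, the remaining level time, obtained by running the bot on the training level.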
Conclusions

In this work, we presented the novel technique of using a genetic algorithm as an offline meta-search for an optimal static evaluation function to be used by the D* search of a real-time autonomous agent. The end results were static evaluation function parameters that, upon use in the SEF for a real-time agent, enabled the agent to perform as well as the current best in its environment.

The fact that our agent performed as well as the current best is significant because we made very few assumptions about the valuation of agent states in a static evaluation function. That is, the algorithms presented in this paper automated this task. The implication is that similar techniques could be employed for autonomous agents in other, possibly real-world, environments with high confidence in the end result.

Future Work

The work presented here has much potential for expansion. Future work should include utilizing parallel computing clusters like Beowulf to take advantage of the natural independence between the analyses of members in a population by the GA's fitness function, as well as the evaluation of nodes in the open list of the D* algorithm by the algorithm's SEF. This sort of capability will allow for not only a deeper D* search, but shorter generations in the Genetic Algorithm and therefore the ability to run the algorithm for more generations in the same amount of time.

Potential future work could also include employing pattern-matching techniques to identify a discrete set of distinct scenarios an agent would encounter. An agent could then utilize a separate SEF for each scenario.

Under the notion of pattern-matching, even further future research would focus on generating probability tables for the likelihood of scenarios occurring after each other. Knowing the probability of a scenario to occur next would allow an agent to make an accurate prediction of an optimal path before receiving its next percept.

References

Baumgarten, Robin. Infinite Super Mario AI. 9 September 2009. Accessed 8 February 2011.

Hart, Peter E., Nils J. Nilsson, and Bertram Raphael. "A Formal Basis for the Heuristic Determination of Minimum Cost Paths." IEEE Transactions on Systems Science and Cybernetics SSC-4, No. 2 (1968): 100–107.

Hill, T., Lundgren, A., Fredriksson, R., and Schiöth, H.B. (2005). "Genetic algorithm for large-scale maximum parsimony phylogenetic analysis of proteins." Biochimica et Biophysica Acta 1725 (1): 19–29.

Koenig, S. and Likhachev, M. "Fast Replanning for Navigation in Unknown Terrain." IEEE Transactions on Robotics, 21 (3), 354–363, 2005.

Lucas, S., and Kendall, G. (2006). "Evolutionary computation and games." IEEE Computational Intelligence Magazine, pp. 10–18.

Mitchell, M. (1998). An Introduction to Genetic Algorithms. MIT Press.

Ramalingam, G. and Reps, T. "An incremental algorithm for a generalization of the shortest-path problem." Journal of Algorithms 21 (1996): 267–305.

Russell, Stuart J. and Peter Norvig. Artificial Intelligence: A Modern Approach. Upper Saddle River: Prentice Hall/Pearson Education, 2003.

Zhang, J., Lo, W.L., and Chung, H. (2006). "Pseudocoevolutionary Genetic Algorithms for Power Electronic Circuits Optimization." IEEE Trans. Systems, Man, and Cybernetics, 36C (4), pp. 590–598.