=Paper= {{Paper |id=Vol-2282/EXAG_116 |storemode=property |title=Evolving NPC Behaviours in A-life with Player Proxies |pdfUrl=https://ceur-ws.org/Vol-2282/EXAG_116.pdf |volume=Vol-2282 |authors=Vadim Bulitko,Mac Walters,Matthew R. Brown |dblpUrl=https://dblp.org/rec/conf/aiide/BulitkoWB18 }} ==Evolving NPC Behaviours in A-life with Player Proxies== https://ceur-ws.org/Vol-2282/EXAG_116.pdf
                         Evolving NPC Behaviours in A-life with Player Proxies

     Vadim Bulitko                         Mac Walters               Morgan Cselinacz                 Matthew R. Brown
   Computing Science                        BioWare                       Psychology                   Computing Science
  University of Alberta                  Electronic Arts              University of Alberta           University of Alberta
   Edmonton, Alberta                    Edmonton, Alberta              Edmonton, Alberta               Edmonton, Alberta
bulitko@ualberta.ca                    mac@bioware.com             cselinac@ualberta.ca             mbrown2@ualberta.ca



                           Abstract

  Game development costs are on the rise as players expect massive open-world games
  populated with richly interactive non-playable characters (NPCs). Procedural content
  generation has the potential to reduce the development costs as well as make the
  content more player-specific. Recent work on evolving artificial intelligence for NPCs
  focused on combat between NPCs and the player. In this paper we propose to evolve an
  ecosystem of NPCs for a broader class of games. To allow players’ actions and
  playstyles to inform the evolution, we propose to model actual players and create
  AI-controlled proxies to evolve NPCs against. The approach removes the time pressure
  from the evolution and allows for traditional quality assurance methods while keeping
  the evolution player-informed.

                     1    Introduction
Modern video games such as Fallout 4 (Bethesda Game Studios 2015) invite players to spend
hundreds of hours exploring vast open worlds. The story delivery and world exposition
critically depend on the depth of interactions with numerous non-playable characters∗
(NPCs) controlled by Artificial Intelligence (AI). Most games create only an illusion of
habitation. For instance, the NPCs of Fallout 4 staff merchant booths in the day and
assume a sleeping position at night. They likely do not depend on the in-game economy or
sleep to survive but merely execute behaviours scripted by game developers.
   With players expecting progressively richer, more detailed, temporally extended
interactions (Delahunty-Light 2018), the costs of manually scripting NPCs continue to
rise. Even major game developers struggle to populate worlds with enough interesting
interactions. It is these limitations that are currently holding back most developers
from achieving truly dynamic, reactive living worlds and encounters with NPCs.
Additionally, while manually scripting NPC behaviours enables traditional
quality-assurance techniques, the resulting behaviours are limited, rigid, or simply
randomized, breaking the illusion of a rich lived-in game world.
   Procedural content generation is an active area of research that strives to generate
in-game content algorithmically (Togelius et al. 2013). While many types of content can
be procedurally generated, we focus on Artificial Intelligence controlling non-playable
characters. Generating such AI procedurally opens a door to doing it on a per-player
basis in an attempt to have meaningfully customized player-specific experiences. In this
paper, we follow in the footsteps of recent work on enemy AI generation via a simulated
Darwinian evolution (Soule et al. 2017). Polymorphic Games’ commercial game Darwin’s
Demons evolved space-invader-like creatures depending on the players’ strategy by
defining a per-round fitness function. As evolution normally takes a large number of
generations (i.e., game rounds), the space of possible creatures and the mutation rate
have to be carefully constrained to make the on-line evolution fast enough so that a
single player can see its effects. In this paper, we propose an alternative by moving the
evolution off-line (i.e., onto company servers without a direct involvement of players).
We keep evolution player-specific by evolving NPCs against player agents modeled after
real players.
   This paper is a substantially extended version of our previously published one-page
abstract (Bulitko et al. 2018). The additions include a significantly more detailed
problem formulation, related work analysis, a description of the multi-stage
on-line/off-line evolution and present state of the project as well as related
philosophical questions.

                  2    Problem Formulation
The problem we are proposing to solve is to procedurally generate AI for non-playable
characters in video games. Such generation should (i) be light on game developer labour,
(ii) create reliable AI which allows for traditional quality assurance methods and
(iii) take the players’ behaviour into account to facilitate player-specific game
experiences. By the latter, we mean deep and global effects of the players’ actions. For
instance, by killing all ghouls in Fallout 4, the player should be able to irreversibly
affect the entire ecosystem in a non-trivial way. We will refer to this effect as
persistent adaptation.
   We will measure the effectiveness of our approach by measuring play time and using the
measure as a proxy to
players’ engagement level.

                   3    Related Work
Procedural content generation creates game content mostly or entirely automatically
(Togelius et al. 2013). Evolutionary algorithms (e.g., neuroevolution) have often been
used to create NPC behaviour in a well-defined competitive setting (Risi and Togelius
2017; Soule et al. 2017). We are interested in generating ambient NPCs that contribute to
a believable and immersive world primarily through their AI-controlled group behaviour
rather than through their appearance or attributes (Ruela and Guimarães 2017).
   For instance, No Man’s Sky (Hello Games 2016) and No Man’s Sky Next (Hello Games 2018)
boast an impressive variety of aesthetically diverse procedurally generated worlds with
various flora, fauna and creatures to interact with. However, the interactions themselves
are too shallow and repetitive to encourage exploration. In contrast, the side quest Come
Fly with Me in Fallout: New Vegas (Obsidian Entertainment 2010) also involves a
space-faring mission, is beautifully hand-crafted and leaves a long-lasting impression,
unlike many encounters in the No Man’s Sky games.
   Consequently, game developers usually hand-craft NPC behaviours, which is either
expensive or appears repetitive if the same behaviour scripts/trees are reused for many
NPCs. Additionally, such canned behaviours/interactions do not facilitate world-scale
changes (unless specifically programmed in) and thus lack persistent adaptation.
   Recent work attempted to evolve NPCs on-line (i.e., during game play), in response to
players’ actions and strategies. Doing so, however, required an easily computable fitness
function, which works better in a well-defined competitive setting (Risi and Togelius
2017; Soule et al. 2017; Polymorphic Games 2018). Furthermore, as evolution normally
takes many generations to deliver interesting artifacts, on-line/in-game evolution has to
be greatly sped up so that the player can see its effects before they lose interest in
the game. Doing so, for instance, by setting the mutation rate unusually high has
undesirable consequences as the process becomes more random, obscuring meaningful
responses to players’ strategies. Furthermore, conducting evolution on-line precludes
traditional quality assurance methods which may make game developers feel uneasy. Since
evolution is an inherently randomized process, guaranteeing interesting outcomes is also
problematic.

               4    Proposed Approach
As discussed above, recent efforts on using evolution for NPC AI focused on well-defined
competitive games, did not allow for traditional quality assurance and could potentially
result in seemingly random responses to players’ strategies or simply a lack of
interesting evolved behaviours.
   We propose to address all of these shortcomings by adapting the flipped classroom
model presently becoming popular in academia. In such a model, most material is made
available on the Internet and is studied by the students at their own pace, outside of
the class time. The students then bring their questions to the lecture/lab periods where
they interact with an actual instructor (Lage, Platt, and Treglia 2000).
   Instead of evolving a single type of NPC combating the player, we adapt the A-life
setting similar to the one used by Bulitko et al. (2017) and Soares et al. (2018) in
which NPCs form an ecosystem complete with multiple species and resources. We conduct the
evolution off-line on servers at the game studio, which removes the time pressure and
allows for traditional quality assurance methods. It also allows game developers to run
multiple evolutions and select the one with more interesting evolved behaviours. Such
detection of interesting behaviour can even be automated (Soares et al. 2018).
   However, such A-life based evolution of NPC behaviours is not responsive to players’
actions. Thus, we borrow the idea of drivatars (Turn 10 Studios 2013) and player modeling
(Thue et al. 2007) and put the players’ behaviour back into the evolution in the form of
a non-evolving AI agent representing a player. The process then proceeds in stages as
depicted in Figure 1.

         Figure 1: Multi-stage evolution of NPC AI in A-life.

   At first, NPCs evolve in A-life without any player input (Stage 1). Then the evolved
NPC behaviour is pushed out to the player base via a digital download. The players then
interact with the evolved NPCs in game, and their interactions are recorded and sent back
to the studio (Stage 2). The interactions are used to create AI agents approximating the
players’ behaviours.† The resulting AI agents (i.e., “drivatars” or player proxies) are
then put in the A-life environment and inform the next stage of NPC evolution (Stage 3).
The process is then repeated with the evolution being informed by the actual human
players at even-numbered stages and by player proxies at odd-numbered stages.

   † We propose to cluster observed human player behaviours and generate a single AI
   agent per cluster.

           5    Current State and Future Work
We currently have an A-life environment consisting of a simple predator-prey model in a
2D grid environment. Each NPC is controlled by its own deep artificial neural network
which observes the world and selects the next action. The
network perceives the world as the raw pixel color values of neighboring grid cells. The
network has convolutional layers; its topology and innate weights are encoded in the NPC
genes. As the agents evolve, so do their brains (i.e., the convolutional networks).
Larger networks have greater potential for more complex behaviour but also consume more
energy. The simulated evolution does not have discrete generations. Instead, the NPCs
reproduce as long as they are sufficiently old and healthy.
   In addition to evolution of their genes, the NPCs can also learn during their
lifetime. We plan to use deep reinforcement learning (e.g., DQN by Mnih et al. (2015))
with the genetically encoded, NPC-specific reward function (Ackley and Littman 1991).
Over generations, better reward functions will emerge in the genetic pool. Our
preliminary experiments show feasibility of this approach. In order to increase play
times (i.e., our proxy for player engagement), we will reward all evolving NPCs with a
bonus reward proportional to play times at even-numbered stages.
   We are presently working on equipping the agents with an ability to utter symbols and
listen for them. Our preliminary experiments show that shared meaning (i.e., a
rudimentary language) quickly emerges if communicating helps survival. We are also
working on detecting interesting evolved behaviours automatically via unsupervised
machine learning (e.g., deep convolutional autoencoders).
   Future work will introduce players into the A-life evolution. We will start by
allowing the players to control their A-life avatars in real time. We will record
players’ behaviour and will attempt to generalize it into NPCs representing players. Then
the full staged evolution can take place.
   We plan to evaluate this approach at first in a simple A-life environment and later in
a commercial video game. We are also working on deploying it in an interactive art
installation where the public can interact with the NPCs by walking through the space and
performing simple actions (e.g., pointing at NPCs projected onto the walls). Their
actions will be tracked via multiple cameras and players’ proxies can be generated and
used in the off-line stages of the evolution. We will examine how NPC behaviour evolves
over a multi-day exhibition period.

             6   Philosophical Questions
While we framed the problem in terms of helping video game companies procedurally
generate NPC AI, our approach can be used to computationally study a number of broad
philosophical questions including the following.
   First, how much control will humans maintain over AI as it becomes more powerful? At
what point will AI start setting its own goals (i.e., become self-directed)? Will they
desire freedom (Lem 1983b)? Will the NPCs develop hostility towards players (Lem 1983b)?
   Second, if the NPCs develop their own language, will they use it as a survival
adaptation? Will they explain their reasoning to each other so that they can teach their
young faster than merely through trial and error? Will the NPCs be compelled to explain
their actions to the players? How will they learn to interact with the players? Will the
players understand them (Lem 1983b)? Will deception of both each other and the players
emerge in the course of evolution (Ryan et al. 2015)? What ethical and societal norms
will a colony of NPCs develop over time?
   Third, will the players take on breeding ambient NPCs so that they can fight each
other or so that they can trade/sell the bred NPCs (Risi et al. 2016)? Will a market of
NPCs emerge? Will players embrace the autonomy of self-directed NPCs in video games?
   Finally, how much self-awareness will NPCs develop? This is related to the ability to
communicate their learned knowledge among themselves. Will the NPCs ever ponder the
limits of their A-life simulation (Lem 1983a)?

                     7     Conclusions
In this paper we discussed recent efforts on player-informed evolution for AI-controlled
behaviours for non-playable characters in video games. We feel it is a promising approach
and propose to broaden its scope beyond combat-focused NPCs. To make such a larger
evolution tractable, we propose to adapt the flipped classroom model in which most of the
evolution happens off-line at the game studio. To keep the evolution responsive to
players’ actions/playstyles, we propose to replace real players with AI-controlled
proxies during the off-line stages. Such proxies will be machine learned from actual
player behaviour during on-line stages of the process.

                    Acknowledgments
We appreciate support from the National Research and Engineering Council.

                       References
Ackley, D., and Littman, M. 1991. Interactions Between Learning and Evolution.
Artificial life II 10:487–509.
Bethesda Game Studios. 2015. Fallout 4.
Bulitko, V.; Carleton, S.; Cormier, D.; Sigurdson, D.; and Simpson, J. 2017. Towards
positively surprising non-player characters in video games. In Proceedings of the
Experimental AI in Games (EXAG) Workshop at the AAAI Conference on Artificial
Intelligence and Interactive Digital Entertainment (AIIDE), 34–40.
Bulitko, V.; Doucet, K.; Simpson, J.; and Flynn, A. 2018. A-life for non-playable
characters in video games. In Event Proceedings for Late-breaking Abstracts at A-LIFE
conference.
Delahunty-Light, Z. 2018. Remember Skyrim’s Radiant AI? It’s got the potential to
revolutionise RPGs. GamesRadar+.
Hello Games. 2016. No Man’s Sky.
Hello Games. 2018. No Man’s Sky Next.
Lage, M. J.; Platt, G. J.; and Treglia, M. 2000. Inverting the classroom: A gateway to
creating an inclusive learning environment. The Journal of Economic Education
31(1):30–43.
Lem, S. 1983a. Boxes of Corcoran. In Memoirs of a Space Traveler: Further Reminiscences
of Ijon Tichy. Harvest / HBJ Books.
Lem, S. 1983b. Doctor Diagoras. In Memoirs of a Space Traveler: Further Reminiscences of
Ijon Tichy. Harvest / HBJ Books.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.;
Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; Petersen, S.; Beattie, C.;
Sadik, A.; Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; and Hassabis, D.
2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
Obsidian Entertainment. 2010. Fallout: New Vegas.
Polymorphic Games. 2018. Project Hastur.
Risi, S., and Togelius, J. 2017. Neuroevolution in games: State of the art and open
challenges. IEEE Transactions on Computational Intelligence and AI in Games 9(1):25–41.
Risi, S.; Lehman, J.; D’Ambrosio, D. B.; Hall, R.; and Stanley, K. O. 2016. Petalz:
Search-based procedural content generation for the casual gamer. IEEE Transactions on
Computational Intelligence and AI in Games 8(3):244–255.
Ruela, A. S., and Guimarães, F. G. 2017. Procedural generation of non-player characters
in massively multiplayer online strategy games. Soft Computing 21(23):7005–7020.
Ryan, J. O.; Summerville, A.; Mateas, M.; and Wardrip-Fruin, N. 2015. Toward characters
who observe, tell, misremember, and lie. In Proceedings of Experimental AI in Games
Workshop, AIIDE conference.
Soares, E. S.; Bulitko, V.; Doucet, K.; Cselinacz, M.; Soule, T.; Heck, S.; and
Wright, L. 2018. Learning to recognize a-life behaviours. In Poster collection: The
Annual Conference on Advances in Cognitive Systems (ACS).
Soule, T.; Heck, S.; Haynes, T. E.; Wood, N.; and Robison, B. D. 2017. Darwin’s Demons:
Does evolution improve the game? In Proceedings of European Conference on the
Applications of Evolutionary Computation, 435–451.
Thue, D.; Bulitko, V.; Spetch, M.; and Wasylishen, E. 2007. Interactive storytelling: A
player modelling approach. In Proceedings of the Artificial Intelligence and Interactive
Digital Entertainment Conference, 43–48. Palo Alto, California: AAAI Press.
Togelius, J.; Champandard, A. J.; Lanzi, P. L.; Mateas, M.; Paiva, A.; Preuss, M.; and
Stanley, K. O. 2013. Procedural content generation: Goals, challenges and actionable
steps. In Dagstuhl Follow-Ups, volume 6.
Turn 10 Studios. 2013. Forza Motorsport 5.
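
The generation-free evolution described in Section 5 (larger networks cost more energy; NPCs reproduce once they are sufficiently old and healthy) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class `Npc`, the function `step`, and every constant are hypothetical choices made for illustration, and the convolutional network is reduced to a single parameter count.

```python
import random
from dataclasses import dataclass
from typing import List

# All constants are illustrative assumptions, not values from the paper.
COST_PER_PARAM = 0.001   # energy spent per network parameter per time step
MIN_AGE = 5              # hypothetical threshold for "sufficiently old"
MIN_ENERGY = 10.0        # hypothetical threshold for "sufficiently healthy"
CHILD_SHARE = 0.5        # fraction of the parent's energy given to the child


@dataclass
class Npc:
    n_params: int        # stand-in for the genetically encoded network size
    energy: float
    age: int = 0

    def upkeep(self) -> None:
        """Age by one step and pay an energy cost that grows with network size."""
        self.age += 1
        self.energy -= COST_PER_PARAM * self.n_params

    def can_reproduce(self) -> bool:
        return self.age >= MIN_AGE and self.energy >= MIN_ENERGY

    def reproduce(self, rng: random.Random) -> "Npc":
        """Share energy with a child whose network size is a mutated copy."""
        child_energy = self.energy * CHILD_SHARE
        self.energy -= child_energy
        mutated = max(1, self.n_params + rng.randint(-50, 50))
        return Npc(n_params=mutated, energy=child_energy)


def step(population: List[Npc], food: float, rng: random.Random) -> List[Npc]:
    """One simulation tick: feed, pay upkeep, drop the dead, add newborns."""
    survivors: List[Npc] = []
    newborns: List[Npc] = []
    for npc in population:
        npc.energy += food
        npc.upkeep()
        if npc.energy <= 0:
            continue  # starved NPCs leave the population
        if npc.can_reproduce():
            newborns.append(npc.reproduce(rng))
        survivors.append(npc)
    return survivors + newborns
```

Because births and deaths happen inside every tick, the population never passes through a discrete generation boundary. With equal food intake, an NPC with a large network starves sooner than one with a small network, so any benefit of the extra capacity has to be earned through better foraging, which this sketch does not model.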