=Paper= {{Paper |id=Vol-2844/games1 |storemode=property |title=Metis: Multi-Agent Based Crisis Simulation System |pdfUrl=https://ceur-ws.org/Vol-2844/games1.pdf |volume=Vol-2844 |authors=George Sidiropoulos,Chairi Kiourt,Lefteris Moussiades |dblpUrl=https://dblp.org/rec/conf/setn/SidiropoulosKM20 }} ==Metis: Multi-Agent Based Crisis Simulation System== https://ceur-ws.org/Vol-2844/games1.pdf
Metis: Multi-Agent Based Crisis Simulation System

George Sidiropoulos (georsidi@teiemt.gr), Department of Computer Science, International Hellenic University, Kavala, Greece
Chairi Kiourt (chairiq@athenarc.gr), Athena-Research & Innovation Center in Information Communication and Knowledge Technologies, Xanthi, Greece
Lefteris Moussiades (lmous@cs.ihu.gr), Department of Computer Science, International Hellenic University, Kavala, Greece
ABSTRACT
With the advent of new computational technologies (Graphics Processing Units, GPUs) and Machine Learning, the research domain of crowd simulation for crisis management has flourished. Along with the new techniques and methodologies proposed over the years to increase the realism of crowd simulation, several crisis simulation systems and tools have been developed, but most of them focus on special cases and do not give users the ability to adapt them to their needs. To address this, in this paper we introduce a novel multi-agent-based crisis simulation system for indoor cases. The main advantage of the system is its ease of use, focusing on non-expert users (users with little to no programming skills) who can exploit its capabilities, adapt the entire environment to their needs (case studies) and set up building evacuation planning experiments with some of the most popular Reinforcement Learning algorithms. Simply put, the system's features focus on dynamic environment design and crisis management, interconnection with popular Reinforcement Learning libraries, agents with different characteristics (behaviors), fire propagation parameterization, realistic physics based on a popular game engine, GPU-accelerated agent training, and simulation end conditions. A case study that uses a popular reinforcement learning algorithm to train the agents presents the dynamics and capabilities of the proposed system, and the paper concludes with the highlights of the system and some future directions.

CCS CONCEPTS
• Computing methodologies → Multi-agent systems; Reinforcement learning.

KEYWORDS
Multi-agent systems, Modeling and simulation, Agent-based system, Crowd evacuation, Crisis simulation

GAITECUS0, September 02–04, 2020, Athens, Greece
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
With the advancements of recent years in computing capabilities, Artificial Intelligence and web technologies, the research domain of Crowd Simulation (CS) has gained more and more interest. The field has grown considerably over the last decade (and keeps growing), and as a consequence, more and more techniques and methods are being proposed. For example, crowd behavior simulation [21], emotion contagion management, collision avoidance for pedestrians and accurate decision models are some of the most popular subjects studied within the CS domain. Along these lines, the application of Machine Learning (ML), and especially Deep Learning (DL), approaches has increased and has been applied in many case studies, with Reinforcement Learning (RL) being a leading approach in these studies and closely correlated with CS [18], [19], [25].
   The domain of crowd management and analysis has attracted interest since as early as 1958 [12], with an increasingly positive social and scientific impact, and it continues to be studied to this day [5], [14], [31]. Most of these studies focus on developing level-of-service concepts, designing elements of pedestrian facilities or planning guidelines [13]. Although the goals have remained the same, the demand and the simulation scale have increased drastically. Nowadays, planning correct emergency evacuations of large and small buildings or building blocks has become more complex, requiring extensive and accurate planning that accounts for different architectural styles, appearances, functionalities and visitor behaviors [26]. All these features (and many others) have become important aspects of designing a building for efficient evacuation planning, and they are important factors in simulation systems for crisis scenarios, for example the evacuation of a building due to a fire or an earthquake. Such scenarios aim to improve risk assessment procedures, emergency plans and the evacuation itself. They are usually tackled by crisis management preparation procedures, which include mock crisis scenarios (e.g. fire drills or "mock evacuations"). Unfortunately, in many cases these procedures fail to prepare people and are often ignored [8]. Thus, results obtained from such preparation exercises cannot be used to design accurate policies. For this reason, simulation systems can be used as an additional method of evaluating the security policy of indoor or outdoor facilities. Simulations can take into account the impact of different environmental, emotional and informational conditions [29], but in most cases the simulation tools have been designed for specific facilities and specific cases.
   The research domain of Crowd Simulation for Crisis Management (CSCM) has attracted increasing interest over the past years. Crowd simulation is the process of simulating how a number of entities (commonly a large number) move inside a virtual scene with a specific setting [28]. Crisis simulations are systems that include entities with more roles and responsibilities, on top of the techniques and algorithms required for the physical and even psychological simulation of those entities. Moreover, the setting of the simulated scenario varies widely, from film production and military simulation to urban planning, all of which require high realism concerning the
movements of those entities, their grouping and their behaviors in general.
   The most suitable approach for crowd and crisis simulation systems is the simulation of multiple individual entities [10]. Systems that follow this approach are called Multi-Agent Systems (MAS) and consist of multiple agents (the entities to be simulated) and their environment (the setting in which they exist and can interact with each other), in which they may cooperate or compete towards specific tasks or goals [32]. Based on their interactions and their perception, the agents perform actions to achieve their goals. This structure makes MAS well suited to crowd and crisis simulation research.
   In this paper, a novel crisis simulation system is introduced, with a focus on creating a prototype system that takes advantage of the many simulation and performance-enhancing capabilities of a well-known game engine. The system's key features are:

   (1) Ease of use: users with little to no programming skills or experience can set up a crisis scenario and simulate it through a user-friendly Graphical User Interface (GUI).
   (2) Dynamic environment design: a feature that allows users to create their own building and environment based on their needs and case studies.
   (3) Interconnection with popular Reinforcement Learning libraries: allowing researchers to exploit popular RL algorithms for training the agents, or to try their own algorithms.
   (4) Dynamic crisis management: allowing the user to model a specific structured pipeline of a crisis, for example two fires starting from different places.
   (5) GPU-accelerated agent training and simulation, supporting large multi-agent systems.
   (6) Simulation end options, which allow the user to specify when a simulation will automatically end.

Additionally, we tested the newly introduced system with a state-of-the-art Deep Reinforcement Learning (DRL) algorithm, which achieved high training accuracy and good evacuation performance by the agents in an indoor environment. It should be highlighted that GPU-accelerated training of the DRL algorithm was much faster than CPU-based training, which sped up hyperparameter tuning (a time-consuming process) for the implemented case study.
   Crisis simulation systems have several social and scientific impacts, mainly contributing to societal development by helping people design safe buildings that host many visitors throughout the day. Additionally, such systems help people prepare for various crisis situations (e.g. evacuation planning for indoor fires) by exploiting Artificial Intelligence technologies that simulate human behaviors. Such systems also give people the ability to design their own environments based on their needs and to observe how these scenarios unfold.
   The rest of the paper is organized as follows: Section 2 briefly presents some crisis simulation platforms and the research that has been done on CSCM; Section 3 introduces the prototype of the newly introduced crisis simulation system, followed by Section 4, which presents a case study. Lastly, Section 5 concludes the paper by highlighting some key features of the system and presenting some future work towards the enhancement of the prototype system.

2 RELATED WORKS
As mentioned before, a plethora of systems have been developed for the simulation of different crisis scenarios. For example, Becker-Asano et al. presented a multi-agent system focused on first-person perception and signs, taking dynamically changing occlusions into account [2]. The implementation was done using the Unity game engine¹, and it also made it possible for participants to be tested in the same virtual airport terminal using an "Oculus Rift" head-mounted display. Simonov et al. proposed a system for building composite behavior structures for a large number of agents [26]. Their system was based on a decision-making algorithm implemented in Unreal Engine 4². The path-finding system exploited the Menge simulation framework through plugins, and the system also included animation support, dynamic models, a visualization module and utility-based strategic-level algorithms. ESCAPES, a multi-agent evacuation simulation system presented in 2011, incorporated different agent types with emotional, informational and behavioral interaction [29]. The agent types include individual travelers, families, authority and security agents. Additionally, the system incorporated information spreading to agents, emotional interaction and contagion, and Social Comparison Theory [9]. Evakuierungsassistent (translated as Evacuation Assistant) is another system, focused on simulating the evacuation of mass events (e.g. football stadiums) and incorporating realistic methods for real-time simulation [30]. The system is agent-based and exploits Cellular Automata (CA) methods and Generalized Centrifugal Force Models [6].

¹ A game engine developed by Unity Technologies (https://unity.com).
² A game engine developed by Epic Games (https://www.unrealengine.com).
³ A toolkit for developing and comparing reinforcement learning algorithms (https://gym.openai.com).

   In 2013, De Oliveira Carneiro et al. presented a simulation system to study crowd behavior during the evacuation of a soccer stadium [7]. The system uses 2D CA defined over multiple grids that represent different levels (state spaces) of the simulated environment, and it can simulate environments with complex structures composed of multiple floors. Sharma et al. [25] proposed the first fire evacuation environment based on OpenAI Gym³ [4]. Moreover, they proposed a new approach that entails pretraining an agent based on a Deep Q-Network (DQN) algorithm [20], focusing on discovering the shortest path to the exit. A very popular platform that adapts to large-scale and complex models is the GAMA platform [27]. It has its own agent-oriented modeling language, called Gama Modeling Language (GAML), that follows the object-oriented paradigm. Additionally, the models include spatial components used for their 3D representation in the environment. Another key feature of the platform is that the agents' architecture is based on the Belief-Desire-Intention (BDI) model [3], which proposes a straightforward formalization of human reasoning through intuitive concepts. GAMA also supports multi-threaded simulations and running multiple simulations at the same time. Lastly, iCrowd [17] is an agent-based behavior modeling and crowd simulation system with many different applications, from crowd simulation in crisis evacuations to social
behavior and urban/maritime traffic simulation. It makes use of modern, multithreaded and data-oriented approaches that provide architectural extensibility. The system supports studies based on human movement (collision avoidance and path planning) and agent-based behavior modelling.
   Several literature reviews and surveys have been published in recent years, presenting advancements and important observations regarding the directions and areas that require focus in the domain of CSCM. Some of the most important stages in developing a simulation tool can be derived from these reviews. For example, S. Abar et al. [1] reviewed the literature to quickly assess the ease of use of simulation tools. Some of their comparison criteria were the tool's coding language/Application Programming Interface (API), model development effort, modelling strength and scalability level. N. Pelechano and A. Malkawi [22] stated in their review that physical interactions, psychological elements, improved human movement, agent-based approaches and communication between agents are important features. Lastly, J. Xiao et al. [33] focused on the use of hardware accelerators (especially GPUs) for agent-based simulations.
   The presented system was developed taking into account observations and highlights from the aforementioned related works. It has numerous advantages compared to the aforementioned systems, the most important being its ease of use and its usefulness as a research tool for future studies. The development and design of the system focus on its usability and on how easily a user without any programming skills can set up an environment and test it based on their needs. Moreover, scientists can use the system as a research tool to test different RL algorithms/models and apply them to a plethora of different crisis scenarios. This is achieved via the interconnection of Metis with popular RL libraries through the game engine. Additionally, by using a game engine to develop such a system, scalability, physical interactions (physics) and GPU exploitation are readily available, as games exploit these capabilities with very realistic results. The combination of these advantages makes Metis a unique tool for scientific communities and the general public.

3 THE "METIS" SYSTEM
In this section we present a prototype version⁴ of a novel multi-agent crisis simulation system, developed on the Unity game engine, called Metis⁵,⁶. The main structure diagram of Metis is shown in Figure 1; it consists of three major layers: Dynamic Environment Development (DED), Scenario Design (SD) and Evacuation Simulation (ES). The first layer (DED) allows the user to design and set up the entire environment and the building to be evacuated, dynamically. The second layer (SD) follows the concept of dynamic design of the evacuation scenario, giving the user the ability to place pedestrians (any number of agents) in different parts of the building, designate which doors are exits, mark areas of the environment in which the pedestrians will be safe and place fires in different places. The last layer (ES) handles the evacuation process and the modules responsible for the simulation. The ES layer exploits machine learning models (through the interconnection with popular RL libraries), manages the spread of the fires, handles the end of the simulation and gives the user the ability to manage a dynamically changing crisis. Dynamic crisis management is the ability to model the pipeline of a crisis, in our case two fires starting from different places; in the future, it could be a tsunami followed by an earthquake, etc.

Figure 1: The system framework diagram of Metis.

⁴ The assets currently used are from the "Standard Assets (for Unity 2017.3)" (https://assetstore.unity.com/packages/essentials/asset-packs/standard-assets-for-unity-2017-3-32351) and the "Snaps Prototype | Office" (https://assetstore.unity.com/packages/3d/environments/snaps-prototype-office-137490) Unity packages.
⁵ The name comes from Metis, one of the elder Okeanides and the Titan-goddess of good counsel, planning, cunning and wisdom. Counsel, planning and wisdom are also required when a building is designed. https://www.theoi.com/Titan/TitanisMetis.html
⁶ https://sites.google.com/view/metissimulationsystem

   A first-view screenshot of the main component of the system is depicted in Figure 2. The interface consists of two User Interface Layers (UILs), one of which is further split into two sub-UILs (numbers 1 and 2 in Figure 2). These UILs focus on designing the environment and the experiment and on conducting the final evacuation experiments, without the need for expert skills. The main structure of the UILs is based on the framework presented in Figure 1. Each UIL consists of multiple interactive buttons with the following functionalities (left to right, top to bottom):
   (1) UIL1 has four main buttons and a scroll view whose content includes additional labeled interactive buttons. The three buttons at the top left change the category of objects that can be placed in the environment; the objects appear in the scroll view after a category is selected. The first button shows all the static objects that can be placed in the environment, the second all available types of pedestrians, and the last one all the sample simulation buildings (areas) that can be placed. Sample buildings are created beforehand and provided to the user; each one includes placed objects and has a different layout and number of rooms. Clicking on an object in the scroll view allows the user to place that object into the environment. The last button, with the magnifier icon at the top right, allows the user to filter the list of objects through a text field.
   (2) UIL2 includes functionalities that change the mode of the mouse. In the left column, the button allows the user to assign a safe area in the environment. In the middle column, the buttons allow the user to place fires, walls, floors and doors. In the last column, the first button resets the mouse to its default mode, in which it does nothing, and the last button lets the user grab and re-place already placed objects.
   (3) UIL3 includes buttons related to the simulation process. The play button starts the simulation and the gear button shows all the available options regarding the simulation end conditions.

Figure 2: The main User Interface Layers of the Metis system.

3.1 Dynamic Environment Development
As mentioned above, the DED layer of the system allows the user to design the layout of the building to be evacuated by placing the building's walls. This layer can be characterized as the layer responsible for the content generation of the environment, a technique commonly used in games [24], [34], and it allows for the creation of dynamic environments [16] during the environment design process. The walls are placed using the wall placement tool from the UIL2, which places a wall segment that can be extended in any direction by dragging the mouse. As a second step, doors can be placed on the walls, allowing pedestrians to move between rooms. Lastly, a plethora of objects that mimic real-world indoor objects can be placed anywhere inside the building, to decorate it and to act as obstacles during the evacuation. During any placement procedure, the walls, doors and objects snap to each other to make placement easier. The DED is considered to be a powerful tool that gives users the ability to create their own realistic indoor environments based on their needs and case studies, and thus to test several different environments without the need for programming skills.

3.2 Scenario Design
The SD layer is responsible for designing the scenarios, i.e. where the fire will start, how many pedestrians will have to evacuate the building (multi-agent approach), where their starting positions will be, which exit they will try to reach and where they will be safe. The fires' positions can be chosen with the fire placement tool of the UIL2, which, during the design process, allows the user to specify the position from which the fire will start to spread. The fires start spreading when the simulation starts. Different "types" of pedestrians can be placed, and each type has different attributes, such as speed, size, color and health points, giving the ability to simulate different human behaviors. A door is marked as an exit door by right-clicking on it. At least one door has to be marked as an exit due to the way the pedestrians were trained. Lastly, safe areas, which the pedestrians try to escape to, are used to mark a pedestrian as safe from the crisis during the simulation.

3.3 Evacuation Simulation
The ES layer is responsible for all the functions running during the simulation of an evacuation procedure. Fire propagation uses a very simple algorithm. The fire is first placed at a point in the building, with a specific maximum area, and is represented by a particle-emitting object that damages any object touching it. Then, when the simulation starts, the area grows periodically and multiple fire objects are created at random places inside the area. Simply put, fire propagation currently works with a random speed and direction, while contact between the fire and the pedestrians is detected through collision interfaces. Having control over when the simulation automatically ends is an important feature. The prototype version currently supports end conditions such as all or a specific number of pedestrians being safe or dead. Pedestrian evacuation can be achieved by training the agents with RL algorithms. By exploiting the capabilities of the ML-Agents toolkit [15] (Section 4), the agents of the Metis system can be trained with popular RL algorithms such as Proximal Policy Optimization (PPO) [23] and Soft Actor-Critic (SAC) [11]. In addition, the Metis system can be easily interconnected with popular RL libraries such as RLlib⁷ and Baselines⁸, and custom Python RL algorithms can also be developed. A typical RL training is done by creating learning environments in which the agent collects observations and acts based on them. For the training of a general model that can evacuate buildings during a crisis situation, the typical procedure of creating a building environment was followed: setting up doors, designating the exits and placing objects that also acted as obstacles inside the different rooms. Dynamic crisis management is the part of the simulation that allows the user to manage the crisis currently unfolding. For now, it allows the user to start a fire in a different part of the environment than the initial one. This makes the system more effective and allows the user to observe the pedestrians' behaviors while the crisis changes dynamically.

⁷ https://docs.ray.io/en/master/rllib.html
⁸ https://github.com/openai/baselines

3.4 Pedestrian agents training approach
In this section we present and analyze how the pedestrian agents were trained, the features that the agents gather at each step, the actions the agents take to evacuate a building and the environment in which they were trained. Figure 3 depicts an indoor environment created using the Metis system and then saved as a single area object, for ease of use during the training environment setup and to keep the environment setup for further analysis. The areas highlighted in light green inside the building are the possible areas in which an agent could spawn when an episode began. Initially, all agents spawned inside the room marked as "1", and each one unlocked the next area (light green) one by one. An agent could unlock an area once the mean RL reward over the last 20 episodes was equal to or higher than 0.925, indicating good training for a specific area. Every time an episode begins, the agent randomly chooses between five possible spawn areas: the five areas it most recently unlocked. Eventually, when the agent's mean reward averaged over all areas is over 0.925, it can spawn in any area. This was done so that the agent learned gradually and all the areas were unlocked so that it can generalize correctly (without learning only
to escape from the specific point of the building). The red cubes inside the building are dummy fire objects which, when touched, reset the agent’s episode and set its reward to -1. Dummy objects were used instead of the actual fire to check whether the agent, using the aforementioned raycast components, would eventually learn to avoid those fires. The intense green area is considered to be a safe exit for the pedestrians.

Figure 3: Building used to train the pedestrian agent.

The features gathered by each agent during training were 70 in total, of which 64 were gathered using three “Ray Perception Sensor 3D”9 components and 6 were calculated manually:
(1) The first raycast component detects objects (static objects and fires) and is blocked by walls and doors. It casts 20 rays of length 15 (one Unity unit corresponds to one meter by default) in a 140-degree arc in front of the agent, and is responsible for detecting the objects that have to be avoided inside the room the agent is currently in.
(2) The second raycast component detects doors, safe exit doors and walls with 20 rays of length 25 in an 80-degree arc in front of the agent, and is responsible for detecting doors and safe exit doors that are close.
(3) The third raycast component also detects doors, safe exit doors and walls, with 24 rays of length 50 in a 140-degree arc in front of the agent, and is responsible for detecting doors and safe exit doors that are far away.
(4) The manually calculated features were:
   (a) the normalized x and z values of the safe exit door,
   (b) the agent’s position, and
   (c) the normalized direction from the agent to the exit door.

During training, the agent gets a reward of $-0.4/\mathit{maxStep}$ for each step (action) taken, $-0.3/\mathit{maxStep}$ if it collides with something (static objects, walls and closed doors), and small positive rewards depending on its distance from the exit door (Eq. 1). Additionally, when the agent reached the safe area, the reward was set to +1.

$$R_{\mathit{distance}} = \frac{\mathrm{Distance}\big(\mathrm{norm}(\mathit{exit}_{pos}),\ \mathrm{norm}(\mathit{agent}_{pos})\big)}{\mathit{maxStep}} \times 0.3 \tag{1}$$

where $\mathrm{norm}(\mathit{exit}_{pos})$ is the normalized position of the exit, $\mathrm{norm}(\mathit{agent}_{pos})$ the normalized position of the agent, $\mathit{maxStep}$ the maximum number of actions per episode (equal to 10,000 during training), and $\mathrm{Distance}()$ calculates the distance between the exit and the agent.

It should be noted that the positions of the objects are normalized according to their relative position inside the building. The negative rewards were chosen so that the agent reaches the exit as soon as possible and collides with as few objects as possible. Lastly, because touching a fire resets the episode and sets its reward to -1, the agent learns to avoid the fires.

The available actions for each agent are provided through two output branches, each with two possible actions. The first branch is responsible for the agent’s horizontal movement (left or right) and the second branch for the vertical movement (backward or forward). With this setup, the agent is able to navigate to a safe exit.

Based on these parameters, the agents can be trained with many different RL algorithms. With access to the source code of the Metis system, all these parameters and many others can be adjusted to the needs of each experiment. It should be highlighted that the Metis system will be released under an open-source license.

4 CASE STUDY

For the sake of clarity, a case scenario was set up to present the procedure of setting up a building for crisis evacuation planning. It also evaluates how well the RL-based model trained in the previous section escapes from a different building in a fire crisis scenario. The building was created with the UILs and its architecture was quite simple, consisting of three main rooms and one hall. The hall was empty and was connected to the other three rooms (east, north and west rooms). Each room was decorated with many static objects, such as desks, small and large cabinets, small and large shelves and plants. The objects were placed so as to make the evacuation of the building harder, as can be seen in the north room in Figure 5. Four doors were placed: three connecting the rooms with the hall and one being the safe exit on the south side. The safe exit door was designated as an exit by right-clicking on it. The next step when setting up a building in the Metis system is to designate a safe area; pedestrians that touch it are considered safe. In our case study, we designated a safe area just outside the exit. The following step is to place the pedestrians into the building. We placed a total of 25 pedestrians, scattering them around all the rooms and placing some in difficult areas. Lastly, fires were placed in all rooms, in such a way that some pedestrians’ paths are partly blocked during the evacuation. Figure 5 shows the layout of our designed building for the case study, along with all the objects, agents and fires placed.

During the training procedure, the environment spawned 60 agents, and each one started training individually, a common methodology to speed up the training process. Note that, since multiple agents shared the same environment, they ignored each other, both physically and feature-wise. The PPO algorithm was used for training, as it is considered to be one of the most effective RL algorithms for agents that rely on raycast observations.
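
PPO [23] trains the policy by maximizing a clipped surrogate objective. This limits how far each update can move the new policy away from the policy that collected the experience. The Python function below is a minimal sketch of that objective for one batch of samples. It assumes the log-probabilities and advantage estimates have already been computed. It is not the implementation used in Metis. The clip range `epsilon = 0.2` is an assumed value that the text does not report.

```python
import math
from typing import Sequence


def ppo_clipped_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # assumed clip range; not reported in the paper
) -> float:
    """Return the batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    r = pi_new(a|s) / pi_old(a|s) is recovered from log-probabilities.
    PPO maximizes this value; a training loss would use its negative.
    """
    n = len(advantages)
    if n == 0 or len(new_log_probs) != n or len(old_log_probs) != n:
        raise ValueError("inputs must be non-empty and have equal lengths")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)                        # probability ratio r
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)  # clip(r, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)                 # pessimistic (lower) bound
    return total / n


# Unchanged policy: r = 1, so the objective is just the mean advantage.
print(ppo_clipped_objective([0.0, 0.0], [0.0, 0.0], [1.0, -2.0]))  # -0.5
```

With a positive advantage, a ratio of 2 is clipped to 1.2, so the update gains nothing from making that action even more likely. With a negative advantage, the unclipped term is kept, so an action that became more likely despite being bad is penalized in full.
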
In our experiment, the chosen PPO setup was based on a neural network that approximated the ideal function mapping the agent’s observations to the best action the agent could take in a given state. The neural network was set up with 70 inputs, hidden layers of 512 units and 4 outputs, with the discount factor for future rewards set to $\gamma = 0.995$ and the learning rate set to $\lambda = 0.0003$. Figure 4 depicts the agent’s training results. The agent was trained for 14.55 million steps (actions), with the reward initially set to the range [-1, 1]. The multiple drops visible in the cumulative reward are due to the different “difficulty” areas that were unlocked (new starting rooms). Each unlock naturally lowered the total reward, since the environment differed from the previous one. Eventually, at the end (when all the areas were unlocked), the cumulative reward reached 0.96. Once the training of the agent was completed, the simulation, and therefore the building evacuation, was started by pressing the play button near the top right corner. When this button is pressed, if there is an ongoing crisis (that is, at least one fire has been placed), all the pedestrians start evacuating the building individually.

Figure 4: Cumulative reward increasing during a successful training of the agent.
Figure 5: Layout and case study setup.
Figure 6: Beginning of the evacuation.

Figure 6 shows a snapshot at the beginning of the evacuation after the training, when the pedestrians started running towards the exit. Most of the pedestrians find their way towards it immediately. Despite that, some of them can be seen struggling, with some getting stuck running into a corner of a room.

To demonstrate how easy it is to set up a scenario, and to better understand the whole process of creating a building and setting up the evacuation and simulation procedure, we have created a demo video10. When a simulation ends, for any reason, a results window pops up to inform the user about the evacuation and its statistics. Figure 7 shows a snapshot of this info-window, highlighting the results of our case study. The info-window reports the total number of pedestrians that survived and died by the end of the simulation. In this case study, of the 25 pedestrians in total, 17 survived (agents that reached the green safe area) and 8 died (agents that died or, in case of a manual end, did not evacuate successfully).

9 Rays that are cast into the physics world; the objects they hit determine the observation vector that is produced.
10 https://tinyurl.com/MetisMABCSSDemo

5 CONCLUSIONS AND FUTURE WORK

Game engines have become more and more popular and have been exploited for many different applications besides their main target, the development of games. In this paper we present a prototype of a novel crisis simulation system called Metis. Metis is developed using a very popular game engine, Unity, and exploits many of its features, such as physics, particle effects and cross-platform development. In addition, the Metis system can make use of popular Reinforcement Learning algorithms to improve the simulation realism and the evacuation planning. Its interconnection
with popular RL libraries and its dynamic content (environment) development can establish it as a powerful research tool for both basic and applied high-level research. Because it is built on a game engine that supports cross-platform development, it can run on multiple operating systems. As mentioned above, the key features of the system are:
- its ease of use for scenario design and simulation;
- the ability to build case study environments dynamically;
- the dynamic management of the crisis situation (multiple crisis situations);
- the exploitation of various RL algorithms and well-known libraries;
- inherently GPU-accelerated agent simulations;
- agents with various characteristics and behaviors;
- the ability to specify simulation end conditions.

Furthermore, although the system is in an alpha version, the presented experimental results are encouraging and promising. To sum up, with the Metis system one can design a building layout and place a variety of objects, agents and fires, towards the development of a personal evacuation plan. Due to its simplicity, the Metis system can be used by anyone, even by users without special programming skills. The aforementioned features of Metis serve the key concept of a dynamic and general system, especially through the dynamic crisis management. The user can start a crisis at different moments during the simulation, creating unique scenarios and observing the pedestrians’ reactions.

The results show that there is room for improvement. First and foremost, not all pedestrians found their way towards the exit, which means that different training approaches with different RL algorithms have to be tested. The best option would be to train the agent for much longer, with a dynamic change of the environment (random or via curriculum learning). That is, the fires’ positions, the exit door, the building’s layout and the agent’s attributes (speed and size) would change every time the agent finishes an episode. In general, better fine-tuning of the algorithms could provide more accurate evacuations with fewer losses. Moreover, allowing pedestrians to interact with each other (cooperative learning) will require the agents to be trained in such a way that they take into account the number of agents near them or near an exit. In addition, future work on the system includes the introduction of other features and functionalities, such as real-time simulation statistics, more explanatory and graphical statistics at the end of a simulation, the ability to build multi-level and multiple buildings, more realistic fire propagation, more types of crisis (in addition to fire, such as earthquakes and floods) and interaction between pedestrians. Another important consideration for the future improvement of the system is the incorporation of emotional and psychological features into the agents, an aspect that has been extensively studied in the CSCM literature. Lastly, a system that aims for longevity and extensibility should support users (e.g., through an auto guide) in extending its functionalities.

Figure 7: Window with the simulation’s results.

ACKNOWLEDGMENTS
This work is supported by the MPhil program "Advanced Technologies in Informatics and Computers", hosted by the Department of Computer Science, International Hellenic University.

REFERENCES
[1] Sameera Abar, Georgios K. Theodoropoulos, Pierre Lemarinier, and Gregory M.P. O’Hare. 2017. Agent Based Modelling and Simulation tools: A review of the state-of-art software. Computer Science Review 24 (2017), 13–33. https://doi.org/10.1016/j.cosrev.2017.03.001
[2] Christian Becker-Asano, Felix Ruzzoli, Christoph Hölscher, and Bernhard Nebel. 2014. A multi-agent system based on Unity 4 for virtual perception and wayfinding. Transportation Research Procedia 2 (2014), 452–455. https://doi.org/10.1016/j.trpro.2014.09.059
[3] Lars Braubach, Alexander Pokahr, and Winfried Lamersdorf. 2005. Jadex: A BDI-Agent System Combining Middleware and Reasoning. Software Agent-Based Applications, Platforms and Development Kits (2005), 143–168. https://doi.org/10.1007/3-7643-7348-2_7
[4] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. (2016), 1–4. arXiv:1606.01540 http://arxiv.org/abs/1606.01540
[5] R. L. Carstens and S. L. Ring. 1970. Pedestrian capacities of shelter entrances. Traffic Engineering 41, 3 (1970), 38–43.
[6] Mohcine Chraibi, Armin Seyfried, and Andreas Schadschneider. 2010. Generalized centrifugal-force model for pedestrian dynamics. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 82, 4 (2010). https://doi.org/10.1103/PhysRevE.82.046111 arXiv:1008.4297
[7] Lílian De Oliveira Carneiro, Joaquim Bento Cavalcante-Neto, Creto Augusto Vidal, and Teófilo Bezerra Dutra. 2013. Crowd evacuation using cellular automata: Simulation in a soccer stadium. Proceedings - 2013 15th Symposium on Virtual and Augmented Reality, SVR 2013 (2013), 240–243. https://doi.org/10.1109/SVR.2013.29
[8] R. Fahy and G. Proulx. 1997. Human Behavior In The World Trade Center Evacuation. Fire Safety Science 5 (1997), 713–724. https://doi.org/10.3801/IAFSS.FSS.5-713
[9] Leon Festinger. 1954. A Theory of Social Comparison Processes. Human Relations 7, 2 (May 1954), 117–140. https://doi.org/10.1177/001872675400700202
[10] G. Nigel Gilbert and Klaus G. Troitzsch. 2005. Simulation for the social scientist. 295 pages.
[11] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. 2018. Soft Actor-Critic Algorithms and Applications. (Dec 2018). arXiv:1812.05905 http://arxiv.org/abs/1812.05905
[12] B. D. Hankin and R. A. Wright. 1958. Passenger Flow in Subways. OR 9, 2 (Jun 1958), 81. https://doi.org/10.2307/3006732
[13] Dirk Helbing, Péter Molnár, Illés J. Farkas, and Kai Bolay. 2001. Self-organizing pedestrian movement. Environment and Planning B: Planning and Design 28, 3 (2001), 361–383. https://doi.org/10.1068/b2697
[14] L. A. Hoel. 1968. Pedestrian travel rates in central business districts. Traffic Engineering (1968), 10–13.
[15] Arthur Juliani, Vincent-Pierre Berges, Esh Vckay, Yuan Gao, Hunter Henry, Marwan Mattar, and Danny Lange. 2018. Unity: A General Platform for Intelligent Agents. (Sep 2018). arXiv:1809.02627 http://arxiv.org/abs/1809.02627
[16] Chairi Kiourt, Anestis Koutsoudis, and George Pavlidis. 2016. DynaMus: A fully dynamic 3D virtual museum framework. Journal of Cultural Heritage 22 (Nov 2016), 984–991. https://doi.org/10.1016/j.culher.2016.06.007
[17] Vassilios I. Kountouriotis, Manolis Paterakis, and Stelios C. A. Thomopoulos. 2016. iCrowd: agent-based behavior modeling and crowd simulator, Ivan Kadar (Ed.). 98420Q. https://doi.org/10.1117/12.2223109
[18] Sheng Yan Lim. 2015. Crowd Behavioural Simulation via Multi-Agent Reinforcement Learning. Ph.D. Dissertation.
[19] Francisco Martinez-Gil, Miguel Lozano, and Fernando Fernández. 2014. Strategies for simulating pedestrian navigation with multiple reinforcement learning agents. Autonomous Agents and Multi-Agent Systems 29, 1 (2014), 98–130. https://doi.org/10.1007/s10458-014-9252-6
[20] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with Deep Reinforcement Learning. (2013), 1–9. arXiv:1312.5602 http://arxiv.org/abs/1312.5602
[21] Rahul Narain, Abhinav Golas, Sean Curtis, and Ming C. Lin. 2009. Aggregate Dynamics for Dense Crowd Simulation. ACM Transactions on Graphics 28, 5 (2009), 1–8. https://doi.org/10.1145/1618452.1618468
[22] Nuria Pelechano and Ali Malkawi. 2008. Evacuation simulation models: Challenges in modeling high rise building evacuation with cellular automata approaches. Automation in Construction 17, 4 (2008), 377–385. https://doi.org/10.1016/j.autcon.2007.06.005
[23] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. (Jul 2017). arXiv:1707.06347 http://arxiv.org/abs/1707.06347
[24] Noor Shaker, Julian Togelius, and Mark J. Nelson. 2016. Procedural Content Generation in Games. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-42716-4
[25] Jivitesh Sharma, Per-Arne Andersen, Ole-Christoffer Granmo, and Morten Goodwin. 2019. Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment. (2019), 1–21. arXiv:1905.09673 http://arxiv.org/abs/1905.09673
[26] Andrey Simonov, Aleksandr Lebin, Bogdan Shcherbak, Aleksandr Zagarskikh, and Andrey Karsakov. 2018. Multi-agent crowd simulation on large areas with utility-based behavior models: Sochi Olympic Park Station use case. Procedia Computer Science 136 (2018), 453–462. https://doi.org/10.1016/j.procs.2018.08.266
[27] Patrick Taillandier, Benoit Gaudou, Arnaud Grignard, Quang-Nghi Huynh, Nicolas Marilleau, Philippe Caillou, Damien Philippon, and Alexis Drogoul. 2019. Building, composing and experimenting complex spatial models with the GAMA platform. GeoInformatica 23, 2 (Apr 2019), 299–322. https://doi.org/10.1007/s10707-018-00339-6
[28] Daniel Thalmann. 2016. Crowd Simulation. In Encyclopedia of Computer Graphics and Games. Springer International Publishing, Cham, 1–8. https://doi.org/10.1007/978-3-319-08234-9_69-1
[29] Jason Tsai, Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein, Gal Kaminka, Stacy Marsella, Andrew Ogden, Inbal Rika, Ankur Sheel, Matthew E. Taylor, Xuezhi Wang, Avishay Zilka, and Milind Tambe. 2011. ESCAPES - Evacuation simulation with children, authorities, parents, emotions, and social comparison. 10th International Conference on Autonomous Agents and Multiagent Systems 2011, AAMAS 2011 1 (2011), 425–432.
[30] Armel Ulrich Kemloh Wagoum, Mohcine Chraibi, Jonas Mehlich, Armin Seyfried, and Andreas Schadschneider. 2012. Efficient and validated simulation of crowds for an evacuation assistant. Computer Animation and Virtual Worlds 23, 1 (Feb 2012), 3–15. https://doi.org/10.1002/cav.1420
[31] Ulrich Weidmann. 1993. Transporttechnik der Fussgänger. https://doi.org/10.3929/ethz-a-010025751
[32] Michael Wooldridge. 2009. An Introduction to MultiAgent Systems, 2nd Edition. 484 pages.
[33] Jiajian Xiao, Philipp Andelfinger, David Eckhoff, Wentong Cai, and Alois Knoll. 2019. A survey on agent-based simulation using hardware accelerators. Comput. Surveys 51, 6 (2019). https://doi.org/10.1145/3291048
[34] Georgios N. Yannakakis and Julian Togelius. 2018. Artificial Intelligence and Games. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-63519-4