<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Metis: Multi-Agent Based Crisis Simulation System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>George Sidiropoulos</string-name>
          <email>georsidi@teiemt.gr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chairi Kiourt</string-name>
          <email>chairiq@athenarc.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lefteris Moussiades</string-name>
          <email>lmous@cs.ihu.gr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Athena Research &amp; Innovation Center in Information, Communication and Knowledge Technologies</institution>
          ,
          <addr-line>Xanthi</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Computer Science, International Hellenic University</institution>
          ,
          <addr-line>Kavala</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the advent of new computational technologies (Graphics Processing Units, GPUs) and Machine Learning, the research domain of crowd simulation for crisis management has flourished. Alongside the new techniques and methodologies proposed over the years to increase the realism of crowd simulation, several crisis simulation systems/tools have been developed, but most of them focus on special cases without giving users the ability to adapt them to their needs. In this direction, in this paper we introduce a novel multi-agent-based crisis simulation system for indoor cases. The main advantage of the system is its ease of use: non-expert users (users with little to no programming skills) can exploit its capabilities, adapt the entire environment to their needs (case studies) and set up building evacuation planning experiments with some of the most popular Reinforcement Learning algorithms. Simply put, the system's features focus on dynamic environment design and crisis management, interconnection with popular Reinforcement Learning libraries, agents with different characteristics (behaviors), fire propagation parameterization, realistic physics based on a popular game engine, GPU-accelerated agent training and simulation end conditions. A case study that uses a popular Reinforcement Learning algorithm to train the agents presents the dynamics and capabilities of the proposed system, and the paper concludes with the highlights of the system and some future directions.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-agent systems</kwd>
        <kwd>Modeling and simulation</kwd>
        <kwd>Agent-based system</kwd>
        <kwd>Crowd evacuation</kwd>
        <kwd>Crisis simulation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing methodologies → Multi-agent systems;
Reinforcement learning.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        With the recent advances in computing
capabilities, Artificial Intelligence and web technologies, the research
domain of Crowd Simulation (CS) has attracted growing
interest. The field has grown considerably over the last decade (and keeps growing)
and, as a consequence, more and more techniques and
methods have been proposed. For example, crowd behavior simulation [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ],
emotion contagion management, collision avoidance for
pedestrians and accurate decision models are some of the most popular
subjects studied within the CS domain. In this
direction, the application of Machine Learning (ML), and especially
Deep Learning (DL), approaches has increased across
many case studies, with Reinforcement Learning (RL)
playing a leading role in these studies and being closely related to
CS [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>
        The domain of crowd management and analysis has attracted
interest since as early as 1958 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], with a growing positive social
and scientific impact, and it continues to be studied today [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. Most of these studies focus on developing
level-of-service concepts, designing elements of pedestrian facilities or
planning guidelines [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Although the goals have remained the
same, the demand and the simulation scale have increased drastically.
Nowadays, the complexity of planning correct emergency
evacuations of large and small-scale buildings or building blocks has
increased, requiring extensive and accurate planning that accounts for
different architectural styles, appearances, functionalities and visitor
behaviors [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. All those features (and many others) have become
an important aspect of designing a building for efficient evacuation,
and they are important factors in simulation systems for
crisis scenarios, for example the evacuation of a building due to fire
or earthquake. Scenarios of this type aim to improve
risk assessment procedures, emergency plans and the evacuation itself.
They are usually tackled by crisis management preparation
procedures, which include mock crisis scenarios (e.g. fire drills or
“mock evacuations”). Unfortunately, these types of procedures in
many cases fail to prepare humans and are often ignored [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Thus,
results obtained from those preparation projects cannot be used
to design accurate policies. For this reason, simulation systems
can be used as an additional method of evaluating the security policy
of indoor or outdoor facilities. Simulations can take into account
the impact of different environmental, emotional and informational
conditions [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], but in most cases the simulation tools have been
designed with specific facilities for specific cases.
      </p>
      <p>
        The research domain of Crowd Simulation for Crisis
Management (CSCM) has experienced increasing interest in the past years.
Crowd simulation is the process of simulating how a number of
entities (commonly large) move inside a virtual scene with a specific
setting [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Crisis simulations are systems that include entities with
more roles and responsibilities, on top of the existing techniques and
algorithms required for the physical and even psychological
simulation of those same entities. Moreover, the setting of the simulated
scenario varies a lot, from film production and military simulation
to urban planning, which all require high realism concerning the
movements of those entities, their grouping and their behaviors in
general.
      </p>
      <p>
        The most suitable approach to crowd and crisis simulation
systems is the simulation of multiple individual entities [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Systems
that follow this approach are called Multi-Agent Systems (MAS)
and consist of multiple agents (the entities to be simulated) and
their environment (the setting in which they exist and can interact
with each other), in which they may cooperate or compete towards
specific tasks/goals [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. Based on their interactions and their
perception, the agents perform actions to achieve their goals. This
structure makes MAS well suited to crowd and crisis simulation research.
      </p>
      <p>In this paper, a novel crisis simulation system is introduced. The
work focuses on the creation of a prototype system that takes
advantage of the plethora of simulation and performance-enhancing
capabilities of a well-known game engine. The system’s key features
are:</p>
      <list list-type="order">
        <list-item><p>Ease of use: users with little to no programming skills or
experience can set up a crisis scenario and simulate it through
a user-friendly Graphical User Interface (GUI).</p></list-item>
        <list-item><p>Dynamic environment design: users can
create their own building and environment based on their
needs and case studies.</p></list-item>
        <list-item><p>Interconnection with popular Reinforcement Learning
libraries: researchers can exploit popular RL
algorithms for the training of the agents or try their own
algorithms.</p></list-item>
        <list-item><p>Dynamic crisis management: the user can model a
specific structured pipeline of a crisis, for example two fires
starting from different places.</p></list-item>
        <list-item><p>GPU-accelerated agent training and simulation, for the
support of large multi-agent systems.</p></list-item>
        <list-item><p>Simulation end options, which allow the user to specify
when a simulation will automatically end.</p></list-item>
      </list>
      <p>Additionally, we tested the newly introduced system with a
state-of-the-art Deep Reinforcement Learning (DRL) algorithm, which achieved
high accuracy in training and effective evacuation of agents in an
indoor environment. It should be highlighted that GPU-accelerated
training of the DRL algorithm was much faster than CPU-based training,
which sped up the time-consuming hyperparameter tuning process
of the implemented case study.</p>
      <p>Crisis simulation systems have several social and scientific
impacts that contribute to the development of society, mainly by
helping humans design safe buildings that receive many visitors
throughout the day. Additionally, such systems help humans prepare
for various crisis situations (e.g. evacuation planning in indoor fire
cases) by exploiting Artificial Intelligence technologies that
simulate human behaviors. Furthermore, systems of this kind
give people the ability to design their own environments based
on their needs and to monitor how such scenarios
unfold.</p>
      <p>The rest of the paper is organized as follows: Section 2 briefly
presents some crisis simulation platforms and the research that
has been done on CSCM; Section 3 introduces the prototype of
the newly introduced crisis simulation system; and Section 4
presents a case study. Lastly, Section 5 concludes the paper by
highlighting some key features of the system and presenting some
future work towards the enhancement of the prototype system.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORKS</title>
      <p>
        As mentioned before, a plethora of
systems has been developed for the simulation of different crisis scenarios. For
example, Becker-Asano et al. presented a multi-agent system
focused on first-person perception and signs, taking dynamically
changing occlusions into account [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The implementation was
done using the Unity game engine<sup>1</sup>, which also made it possible for
participants to be tested in the same virtual airport terminal
using an “Oculus Rift” head-mounted display. Simonov
et al. proposed a system for building composite behavior
structures for a large number of agents [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Their system was based on a
decision-making algorithm implemented in Unreal Engine 4<sup>2</sup>. The
path finding system exploited the Menge simulation with plugins
and the system also included animation support, dynamic models,
a visualization module and utility-based strategic level algorithms.
      </p>
      <p>
        ESCAPES is a multi-agent evacuation simulation system, presented
in 2011, that incorporates different agent types with emotional,
informational and behavioral interaction [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. The agent types
include individual travelers, families, authority and security agents.
Additionally, the system incorporated information spreading to
agents, emotional interaction and contagion and the Social
Comparison Theory [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Evakuierungsassistent (translated as Evacuation
Assistant) is another simulation system focused on the simulation
of evacuation of mass events (e.g. football stadiums),
incorporating realistic methods for real-time simulation [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]. The system
is agent-based and exploits Cellular Automata (CA) methods and
Generalized Centrifugal Force Models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In 2013, De Oliveira Carneiro et al. presented a simulation system
to study the crowd’s behavior while evacuating a soccer stadium [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
The system exploits the use of 2D CA defined over multiple grids
that represent different levels (state spaces) of the simulated
environment. The system has the ability to simulate environments with
complex structures composed of multiple floors. Sharma et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
proposed the first fire evacuation environment based on the OpenAI
Gym<sup>3</sup> [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Moreover, they proposed a new approach that entails
pretraining an agent based on a Deep Q-Network (DQN) algorithm
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] focusing on the discovery of the shortest path to the exit. A very
popular platform that adapts to large-scale and complex models is
the GAMA platform [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. It has its own agent-oriented modeling
language, called Gama Modeling Language (GAML), which follows
the object-oriented paradigm. Additionally, the models include
spatial components used for their 3D representation in the
environment. Another key feature of the platform
is that the agents’ architecture is based on the Belief Desire Intention
(BDI) method [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], that proposes a straightforward formalization
of the human reasoning through intuitive concepts. It also
supports multi-threaded simulations and running multiple simulations
at the same time. Lastly, iCrowd [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is an agent-based behavior
modeling and crowd simulation system that has many different
applications, from crowd simulation in crisis evacuations to social
behavior and urban/maritime traffic simulation. It makes use of
modern, multithreaded and data-oriented approaches that provide
architectural extensibility. The system supports studies based on
human movements (collision avoidance and path planning) and
agent-based behavior modelling.
        <fn><label>1</label><p>A game engine developed by Unity Technologies (https://unity.com).</p></fn>
        <fn><label>2</label><p>A game engine developed by Epic Games (https://www.unrealengine.com).</p></fn>
        <fn><label>3</label><p>A toolkit for developing and comparing reinforcement learning algorithms (https://gym.openai.com).</p></fn>
      </p>
      <p>
        Several literature reviews and surveys have been published in
recent years, presenting advancements and important
observations regarding the directions and areas that require focus in the
domain of CSCM. Some of the most important stages in
developing a simulation tool can be derived from those
reviews. For example, S. Abar et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] reviewed the literature for
quickly assessing the ease of use of a simulation tool. Some of the
comparison criteria were the tool’s coding language/Application
Programming Interface (API), model development effort, modelling
strength and scalability level. N. Pelechano and A. Malkawi [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]
stated in their review that physical interactions, psychological
elements, improved human movement, agent-based approaches
and communication between agents are important features. Lastly,
J. Xiao et al. [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] focused on the use of hardware accelerators
(especially GPUs) for agent-based simulations.
      </p>
      <p>The presented system was developed taking into account
observations/highlights from the aforementioned related works. It has
numerous advantages compared to the aforementioned systems,
the most important being its ease of use and its usefulness
as a research tool for future studies. The development and design of
the system focus on its usability and on how easily a user without
any programming skills can set up an environment and test it based
on their needs. Moreover, the system can be used as a research tool
by scientists to test different RL algorithms/models and apply them
to a plethora of different crisis scenarios. This is achieved via the
interconnection of Metis with popular RL libraries through the
game engine. Additionally, developing such a system on a game engine
provides scalability, physical interactions (physics) and
GPU exploitation, as games take advantage of these capabilities
with very realistic results. The combination of these advantages
can make it a unique tool for scientific communities and the
general public.</p>
    </sec>
    <sec id="sec-4">
      <title>THE "METIS" SYSTEM</title>
      <p>In this section we present a prototype version<sup>4</sup> of a novel
multi-agent crisis simulation system called Metis<sup>5,6</sup>, developed on top of the Unity game
engine. The main structure of Metis is
shown in Figure 1 and consists of three major layers: Dynamic
Environment Development (DED), Scenario Design (SD) and
Evacuation Simulation (ES). The first layer (DED) allows the user to
dynamically design and set up the entire environment and the building to be
evacuated. The second layer (SD) follows the concept of
dynamic evacuation scenario design, giving the user the ability
to place pedestrians (any number of agents) in different parts
of the building, designate which doors are exits, mark areas of the
environment in which the pedestrians will be safe and place fires
in different places. The last layer (ES) handles the evacuation
process and the modules responsible for the simulation. The ES layer
exploits machine learning models (through interconnection with popular
RL libraries), manages the spreading of the fires,
handles the ending of the simulation and gives the user the ability to
manage a dynamically changing crisis. Dynamic crisis management
is the ability to model the pipeline of a crisis, in our case two fires
starting from different places. In the future, it could be a tsunami followed
by an earthquake etc.
        <fn><label>4</label><p>The assets currently used are from the “Standard Assets (for Unity 2017.3)” (https://assetstore.unity.com/packages/essentials/asset-packs/standard-assets-for-unity-2017-3-32351) and the “Snaps Prototype | Office” (https://assetstore.unity.com/packages/3d/environments/snaps-prototype-office-137490) Unity packages.</p></fn>
        <fn><label>5</label><p>The name comes from Metis, one of the elder Okeanides and the Titan-goddess of good counsel, planning, cunning and wisdom. Counsel, planning and wisdom are also required when a building is designed. https://www.theoi.com/Titan/TitanisMetis.html</p></fn>
        <fn><label>6</label><p>https://sites.google.com/view/metissimulationsystem</p></fn>
      </p>
      <p>A first-view screenshot of the main
component of the system is depicted in Figure 2. The interface
consists of two User Interface Layers (UILs),
one of which is split into two sub-UILs (numbers 1 and 2 in
Figure 2). These UILs focus on the design of the environment and the
experiment, and on conducting the final evacuation experiments,
without the need for expert skills. The main structure of the
UILs is based on the framework presented in Figure 1. Each
UIL consists of multiple interactive buttons with the following
functionalities (left to right, top to bottom):</p>
      <list list-type="order">
        <list-item><p>UIL1 has four main buttons and a scroll view
that includes additional interactive buttons with labels.
The three buttons on the top-left change the category of
objects that can be placed in the environment; the objects appear
in the scroll view after a category is selected. The first button
shows all the static objects that can be placed in the
environment, the second all available types of pedestrians and
the last one all the sample simulation buildings (areas) that
can be placed. Sample buildings are buildings created
beforehand and provided to the user, each including
placed objects and having a different layout and number of
rooms. Clicking on an object in the scroll view allows
the user to place that object into the environment.
The last button, with the magnifier icon in the top-right, allows
the user to filter the list of objects through a text field.</p></list-item>
        <list-item><p>UIL2 includes functionalities that change the mode
of the mouse. In the left column, the button allows the user
to assign a safe area in the environment. In the middle
column, the buttons allow the user to place fires, walls, floors
and doors. In the last column, one button resets the mouse
to its default mode, in which it does nothing, and the other allows the user to grab and
move already placed objects.</p></list-item>
        <list-item><p>UIL3 includes buttons regarding the simulation
process. The play button starts the simulation process and the
gear button shows all the available options regarding the
simulation end conditions.</p></list-item>
      </list>
    </sec>
    <sec id="sec-5">
      <title>Dynamic Environment Development</title>
      <p>
        As mentioned above, the DED layer of the system allows the user
to design the layout of the building to be evacuated, by placing
the building’s walls. This layer can be characterized as the layer
responsible for the content generation of the environment,
commonly used in games [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], and allows for the creation of
dynamic environments [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] during the environment design process.
The walls are placed using the wall placement tool of UIL2,
which places a wall segment that can be extended in any direction by
dragging the mouse. As a second step, doors can be placed on the
walls, allowing pedestrians to move between the rooms. Lastly, a
plethora of objects can be placed anywhere inside the building to
decorate it and to act as obstacles during the evacuation,
mimicking real-world indoor objects. During any placement
procedure, the walls, doors and objects snap to each other so that
placement is easier. The DED is a powerful
tool that lets users create their own realistic
indoor environments based on their needs and cases,
giving them the opportunity to test several different environments
without the need for programming skills.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Scenario Design</title>
      <p>
        The SD layer is responsible for designing the scenarios, that is,
where the fire will start, how many pedestrians will have to
evacuate the building (multi-agent approach), where their starting
positions will be, which exit they will try to reach and where they
will be safe. The fires’ positions can be chosen with the fire
placement tool of UIL2, which, during the design process, allows the
user to specify the position from which the fire will start to spread. The
fires start spreading when the simulation starts. Different “types”
of pedestrians can be placed and each type has different attributes,
such as speed, size, color and health points, making it possible to
simulate different human behaviors. A door is marked as an exit door
by right-clicking on it. At least one door has to be marked as an
exit due to the way the pedestrians were trained. Lastly, safe areas
are the areas the pedestrians are trying to escape to; a pedestrian that
reaches one is marked as safe from the crisis during the simulation.
      </p>
    </sec>
    <sec>
      <title>Evacuation Simulation</title>
      <p>
The ES layer is responsible for all the functions running during
the simulation of an evacuation procedure. Starting with the fire
propagation, a very simple algorithm is employed. The fire is first
placed at a point in the building, with a specific maximum area, and
is represented by a particle-emitting object that damages any
object that touches it. Then, when the simulation starts, the area
grows periodically and multiple fire objects are created at random
places inside the area. Simply put, the fire
currently propagates with a random speed and direction, while contact
between the fire and a pedestrian is detected through collision interfaces.
Having control over when the simulation automatically ends is
an important feature. The prototype version currently supports
end conditions such as when all, or a specific number of, pedestrians
are safe or dead. The pedestrians’ evacuation is achieved by training
the agents with RL algorithms. By exploiting the capabilities of
the ML-Agents toolkit [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] (Section 4), the agents of the Metis system
can be trained with popular RL algorithms such as Proximal Policy
Optimization (PPO) [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and Soft Actor-Critic (SAC) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In
addition, the Metis system can be easily interconnected with
popular RL libraries such as RLlib<sup>7</sup> and Baselines<sup>8</sup>, and custom
Python RL algorithms can also be developed. A typical RL training is
done by creating learning environments in which the agent collects
observations and acts based on them. For the training of a general
model that can evacuate buildings during a crisis situation, the
typical procedure of creating a building environment was followed:
setting up doors, designating the exits and placing objects that
also act as obstacles inside the different rooms. Dynamic crisis
management is the part of the simulation that allows the user to
manage the crisis currently unfolding. For now, it allows the user
to start a fire in a different part of the environment than the initial one.
This makes the system more effective and allows the user to observe
the pedestrians’ behaviors while the crisis changes dynamically.
      </p>
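      <p>The simple fire propagation described above (a capped area that grows periodically, with new fire objects created at random places inside it) can be illustrated with a minimal Python sketch. It is not the Metis implementation, which runs inside Unity: the circular area, the growth step, the number of fires per tick and all names are assumptions made only for this example.</p>
      <preformat>
```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class FireSource:
    """One fire placed by the user, with a capped burning area."""
    center: Point
    max_radius: float              # cap on the area (assumed to be circular)
    growth_per_tick: float = 0.5   # hypothetical growth step
    fires_per_tick: int = 3        # hypothetical number of new fire objects
    radius: float = 0.0
    fires: List[Point] = field(default_factory=list)

    def tick(self, rng: random.Random) -> None:
        """Grow the area once, then create fire objects at random places in it."""
        self.radius = min(self.radius + self.growth_per_tick, self.max_radius)
        for _ in range(self.fires_per_tick):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            # The square root keeps the points uniformly spread over the disc.
            dist = self.radius * math.sqrt(rng.random())
            self.fires.append((self.center[0] + dist * math.cos(angle),
                               self.center[1] + dist * math.sin(angle)))


source = FireSource(center=(0.0, 0.0), max_radius=2.0)
rng = random.Random(0)
for _ in range(10):
    source.tick(rng)
```
      </preformat>
      <p>In this sketch every new fire object stays within the maximum area, which mirrors the role of the maximum area described in the text; contact damage to pedestrians would be handled separately by the physics engine.</p>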
    </sec>
    <sec id="sec-7">
      <title>Pedestrian agents training approach</title>
      <p>In this section we present and analyze how the pedestrian agents
were trained: the features that the agents gather at each step, the
actions they take to evacuate a building and the environment
in which they were trained. Figure 3 depicts an indoor environment
created with the Metis system and then saved as a single area
object, both for ease of use during the training environment setup and to
keep the environment for further analysis. The areas highlighted
in light green inside the building are the possible areas in which
an agent could spawn when an episode began. Initially, all agents
spawned inside the room marked as “1” and each one unlocked the
next area (light green) one by one. An agent unlocked an area
once its mean reward over the last 20 episodes was equal to or
higher than 0.925, as a consequence of good training for a specific area.
Every time an episode begins, the agent chooses randomly between
five possible spawn areas, namely the five most recently
unlocked areas. Eventually, when the agent has a mean reward
of over 0.925 on average across all areas, it can spawn in any area. This
was done so that the agent learned gradually and all the areas were
unlocked, so that it can generalize correctly (without learning only
to escape from a specific point of the building). The red cubes
inside the building are dummy fire objects which, when touched,
reset the agent’s episode and set its reward to -1. The reasoning
behind the use of dummy objects instead of the actual fire was to
check whether the agent, using the raycast components described below,
would eventually learn to avoid those fires. The intense green area
is considered to be a safe exit for the pedestrians.</p>
      <p>The features
gathered by each agent during training were 70 in
total, of which 64 were gathered using three “Ray Perception
Sensor 3D”<sup>9</sup> components and 6 were calculated manually:</p>
      <list list-type="order">
        <list-item><p>The first raycast component detects objects (static objects
and fires) and is blocked by walls and doors. It casts 20 rays
of length 15 (by default 1:1 meters in Unity) in a 140-degree
arc in front of the agent and is responsible for detecting objects
that have to be avoided (inside the room in which the agent currently
is).</p></list-item>
        <list-item><p>The second raycast component detects doors, safe exit doors
and walls, with 20 rays of length 25 in an 80-degree arc in
front of the agent, and is responsible for detecting doors and safe
exit doors that are close.</p></list-item>
        <list-item><p>The third raycast component also detects doors, safe exit
doors and walls, with 24 rays of length 50 in a 140-degree
arc in front of the agent, and is responsible for detecting doors and
safe exit doors that are far away.</p></list-item>
        <list-item><p>The manually calculated features were:</p>
          <list list-type="alpha-lower">
            <list-item><p>the normalized x and z values of the safe exit door,</p></list-item>
            <list-item><p>the agent’s position and</p></list-item>
            <list-item><p>the normalized direction from the agent to the exit door.</p></list-item>
          </list>
        </list-item>
      </list>
      <p>During the training, the agent gets a reward of −0.4/maxStep for each
step (action) taken, −0.3/maxStep if it collides with something (static
objects, walls and closed doors) and small positive rewards
depending on its distance from the exit door (Eq. 1). Additionally, when
the agent reached the safe area the reward was set to +1.</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math>\mathit{reward} = \frac{\mathit{Distance}\big(\mathit{norm}(\mathit{exit}),\ \mathit{norm}(\mathit{agent})\big)}{\mathit{maxStep}} \times 0.3</tex-math>
      </disp-formula>
      <p>where norm(exit) is the normalized exit position, norm(agent) is
the normalized agent position, maxStep is the maximum number of
actions per episode (equal to 10,000 during the training) and Distance()
calculates the distance between the exit and the agent.
        <fn><label>7</label><p>https://docs.ray.io/en/master/rllib.html</p></fn>
        <fn><label>8</label><p>https://github.com/openai/baselines</p></fn>
        <fn><label>9</label><p>Rays that are cast into the physics world, and the objects that are hit determine the observation vector that is produced.</p></fn>
      </p>
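      <p>The per-step reward terms described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it follows one reading of Eq. 1, in which the distance between the normalized exit and agent positions is divided by maxStep and multiplied by 0.3, it uses the Euclidean distance on 2D tuples, and it simply sums the terms. All function and constant names are hypothetical and are not taken from the Metis source code.</p>
      <preformat>
```python
import math
from typing import Tuple

Point = Tuple[float, float]

MAX_STEP = 10_000          # maximum number of actions per episode (from the text)
SAFE_AREA_REWARD = 1.0     # episode reward when the safe area is reached
FIRE_REWARD = -1.0         # episode reward when a fire is touched


def step_penalty(max_step: int = MAX_STEP) -> float:
    """Small negative reward applied at every step (action)."""
    return -0.4 / max_step


def collision_penalty(max_step: int = MAX_STEP) -> float:
    """Extra negative reward when the agent collides with something."""
    return -0.3 / max_step


def distance_reward(exit_pos: Point, agent_pos: Point,
                    max_step: int = MAX_STEP) -> float:
    """Distance-based term, following the assumed reading of Eq. 1."""
    return math.dist(exit_pos, agent_pos) / max_step * 0.3


def step_reward(exit_pos: Point, agent_pos: Point, collided: bool,
                max_step: int = MAX_STEP) -> float:
    """Sum of the per-step terms; adding them together is an assumption."""
    reward = step_penalty(max_step) + distance_reward(exit_pos, agent_pos, max_step)
    if collided:
        reward += collision_penalty(max_step)
    return reward
```
      </preformat>
      <p>Dividing by maxStep keeps the accumulated per-step terms small compared with the terminal rewards of +1 and -1, so reaching the safe area or touching a fire dominates the episode outcome.</p>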
      <p>It should be noted that the positions of the objects are normalized
according to their relative position inside the building. Additionally,
the reasoning behind the choice of the negative rewards is to make
the agent reach the exit as soon as possible and collide with as
few objects as possible. Lastly, the reward and episode reset triggered by
touching a fire teach the agent to avoid the fires,
since the reward of the episode is set to -1.</p>
      <p>The available actions for each agent are provided through two
output branches, each with two possible actions. The first branch
is responsible for the agent’s horizontal movement (left or right)
and the second branch for the vertical movement (backward or
forward). With this setup the agent is able to navigate to a safe exit.</p>
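      <p>As an illustration of the two output branches described above, the following Python sketch decodes one discrete choice per branch into a movement vector. The index-to-direction mapping, the speed parameter and the function name are assumptions; the text only states that each branch offers two possible actions.</p>
      <preformat>
```python
from typing import Tuple

# Hypothetical mapping from action index to direction for each branch.
HORIZONTAL = {0: -1.0, 1: 1.0}   # 0 = left, 1 = right (assumed)
VERTICAL = {0: -1.0, 1: 1.0}     # 0 = backward, 1 = forward (assumed)


def decode_action(branches: Tuple[int, int],
                  speed: float = 1.0) -> Tuple[float, float]:
    """Turn one action index per branch into a (horizontal, vertical) move."""
    horizontal_index, vertical_index = branches
    return (HORIZONTAL[horizontal_index] * speed,
            VERTICAL[vertical_index] * speed)


move = decode_action((0, 1))   # left and forward under the assumed mapping
```
      </preformat>
      <p>With two actions per branch the agent always moves along both axes at every step, so there are four possible combined moves.</p>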
      <p>Based on these parameters the agents can be trained with many
different RL algorithms. With access to the source code of the
Metis system, all these parameters and many others can be adjusted
based on the needs of the experiment. It should be
highlighted that the Metis system will be provided under open
source licensing.</p>
    </sec>
    <sec id="sec-8">
      <title>CASE STUDY</title>
      <p>For the sake of clarity, a case scenario was set up to present the
procedure of preparing a building for crisis evacuation planning
and to evaluate how well the RL-based model that was
trained in the previous section escapes a different building in
a fire crisis scenario. The building was created with
the UILs and its architecture was quite simple, consisting of three
main rooms and one hall. The hall was empty and was connected
to the other three rooms (east, north and west rooms). Each room
was furnished with many static objects, such as desks, small and
large cabinets, small and large shelves and plants. The objects were
placed in such a way as to make the evacuation of the building harder, as
can be seen in the north room in Figure 5. Four doors were placed,
three connecting the rooms with the hall and one being the safe
exit on the south. The safe exit door was designated as an exit by
right-clicking on the door. The next step, when setting up a building
in the Metis system, is to designate a safe area; pedestrians that
touch it are considered safe. In our case study,
we designated a safe area just outside the exit. The following step
is to place the pedestrians into the building. We placed a total of
25 pedestrians, scattering them around all the rooms and placing some
in difficult areas. Lastly, fires were placed in all rooms, in such a
way that some pedestrians’ paths are partly blocked during the
evacuation. Figure 5 shows the layout of our designed building for
the case study, along with all the objects, agents and fires placed.</p>
      <p>During the training procedure, the environment spawned 60 agents
and each one started training individually. This is a common
methodology to speed up the training process. It should be noted
that, since multiple agents shared the same
environment, they were set to ignore each other, both physically and
feature-wise. For the training procedure the PPO algorithm was used,
which is considered to be one of the most effective RL algorithms
for agents that rely on raycast observations.</p>
      <p>
        In our experiment the setup of the chosen PPO algorithm was
based on a neural network which approximated the ideal function
that maps the agent’s observations to the best action an agent
could take in a given state. The neural network setup was input:
70, hidden layers: 512 and output: 4, with the discount factor for future
rewards set to 0.995 and the learning rate set to 0.0003.
Figure 4 depicts the agent’s training results. The agent was trained
for 14.55 million steps (actions). The reward was initially set to the
range [-1, 1]. The multiple drops in the cumulative reward that can be
seen are due to the different “difficulty” areas that were unlocked
(new room starting points). Naturally, each unlock caused
the total reward to drop, as the environment was different from the
previous one. Eventually, at the end (where all the areas are unlocked)
the cumulative reward reached 0.96. After the training of the agent
was completed, the play button near the top right corner was pressed
to start the simulation and therefore the building evacuation.
When this button is pressed, if there is any ongoing crisis (that
is, at least one fire has been placed), all the pedestrians
start evacuating the building individually.
      </p>
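      <p>To give a feel for what a discount factor of 0.995 implies, the short Python sketch below computes the commonly used effective horizon 1 / (1 − discount) and the weight given to a reward received a number of actions in the future. The helper names are hypothetical and the calculation is a general property of discounted returns, not something specific to Metis.</p>
      <preformat>
```python
GAMMA = 0.995  # discount factor reported in the text


def effective_horizon(gamma):
    """Approximate number of future steps that matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def discount_weight(gamma, steps):
    """Weight given to a reward received `steps` actions in the future."""
    return gamma ** steps


if __name__ == "__main__":
    print(round(effective_horizon(GAMMA)))
    print(round(discount_weight(GAMMA, 200), 2))
```
      </preformat>
      <p>With a discount factor of 0.995 the effective horizon is about 200 actions, and a reward 200 actions ahead is still weighted by roughly 0.37, so the agent can value reaching an exit that is many steps away.</p>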
      <p>Figure 6 shows a snapshot at the beginning of the evacuation
after the training, when the pedestrians started running towards
the exit. Most of the pedestrians find their way
towards it immediately. Some of them, however, can be seen
struggling, with a few stuck running into a corner of a room.
To demonstrate how easy it is to set up a scenario and to
clarify the whole process of creating a building and
setting up the evacuation and simulation procedure, we have created
a demo video (https://tinyurl.com/MetisMABCSSDemo). When a simulation procedure ends, for any reason,
a results window pops up to inform the user about the evacuation
and the statistics. Figure 7 shows a snapshot of the info-window,
highlighting the results of our case study. The info-window reports
the total number of pedestrians that survived and died by the
end of the simulation. In this case study, of the 25
pedestrians, 17 survived (agents which reached the green safe area) and 8
died (died or did not evacuate successfully in case of a manual end).</p>
    </sec>
    <sec id="sec-9">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>Game engines have become more and more popular and have been
exploited for many different applications besides their main target,
the development of games. In this paper we present a prototype of
a novel crisis simulation system called Metis. Metis is developed
using a very popular game engine, Unity, and exploits many of
its optimizations such as physics, particle effects, cross-platform
development etc. In addition, the Metis system can make
use of trending Reinforcement Learning algorithms to improve the
simulation realism and the evacuation planning. Its interconnection
with popular RL libraries and its dynamic content (environment)
development can establish it as a powerful research tool for basic and
applied high-level research. Because it is developed on
a game engine that supports cross-platform development, it can be
considered a system that runs on multiple operating systems.
As mentioned above, the most important key features of the system
are its ease of use for scenario design and simulation, the ability to
build case study environments dynamically, dynamic management
of the crisis situation (multiple crisis situations), exploitation of
various RL algorithms and well-known libraries, inherent
GPU-accelerated agent simulations, agents with various characteristics
and behaviors and, lastly, the ability to specify simulation end
conditions.</p>
      <p>Furthermore, although the system is in Alpha version, the
presented experimental results are encouraging and promising. To
sum up, by using the Metis system one can design their own
building layout and place a variety of objects, agents and fires, towards the
development of a personal evacuation plan. Due to its simplicity the
Metis system can be used by everyone, even by users without
special programming skills. The aforementioned features of Metis
focus on the key concept of a dynamic and general system, especially
due to the dynamic crisis management. The user can start a crisis at
different moments during the simulation, creating unique scenarios
and allowing them to observe the pedestrians’ reactions.</p>
      <p>From the results it is obvious that there is room for improvement.
First and foremost, not all pedestrians found their way towards
the exit, which means that a different training approach with different
RL algorithms has to be tested. The most promising option would be to train the
agent for much longer, with a dynamic change of the environment
(random or via curriculum learning). This means that the placed
fires’ positions have to be changed, along with the exit door, the
building’s layout and the agent’s attributes (speed and size), every
time the agent finishes an episode. In general, better fine-tuning
of the algorithms could provide more accurate evacuations with
fewer losses. Moreover, allowing pedestrians to interact with each
other (cooperative learning) will require the agents to be trained in
such a way that they take into account the number of agents near
them or near an exit. In addition, future work on the system
includes the introduction of other features and functionalities, such
as: real-time simulation statistics, more explanatory and graphical
statistics at the end of a simulation, the ability to build multi-level and
multiple buildings, more realistic fire propagation, more types of
crisis (in addition to fire, such as earthquake, flooding etc.) and interaction
between pedestrians. Moreover, an important
consideration for the future improvement of the system is
the incorporation of emotional and psychological features into the
agents. This aspect is an important one and has been extensively
studied in the literature of CSCM. Lastly, an important feature of a
system that aims for longevity and extensibility is support
for the user (auto guide) to extend the system’s functionalities.</p>
    </sec>
    <sec id="sec-10">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work is supported by the MPhil program "Advanced
Technologies in Informatics and Computers", hosted by the Department of
Computer Science, International Hellenic University.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Sameera</given-names>
            <surname>Abar</surname>
          </string-name>
          ,
          <string-name>
            <surname>Georgios K. Theodoropoulos</surname>
          </string-name>
          , Pierre Lemarinier, and
          <string-name>
            <surname>Gregory M.P. O'Hare</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Agent Based Modelling and Simulation tools: A review of the state-of-art software</article-title>
          .
          <source>Computer Science Review</source>
          <volume>24</volume>
          (
          <year>2017</year>
          ),
          <fpage>13</fpage>
          -
          <lpage>33</lpage>
          . https://doi.org/10.1016/j.cosrev.
          <year>2017</year>
          .
          <volume>03</volume>
          .001
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Becker-Asano</surname>
          </string-name>
          , Felix Ruzzoli, Christoph Hölscher, and Bernhard Nebel.
          <year>2014</year>
          .
          <article-title>A multi-agent system based on unity 4 for virtual perception and wayfinding</article-title>
          .
          <source>Transportation Research Procedia</source>
          <volume>2</volume>
          (
          <year>2014</year>
          ),
          <fpage>452</fpage>
          -
          <lpage>455</lpage>
          . https://doi.org/10.1016/j.trpro.
          <year>2014</year>
          .
          <volume>09</volume>
          .059
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Lars</given-names>
            <surname>Braubach</surname>
          </string-name>
          , Alexander Pokahr, and
          <string-name>
            <given-names>Winfried</given-names>
            <surname>Lamersdorf</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Jadex: A BDI-Agent System Combining Middleware and Reasoning</article-title>
          .
          <source>Software Agent-Based Applications</source>
          , Platforms and
          <string-name>
            <given-names>Development</given-names>
            <surname>Kits</surname>
          </string-name>
          (
          <year>2005</year>
          ),
          <fpage>143</fpage>
          -
          <lpage>168</lpage>
          . https://doi.org/10.1007/3-7643-7348-
          <issue>2</issue>
          _
          <fpage>7</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Greg</given-names>
            <surname>Brockman</surname>
          </string-name>
          , Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and
          <string-name>
            <given-names>Wojciech</given-names>
            <surname>Zaremba</surname>
          </string-name>
          .
          <year>2016</year>
          . OpenAI Gym. (
          <year>2016</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . arXiv:
          <volume>1606</volume>
          .01540 http://arxiv.org/abs/1606.01540
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Carstens</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Ring</surname>
          </string-name>
          .
          <year>1970</year>
          .
          <article-title>Pedestrian capacities of shelter entrances</article-title>
          .
          <source>Traffic Engineering</source>
          <volume>41</volume>
          ,
          <issue>3</issue>
          (
          <year>1970</year>
          ),
          <fpage>38</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Mohcine</given-names>
            <surname>Chraibi</surname>
          </string-name>
          , Armin Seyfried, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Schadschneider</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Generalized centrifugal-force model for pedestrian dynamics</article-title>
          . Physical Review E - Statistical, Nonlinear, and
          <source>Soft Matter Physics</source>
          <volume>82</volume>
          ,
          <issue>4</issue>
          (
          <year>2010</year>
          ). https://doi.org/10.1103/PhysRevE.82.046111 arXiv:
          <fpage>1008</fpage>
          .
          <fpage>4297</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Lílian</given-names>
            <surname>De Oliveira Carneiro</surname>
          </string-name>
          ,
          <string-name>
            <surname>Joaquim Bento</surname>
          </string-name>
          Cavalcante-Neto,
          <source>Creto Augusto Vidal, and Teófilo Bezerra Dutra</source>
          .
          <year>2013</year>
          .
          <article-title>Crowd evacuation using cellular automata: Simulation in a soccer stadium</article-title>
          .
          <source>Proceedings - 2013 15th Symposium on Virtual and Augmented Reality</source>
          ,
          <string-name>
            <surname>SVR</surname>
          </string-name>
          <year>2013</year>
          (
          <year>2013</year>
          ),
          <fpage>240</fpage>
          -
          <lpage>243</lpage>
          . https://doi.org/10.1109/SVR.
          <year>2013</year>
          .29
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fahy</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Proulx</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Human Behavior In The World Trade Center Evacuation</article-title>
          .
          <source>Fire Safety Science</source>
          <volume>5</volume>
          (
          <year>1997</year>
          ),
          <fpage>713</fpage>
          -
          <lpage>724</lpage>
          . https://doi.org/10.3801/IAFSS.FSS.5-
          <fpage>713</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Leon</given-names>
            <surname>Festinger</surname>
          </string-name>
          .
          <year>1954</year>
          .
          <article-title>A Theory of Social Comparison Processes</article-title>
          .
          <source>Human Relations</source>
          <volume>7</volume>
          ,
          <issue>2</issue>
          (may
          <year>1954</year>
          ),
          <fpage>117</fpage>
          -
          <lpage>140</lpage>
          . https://doi.org/10.1177/001872675400700202
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Nigel. Gilbert</surname>
          </string-name>
          and
          <string-name>
            <given-names>Klaus G.</given-names>
            <surname>Troitzsch</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Simulation for the social scientist</article-title>
          .
          <volume>295</volume>
          pages.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Tuomas</surname>
            <given-names>Haarnoja</given-names>
          </string-name>
          , Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Levine</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Soft Actor-Critic Algorithms</article-title>
          and Applications. (dec
          <year>2018</year>
          ). arXiv:
          <year>1812</year>
          .05905 http://arxiv.org/abs/
          <year>1812</year>
          .05905
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Hankin</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Wright</surname>
          </string-name>
          .
          <year>1958</year>
          .
          <article-title>Passenger Flow in Subways</article-title>
          .
          <source>OR 9</source>
          ,
          <issue>2</issue>
          (jun
          <year>1958</year>
          ),
          <volume>81</volume>
          . https://doi.org/10.2307/3006732
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Dirk</surname>
            <given-names>Helbing</given-names>
          </string-name>
          , Péter Molnár,
          <string-name>
            <given-names>Illés J.</given-names>
            <surname>Farkas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kai</given-names>
            <surname>Bolay</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Self-organizing pedestrian movement</article-title>
          .
          <source>Environment and Planning B: Planning and Design</source>
          <volume>28</volume>
          ,
          <issue>3</issue>
          (
          <year>2001</year>
          ),
          <fpage>361</fpage>
          -
          <lpage>383</lpage>
          . https://doi.org/10.1068/b2697
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Hoel</surname>
          </string-name>
          .
          <year>1968</year>
          .
          <article-title>Pedestrian travel rates in central business districts</article-title>
          .
          <source>Traffic Engineering</source>
          (
          <year>1968</year>
          ),
          <fpage>10</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Arthur</surname>
            <given-names>Juliani</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent-Pierre</surname>
            <given-names>Berges</given-names>
          </string-name>
          , Esh Vckay,
          <string-name>
            <given-names>Yuan</given-names>
            <surname>Gao</surname>
          </string-name>
          , Hunter Henry, Marwan Mattar, and
          <string-name>
            <given-names>Danny</given-names>
            <surname>Lange</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Unity: A General Platform for Intelligent Agents</article-title>
          . (sep
          <year>2018</year>
          ). arXiv:
          <year>1809</year>
          .02627 http://arxiv.org/abs/
          <year>1809</year>
          .02627
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Chairi</surname>
            <given-names>Kiourt</given-names>
          </string-name>
          , Anestis Koutsoudis, and
          <string-name>
            <given-names>George</given-names>
            <surname>Pavlidis</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>DynaMus: A fully dynamic 3D virtual museum framework</article-title>
          .
          <source>Journal of Cultural Heritage</source>
          <volume>22</volume>
          (nov
          <year>2016</year>
          ),
          <fpage>984</fpage>
          -
          <lpage>991</lpage>
          . https://doi.org/10.1016/j.culher.
          <year>2016</year>
          .
          <volume>06</volume>
          .007
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Vassilios</surname>
            <given-names>I. Kountouriotis</given-names>
          </string-name>
          , Manolis Paterakis, and
          <string-name>
            <surname>Stelios</surname>
            <given-names>C. A.</given-names>
          </string-name>
          <string-name>
            <surname>Thomopoulos</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>iCrowd: agent-based behavior modeling and crowd simulator</article-title>
          , Ivan Kadar (Ed.).
          <year>98420Q</year>
          . https://doi.org/10.1117/12.2223109
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Sheng</surname>
            <given-names>Yan</given-names>
          </string-name>
          <string-name>
            <surname>Lim</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Crowd Behavioural Simulation via Multi-Agent Reinforcement Learning</article-title>
          .
          <source>Ph.D. Dissertation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Francisco</surname>
            Martinez-Gil,
            <given-names>Miguel</given-names>
          </string-name>
          <string-name>
            <surname>Lozano</surname>
            , and
            <given-names>Fernando</given-names>
          </string-name>
          <string-name>
            <surname>Fernández</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Strategies for simulating pedestrian navigation with multiple reinforcement learning agents</article-title>
          .
          <source>Autonomous Agents and Multi-Agent Systems 29</source>
          ,
          <issue>1</issue>
          (
          <year>2014</year>
          ),
          <fpage>98</fpage>
          -
          <lpage>130</lpage>
          . https://doi.org/10.1007/s10458-014-9252-6
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Volodymyr</surname>
            <given-names>Mnih</given-names>
          </string-name>
          , Koray Kavukcuoglu, David Silver,
          <string-name>
            <given-names>Alex</given-names>
            <surname>Graves</surname>
          </string-name>
          , Ioannis Antonoglou, Daan Wierstra, and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Riedmiller</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Playing Atari with Deep Reinforcement Learning</article-title>
          . (
          <year>2013</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . arXiv:
          <volume>1312</volume>
          .5602 http://arxiv.org/abs/1312.5602
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Rahul</surname>
            <given-names>Narain</given-names>
          </string-name>
          , Abhinav Golas, Sean Curtis, and
          <string-name>
            <surname>Ming</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Aggregate Dynamics for Dense Crowd Simulation</article-title>
          .
          <source>ACM Transactions on Graphics 28</source>
          ,
          <issue>5</issue>
          (
          <year>2009</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . https://doi.org/10.1145/1618452.1618468
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Nuria</given-names>
            <surname>Pelechano</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ali</given-names>
            <surname>Malkawi</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Evacuation simulation models: Challenges in modeling high rise building evacuation with cellular automata approaches</article-title>
          .
          <source>Automation in Construction 17</source>
          ,
          <issue>4</issue>
          (
          <year>2008</year>
          ),
          <fpage>377</fpage>
          -
          <lpage>385</lpage>
          . https://doi.org/10.1016/j.autcon.
          <year>2007</year>
          .
          <volume>06</volume>
          .005
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>John</surname>
            <given-names>Schulman</given-names>
          </string-name>
          , Filip Wolski, Prafulla Dhariwal, Alec Radford, and
          <string-name>
            <given-names>Oleg</given-names>
            <surname>Klimov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Proximal Policy Optimization Algorithms</article-title>
          . (jul
          <year>2017</year>
          ). arXiv:
          <volume>1707</volume>
          .06347 http://arxiv.org/abs/1707.06347
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Noor</surname>
            <given-names>Shaker</given-names>
          </string-name>
          , Julian Togelius, and
          <string-name>
            <given-names>Mark J.</given-names>
            <surname>Nelson</surname>
          </string-name>
          .
          <year>2016</year>
          . Procedural Content Generation in Games. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -42716-4
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Jivitesh</surname>
            <given-names>Sharma</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Per-Arne</surname>
            <given-names>Andersen</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ole-Christoffer Granmo</surname>
            , and
            <given-names>Morten</given-names>
          </string-name>
          <string-name>
            <surname>Goodwin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment</article-title>
          . (
          <year>2019</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          . arXiv:
          <year>1905</year>
          .09673 http://arxiv.org/abs/
          <year>1905</year>
          .09673
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Andrey</surname>
            <given-names>Simonov</given-names>
          </string-name>
          , Aleksandr Lebin, Bogdan Shcherbak, Aleksandr Zagarskikh, and
          <string-name>
            <given-names>Andrey</given-names>
            <surname>Karsakov</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Multi-agent crowd simulation on large areas with utility-based behavior models: Sochi Olympic Park Station use case</article-title>
          .
          <source>Procedia Computer Science</source>
          <volume>136</volume>
          (
          <year>2018</year>
          ),
          <fpage>453</fpage>
          -
          <lpage>462</lpage>
          . https://doi.org/10.1016/j.procs.
          <year>2018</year>
          .
          <volume>08</volume>
          .266
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Patrick</surname>
            <given-names>Taillandier</given-names>
          </string-name>
          , Benoit Gaudou, Arnaud Grignard,
          <string-name>
            <surname>Quang-Nghi</surname>
            <given-names>Huynh</given-names>
          </string-name>
          , Nicolas Marilleau, Philippe Caillou, Damien Philippon, and
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Drogoul</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Building, composing and experimenting complex spatial models with the GAMA platform</article-title>
          .
          <source>GeoInformatica</source>
          <volume>23</volume>
          ,
          <issue>2</issue>
          (apr
          <year>2019</year>
          ),
          <fpage>299</fpage>
          -
          <lpage>322</lpage>
          . https://doi.org/10.1007/s10707-018-00339-6
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Thalmann</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Crowd Simulation</article-title>
          .
          <source>In Encyclopedia of Computer Graphics and Games</source>
          . Springer International Publishing, Cham,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . https://doi.org/10.1007/978-3-319-08234-9_69-1
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Tsai</surname>
          </string-name>
          , Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein,
          <string-name>
            <given-names>Gal</given-names>
            <surname>Kaminka</surname>
          </string-name>
          , Stacy Marsella, Andrew Ogden, Inbal Rika, Ankur Sheel, Matthew E. Taylor, Xuezhi Wang,
          <string-name>
            <given-names>Avishay</given-names>
            <surname>Zilka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Milind</given-names>
            <surname>Tambe</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>ESCAPES - Evacuation simulation with children, authorities, parents, emotions, and social comparison</article-title>
          .
          <source>10th International Conference on Autonomous Agents and Multiagent Systems 2011, AAMAS 2011</source>
          <volume>1</volume>
          (
          <year>2011</year>
          ),
          <fpage>425</fpage>
          -
          <lpage>432</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Armel Ulrich</given-names>
            <surname>Kemloh Wagoum</surname>
          </string-name>
          , Mohcine Chraibi, Jonas Mehlich, Armin Seyfried, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Schadschneider</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Efficient and validated simulation of crowds for an evacuation assistant</article-title>
          .
          <source>Computer Animation and Virtual Worlds</source>
          <volume>23</volume>
          ,
          <issue>1</issue>
          (feb
          <year>2012</year>
          ),
          <fpage>3</fpage>
          -
          <lpage>15</lpage>
          . https://doi.org/10.1002/cav.1420
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Ulrich</given-names>
            <surname>Weidmann</surname>
          </string-name>
          .
          <year>1993</year>
          .
          <article-title>Transporttechnik der Fussgänger</article-title>
          . https://doi.org/10.3929/ethz-a-010025751
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <source>An Introduction to MultiAgent Systems</source>
          ,
          <edition>2nd Edition</edition>
          . 484 pages.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Jiajian</given-names>
            <surname>Xiao</surname>
          </string-name>
          , Philipp Andelfinger, David Eckhoff,
          <string-name>
            <given-names>Wentong</given-names>
            <surname>Cai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alois</given-names>
            <surname>Knoll</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A survey on agent-based simulation using hardware accelerators</article-title>
          .
          <source>Comput. Surveys</source>
          <volume>51</volume>
          ,
          <issue>6</issue>
          (
          <year>2019</year>
          ). https://doi.org/10.1145/3291048
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Georgios N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Julian</given-names>
            <surname>Togelius</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <source>Artificial Intelligence and Games</source>
          . Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-63519-4
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>