Metis: Multi-Agent Based Crisis Simulation System

George Sidiropoulos
georsidi@teiemt.gr
Department of Computer Science, International Hellenic University
Kavala, Greece

Chairi Kiourt
chairiq@athenarc.gr
Athena-Research & Innovation Center in Information Communication and Knowledge Technologies
Xanthi, Greece

Lefteris Moussiades
lmous@cs.ihu.gr
Department of Computer Science, International Hellenic University
Kavala, Greece

ABSTRACT
With the advent of computational technologies (Graphics Processing Units, GPUs) and Machine Learning, the research domain of crowd simulation for crisis management has flourished. Along with the new techniques and methodologies proposed over those years to increase the realism of crowd simulation, several crisis simulation systems/tools have been developed, but most of them focus on special cases without giving users the ability to adapt them to their needs. Towards these directions, in this paper we introduce a novel multi-agent-based crisis simulation system for indoor cases. The main advantage of the system is its ease of use: it targets non-expert users (users with little to no programming skills), who can exploit its capabilities, adapt the entire environment to their needs (case studies) and set up building evacuation planning experiments with some of the most popular Reinforcement Learning algorithms. Simply put, the system's features focus on dynamic environment design and crisis management, interconnection with popular Reinforcement Learning libraries, agents with different characteristics (behaviors), fire propagation parameterization, realistic physics based on a popular game engine, GPU-accelerated agent training and simulation end conditions. A case study that exploits a popular reinforcement learning algorithm for training the agents presents the dynamics and the capabilities of the proposed system, and the paper concludes with the highlights of the system and some future directions.

CCS CONCEPTS
• Computing methodologies → Multi-agent systems; Reinforcement learning.

KEYWORDS
Multi-agent systems, Modeling and simulation, Agent-based system, Crowd evacuation, Crisis simulation

GAITECUS0, September 02–04, 2020, Athens, Greece
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
With the advancements of recent years in computing capabilities, Artificial Intelligence and web technologies, the research domain of Crowd Simulation (CS) has gained more and more interest. The field has grown a lot in the last decade (and keeps growing) and, as a consequence, more and more techniques and methods are being proposed. For example, crowd behavior simulation [21], emotion contagion management, collision avoidance for pedestrians and accurate decision models are some of the most popular subjects studied as part of the CS domain. Towards these directions, the application of Machine Learning (ML) and especially Deep Learning (DL) approaches has increased and has been demonstrated in many case studies, with Reinforcement Learning (RL) playing a leading role in these studies and being closely correlated with CS [18], [19] and [25].

The domain of crowd management and analysis has seen interest since as early as 1958 [12], with increasingly positive social and scientific impact, and has been studied continually until now [5], [14] and [31]. Most of these studies focus on developing level-of-service concepts, designing elements of pedestrian facilities or planning guidelines [13]. Although the goals have remained the same, the demand and the simulation scale have increased drastically. Nowadays, the complexity of planning correct emergency evacuations of large and small-scale buildings or building blocks has increased, requiring extensive and accurate planning that accounts for different architecture styles, appearances, functionalities and visitor behaviors [26]. All those features (and many others) have become an important aspect of designing a building for efficient evacuation planning, and they are important factors in simulation systems for crisis scenarios, for example the evacuation of a building due to fire or earthquake. Scenarios of this type aim to improve the procedures of risk assessment, emergency planning and the evacuation itself. They are usually tackled by crisis management preparation procedures, which include mock crisis scenarios (e.g. fire drills or "mock evacuations"). Unfortunately, these types of procedures in many cases fail to prepare humans and are often ignored [8]. Thus, results obtained from those preparation projects cannot be used to design accurate policies. For this reason, simulation systems can be used as an additional method of evaluating a security policy of indoor or outdoor facilities. Simulations can take into account the impact of different environmental, emotional and informational conditions [29], but in most cases the simulation tools have been designed with specific facilities for specific cases.

The research domain of Crowd Simulation for Crisis Management (CSCM) has experienced increasing interest in the past years. Crowd simulation is the process of simulating how a number of entities (commonly large) move inside a virtual scene with a specific setting [28]. Crisis simulations are systems that include entities with more roles and responsibilities, on top of the existing techniques and algorithms required for the physical and even psychological simulation of those same entities. Moreover, the setting of the simulated scenario varies a lot, from film production and military simulation to urban planning, all of which require high realism concerning the movements of those entities, their grouping and their behaviors in general.

The most suitable approach for crowd and crisis simulation systems is the simulation of multiple individual entities [10]. Systems that follow this approach are called Multi-Agent Systems (MAS) and consist of multiple agents (the entities to be simulated) and their environment (the setting in which they exist and can interact with each other), in which they may cooperate or compete towards specific tasks/goals [32]. Based on their interactions and their perception, the agents perform actions to achieve their goal. This structure makes MAS well suited to crowd and crisis simulation research.

In this paper, a novel crisis simulation system is introduced, focusing on the creation of a prototype system that takes advantage of the plethora of simulation and performance enhancing capabilities of a well-known game engine. The system's key features are:

(1) Ease of use: users with little to no programming skills or experience can set up a crisis scenario and simulate it through a user-friendly Graphical User Interface (GUI).
(2) Dynamic environment design: a feature that allows users to create their own building and environment based on their needs and case studies.
(3) Interconnection with popular Reinforcement Learning libraries: allowing researchers to exploit popular RL algorithms for the training of the agents or to try their own algorithms.
(4) Dynamic crisis management: allowing the user to model a specific structured pipeline of a crisis, for example two fires starting from different places.
(5) GPU-accelerated agent training and simulation, for the support of large multi-agent systems.
(6) Simulation end options, which allow the user to specify when a simulation will automatically end.

Additionally, we tested the newly introduced system with a state-of-the-art Deep Reinforcement Learning (DRL) algorithm, resulting in high accuracy in training and quite good evacuations of agents in an indoor environment. It should be highlighted that GPU-accelerated training of the DRL model was much faster than the CPU-based training approaches, which sped up the hyperparameter tuning process (a time-consuming process) of the implemented case study.

Crisis simulation systems have several social and scientific impacts, mainly concerning the development of civilization, by helping humans design safe buildings that receive many visits throughout the day. Additionally, such systems help humans to be prepared for various crisis situations (e.g. evacuation planning in indoor fire cases) by exploiting Artificial Intelligence technologies that simulate human behaviors. In addition, systems of this kind give people the ability to design their own environments based on their needs and to monitor how these kinds of scenarios unfold.

The rest of the paper is organized as follows: Section 2 briefly presents some crisis simulation platforms and the research that has been done on CSCM, Section 3 introduces the prototype of the newly introduced crisis simulation system, and Section 4 presents a case study. Lastly, Section 5 concludes the paper by highlighting some key features of the system and presents some future work towards the enhancement of the prototype system.

2 RELATED WORKS
As mentioned before, a plethora of systems has been developed for the simulation of different crisis scenarios. For example, Becker-Asano et al. presented a multi-agent system focused on first-person perception and signs, taking dynamically changing occlusions into account [2]. The implementation was done using the Unity game engine¹, while also making it possible for participants to be tested in the same virtual airport terminal in combination with the "Oculus Rift" head-mounted display. Simonov et al. proposed a system for building composite behavior structures for large numbers of agents [26]. Their system was based on a decision-making algorithm implemented in Unreal Engine 4². The path finding system exploited the Menge simulation with plugins, and the system also included animation support, dynamic models, a visualization module and utility-based strategic level algorithms. ESCAPES, a multi-agent evacuation simulation system presented in 2011, incorporated different agent types with emotional, informational and behavioral interaction [29]. The agent types include individual travelers, families, authority and security agents. Additionally, the system incorporated information spreading to agents, emotional interaction and contagion and the Social Comparison Theory [9]. Evakuierungsassistent (translated as Evacuation Assistant) is another simulation system, focused on the evacuation of mass events (e.g. football stadiums) and incorporating realistic methods for real-time simulation [30]. The system is agent-based and exploits Cellular Automata (CA) methods and Generalized Centrifugal Force Models [6].

In 2013, De Oliveira Carneiro et al. presented a simulation system to study the crowd's behavior while evacuating a soccer stadium [7]. The system uses 2D CA defined over multiple grids that represent different levels (state spaces) of the simulated environment, and it is able to simulate environments with complex structures composed of multiple floors. Sharm et al. [4] proposed the first fire evacuation environment based on the OpenAI gym³ [25]. Moreover, they proposed a new approach that entails pretraining an agent based on a Deep Q-Network (DQN) algorithm [20], focusing on the discovery of the shortest path to the exit. A very popular platform that adapts to large-scale and complex models is the GAMA platform [27]. It has its own agent-oriented modeling language, called Gama Modeling Language (GAML), which follows the object-oriented paradigm. Additionally, the models include spatial components used for their 3D representation in the environment. Another key feature of the platform is that the agent architecture is based on the Belief Desire Intention (BDI) method [3], which proposes a straightforward formalization of human reasoning through intuitive concepts. The platform also supports multi-threaded simulations and running multiple simulations at the same time. Lastly, iCrowd [17] is an agent-based behavior modeling and crowd simulation system that has many different applications, from crowd simulation in crisis evacuations to social behavior and urban/maritime traffic simulation. It makes use of modern, multithreaded and data-oriented approaches that provide architecture extensibility. The system supports studies based on human movements (collision avoidance and path planning) and agent-based behavior modelling.

¹ A game engine developed by Unity Technologies (https://unity.com)
² A game engine developed by Epic Games (https://www.unrealengine.com)
³ A toolkit for developing and comparing reinforcement learning algorithms (https://gym.openai.com)

Several literature reviews and surveys have been published in recent years, presenting advancements and important observations regarding the direction and the parts that require focus in the domain of CSCM. Some of the most important stages in developing a simulation tool can be derived from those reviews. For example, S. Abar et al. [1] reviewed the literature with the aim of quickly assessing the ease of use of a simulation tool. Some of the comparison criteria were the tool's coding language/Application Programming Interface (API), model development effort, modelling strength and scalability level. N. Pelechano and A. Malkawi [22] stated in their review that physical interactions, psychological elements, improved human movement, agent-based approaches and communication between agents are important features. Lastly, J. Xiao et al. [33] focused on the use of hardware accelerators (especially GPUs) for agent-based simulations.

The presented system was developed taking into account observations/highlights from the aforementioned related works. It has numerous advantages compared to the aforementioned systems, the most important being its ease of use and its usefulness as a research tool for future studies. The development and design of the system focus on its usability and on how easily a user without any programming skills can set up an environment and test it based on their needs. Moreover, the system can be used as a research tool by scientists to test different RL algorithms/models and apply them to a plethora of different crisis scenarios. This is achieved via the interconnection of Metis with popular RL libraries through the game engine. Additionally, by using a game engine to develop such a system, scalability, physical interactions (physics) and exploitation of GPUs are a given, as games take advantage of those things with very realistic results. The combination of these advantages can make it a unique tool for scientific communities and the general public.

3 THE "METIS" SYSTEM
In this section we present a prototype version⁴ of a novel multi-agent crisis simulation system, developed over the Unity game engine, called Metis⁵ ⁶. The main structure diagram of Metis is shown in Figure 1 and consists of three major layers: Dynamic Environment Development (DED), Scenario Design (SD) and Evacuation Simulation (ES). The first layer (DED) allows the user to dynamically design and set up the entire environment and the building to be evacuated. The second layer (SD) follows the concept of dynamic design of the evacuation scenario, giving the user the ability to place pedestrians (various numbers of agents) in different parts of the building, designate which doors are exits, mark areas of the environment in which the pedestrians will be safe and place fires in different places. The last layer (ES) handles the evacuation process and the modules responsible for the simulation. The ES layer exploits machine learning models (interconnection with popular RL libraries), includes the management of the spreading of the fires, handles the ending of the simulation and gives the user the ability to manage a dynamically changing crisis. Dynamic crisis management is the ability to model the pipeline of a crisis, in our case two fires starting from different places. In the future, it could be a tsunami followed by an earthquake etc.

Figure 1: The system framework diagram of Metis.

⁴ The assets currently used are from the "Standard Assets (for Unity 2017.3)" (https://assetstore.unity.com/packages/essentials/asset-packs/standard-assets-for-unity-2017-3-32351) and the "Snaps Prototype | Office" (https://assetstore.unity.com/packages/3d/environments/snaps-prototype-office-137490) Unity packages.
⁵ The name comes from Metis, one of the elder Okeanides and the Titan-goddess of good counsel, planning, cunning and wisdom. Counsel, planning and wisdom are also required when a building is designed. https://www.theoi.com/Titan/TitanisMetis.html
⁶ https://sites.google.com/view/metissimulationsystem

A first-view screenshot of the main component of the system is depicted in Figure 2. The interface consists of two User Interface Layers (UILs), one of which is split into two sub-UILs (numbers 1 and 2 in Figure 2). These UILs focus on the design of the environment and the experiment, and on conducting the final evacuation experiments, without the need for expert skills. The main structure of the UILs is based on the framework presented in Figure 1. Each UIL consists of multiple interactive buttons with the following functionalities (left to right, top to bottom):

(1) UIL1 has four main buttons and a scroll view that includes additional interactive buttons with labels. The three buttons on the top-left change the category of objects that can be placed in the environment; the objects appear in the scroll view after a category is selected. The first button shows all the static objects that can be placed in the environment, the second all available types of pedestrians and the last one all the sample simulation buildings (areas) that can be placed. Sample buildings are buildings created beforehand and provided to the user, with each building including placed objects and having a different layout and number of rooms. Clicking on an object in the scroll view allows the user to place that object into the environment. The last button, with the magnifier icon in the top-right, allows the user to filter the list of objects through a text field.
(2) UIL2 includes functionalities that change the mode of the mouse. In the left column, the button allows the user to assign a safe area in the environment. In the middle column, the buttons allow the user to place fires, walls, floors and doors. In the last column, one button resets the mouse to its default mode (in which it does nothing) and the last button is used to grab and place already placed objects.
(3) UIL3 includes buttons regarding the simulation process. The play button starts the simulation process and the gear button shows all the available options regarding the simulation ending conditions.

Figure 2: The main User Interface Layers of the Metis system.

3.1 Dynamic Environment Development
As mentioned above, the DED layer of the system allows the user to design the layout of the building to be evacuated by placing the building's walls. This layer can be characterized as the layer responsible for the content generation of the environment, commonly used in games [24] and [34], and allows for the creation of dynamic environments [16] during the environment design process. The walls are placed using the wall placement tool from the UIL1, which places a part of a wall that can then be extended in any direction by dragging the mouse. As a second step, doors can be placed on the walls, allowing pedestrians to move through the rooms. Lastly, a plethora of objects that mimic real-world indoor objects can be placed anywhere inside the building to decorate it and to act as obstacles during the evacuation. During any placement procedure, the walls, doors and objects snap to each other to make the placement easier. The DED is considered to be a powerful tool, which gives users the ability to create their own realistic indoor environments based on their needs and their cases, and thus the opportunity to test several different environments without the need for programming skills.

3.2 Scenario Design
The SD layer is responsible for designing the scenarios, meaning where the fire will start, how many pedestrians will have to evacuate the building (multi-agent approach), where their starting positions will be, which exit they will try to reach and where they will be safe. The fires' positions can be chosen with the fire placement tool of the UIL2, which, during the design process, allows the user to specify from which position the fire will start to spread. The fires start spreading when the simulation starts. Different "types" of pedestrians can be placed, and each type has different attributes, like speed, size, color, health points etc., giving the ability to simulate different human behaviors. A door is marked as an exit door by right clicking on it. At least one door has to be marked as an exit, due to the way the pedestrians were trained. Lastly, safe areas are used to mark a pedestrian as safe from the crisis during the simulation; they are the areas the pedestrians are trying to escape to.

3.3 Evacuation Simulation
The ES layer is responsible for all the functions running during the simulation of an evacuation procedure. Starting with the fire propagation, a very simple algorithm is employed. The fire is first placed at a point in the building, with a specific maximum area, and is represented by a particle emitting object, which damages any object that touches it. Then, when the simulation starts, the area grows periodically and multiple fire objects are created at random places inside the area. Simply put, the propagation of the fire currently works with a random speed and direction, while the contact of the fire with the pedestrians is handled through collision interfaces.

Having control over when the simulation automatically ends is an important feature. The prototype version currently supports end conditions such as when all or a specific number of pedestrians are safe/dead. The pedestrians' evacuation can be achieved by training the agents with RL algorithms. By exploiting the capabilities of the ML-Agents toolkit [15] (Section 4), the agents of the Metis system can be trained with popular RL algorithms such as Proximal Policy Optimization (PPO) [23] and Soft Actor-Critic (SAC) [11]. In addition, the Metis system can be easily interconnected with popular RL libraries such as RLlib⁷ and Baselines⁸, and custom Python RL algorithms can also be developed. A typical RL training is done by creating learning environments in which the agent collects observations and acts based on them. For the training of a general model that can evacuate buildings during a crisis situation, the typical procedure of creating a building environment was followed: setting up doors, designating the exits and placing objects, which also acted as obstacles, inside the different rooms. Dynamic crisis management is a part of the simulation that allows the user to manage the crisis currently unfolding. For now, it allows the user to start a fire in a different part of the environment than the initial one. This makes the system more effective and allows the user to observe the pedestrians' behaviors while the crisis changes dynamically.

⁷ https://docs.ray.io/en/master/rllib.html
⁸ https://github.com/openai/baselines

3.4 Pedestrian agents training approach
In this section we present and analyze how the pedestrian agents were trained, the features that the agents gather in each step, the actions the agents take to evacuate a building and the environment in which they were trained. Figure 3 depicts an indoor environment that was created using the Metis system and then saved as a single area object, for ease of use during the training environment setup and to keep the environment setup for further analysis. The areas highlighted in light green inside the building are the possible areas in which an agent could spawn when an episode begins. Initially, all agents spawned inside the room marked as "1" and each one unlocked the next areas (light green) one by one. An agent could unlock an area once its mean reward over the last 20 episodes was equal to or higher than 0.925, as a consequence of good training for a specific area. Every time an episode begins, the agent chooses randomly between five possible spawn areas, which are the areas the agent unlocked most recently. Eventually, when the agent has a mean reward of over 0.925 on average over all areas, it can spawn in any area. This was done so that the agent learned gradually and all the areas were unlocked, so that it can generalize correctly (without learning only to escape from one specific point of the building). The red cubes inside the building are dummy fire objects which, when touched, reset the agent's episode and set its reward to -1. The reasoning behind the use of dummy objects instead of the actual fire was to check whether the agent, using the raycast components described below, would eventually learn to avoid those fires. The intense green area is considered to be a safe exit for the pedestrians.

Figure 3: Building used to train the pedestrian agent.

The features gathered by each agent during the training/learning were 70 in total, of which 64 were gathered using three "Ray Perception Sensor 3D"⁹ components and 6 were calculated manually:

(1) The first raycast component detects objects (static objects and fires) and is blocked by walls and doors. It casts 20 rays of length 15 (by default, one unit corresponds to one meter in Unity) in a 140 degree arc in front of the agent and is responsible for detecting objects that have to be avoided (inside the room the agent is currently in).
(2) The second raycast component detects doors, safe exit doors and walls, with 20 rays of length 25 in an 80 degree arc in front of the agent, and is responsible for detecting doors and safe exit doors that are close.
(3) The third raycast component also detects doors, safe exit doors and walls, with 24 rays of length 50 in a 140 degree arc in front of the agent, and is responsible for detecting doors and safe exit doors that are far away.
(4) The manually calculated features were:
  (a) The normalized x and z values of the safe exit door,
  (b) The agent's position and
  (c) The normalized direction from the agent to the exit door.

⁹ Rays that are cast into the physics world, and the objects that are hit determine the observation vector that is produced.

During the training, the agent gets a reward of −0.4/maxStep for each step (action) taken, −0.3/maxStep if it collides with something (static objects, walls and closed doors) and small positive rewards depending on its distance from the exit door (Eq. 1). Additionally, when the agent reaches the safe area the reward is set to +1.

$$R_{distance} = \frac{Distance(norm(exit_{pos}),\ norm(agent_{pos}))}{maxStep} \ast 0.3 \qquad (1)$$

Here norm(exit_pos) is the normalized position of the exit, norm(agent_pos) is the normalized position of the agent and maxStep is the maximum number of actions per episode (equal to 10,000 during the training). Distance() calculates the distance between the exit and the agent. It should be noted that the positions of the objects are normalized according to their relative position inside the building. The reasoning behind the choice of the negative rewards is to make the agent reach the exit as soon as possible and collide with as few objects as possible. Lastly, the reward and episode reset from touching the fire make the agent avoid the fires, by setting the reward of the episode to -1.

The available actions for each agent are provided through two output branches, each with two possible actions. The first branch is responsible for the agent's horizontal movement (left or right) and the second branch for the vertical movement (backward or forward). With this setup the agent is able to navigate to a safe exit. Based on these parameters, the agents can be trained with many different RL algorithms. With access to the source code of the Metis system, all these parameters and many others can be adjusted based on the needs of the experiment. At this point it should be highlighted that the Metis system will be provided under open source licensing.

4 CASE STUDY
For the sake of clarity, a case scenario was set up to present the procedure of setting up a building for crisis evacuation planning and also to evaluate how well the RL-based model trained in the previous section escapes a different building in a fire crisis scenario. The building was created with the UILs and its architecture was quite simple, consisting of three main rooms and one hall. The hall was empty and was connected to the other three rooms (east, north and west rooms). Each room was decorated with many static objects, such as desks, small and large cabinets, small and large shelves and plants. The objects were placed in such a way as to make the evacuation of the building harder, as can be seen in the north room in Figure 5. Four doors were placed, three connecting the rooms with the hall and one being the safe exit on the south. The safe exit door was designated as an exit by right clicking on the door. The next step, when setting up a building in the Metis system, is to designate a safe area; pedestrians that touch it are considered safe. In our case study, we designated a safe area just outside the exit. The following step is to place the pedestrians into the building. We placed a total of 25 pedestrians, scattering them around all the rooms and placing some in difficult areas. Lastly, fires were placed in all rooms, in such a way that some pedestrians' paths are partly blocked during the evacuation. Figure 5 shows the layout of our designed building for the case study, along with all the objects, agents and fires placed.

Figure 5: Layout and case study setup.

During the training procedure, the environment spawned 60 agents and each one individually started training. This is a common methodology to speed up the training process. It should be noted that, although there were multiple agents in the same environment, they ignored each other, both physically and feature-wise. For the training procedure the PPO algorithm was used, which is considered to be one of the most effective RL algorithms for agents adopting raycast observations.

In our experiment the setup of the chosen PPO algorithm was based on a neural network that approximated the ideal function mapping the agent's observations to the best action an agent could take in a given state. The neural network setup was input: 70, hidden layers: 512 and output: 4, with the discount factor for future rewards set to γ = 0.995 and the learning rate set to λ = 0.0003. Figure 4 depicts the agent's training results. The agent was trained for 14.55 million steps (actions). The reward was initially set to the range [-1, 1]. The multiple drops in the cumulative reward that can be seen are due to the different "difficulty" areas that were unlocked (new room starting points). This naturally caused the total reward to drop, as the environment was different from the previous one. Eventually, at the end (where all the areas are unlocked), the cumulative reward reached 0.96. After the training of the agent was completed, the play button near the top right corner was pressed to start the simulation and therefore the building evacuation. When this button is pressed, if there is any ongoing crisis (that is, at least one fire has been placed), all the pedestrians start individually evacuating the building.

Figure 4: Cumulative reward, increasing during a successful training of the agents.

Figure 6 shows a snapshot at the beginning of the evacuation after the training, when the pedestrians started running towards the exit. It can be seen that most of the pedestrians find their way towards it immediately. Despite that, some of them can be seen struggling, with some being stuck running into a corner of a room. Note that, to demonstrate how easy it is to set up a scenario and to give a better understanding of the whole process of creating a building and setting up the evacuation and simulation procedure, we have created a demo video¹⁰. When a simulation procedure ends, for any reason, a results window pops up to inform the user about the evacuation and the statistics. Figure 7 shows a snapshot of the info-window, highlighting the results of our case study. The info-window informs the user about the total number of pedestrians that survived and died at the end of the simulation. In this case study, of the total 25 pedestrians, 17 survived (agents that reached the green safe area) and 8 died (died or did not evacuate successfully in case of a manual end).

Figure 6: Beginning of the evacuation.

Figure 7: Window with the simulation's results.

¹⁰ https://tinyurl.com/MetisMABCSSDemo

5 CONCLUSIONS AND FUTURE WORK
Game engines have become more and more popular and have been exploited for many different applications besides their main target, the development of games. In this paper we present a prototype of a novel crisis simulation system called Metis. Metis is developed using a very popular game engine, Unity, and exploits many of its optimizations and features, such as physics, particle effects and cross-platform development. In addition, the Metis system can make use of trending Reinforcement Learning algorithms to improve the simulation realism and the evacuation planning. Its interconnection with popular RL libraries and its dynamic content (environment) development can establish it as a powerful research tool for basic and applied high-level research. Because it is developed over a game engine that supports cross-platform development, it can be considered a system that can run on multiple operating systems. As mentioned above, the most important key features of the system are its ease of use for scenario design and simulation, the ability to build case study environments dynamically, dynamic management of the crisis situation (multiple crisis situations), exploitation of [...]

[...] includes the introduction of other features and functionalities, such as: real time simulation statistics, more explanatory and graphical statistics at the end of a simulation, the ability to build multi-level and multiple buildings, more realistic fire propagation, more types of crisis (in addition to fire, such as earthquake, flooding etc.) and allowing pedestrians to interact with each other. Moreover, an important consideration for the future improvement of the system is the incorporation of emotional and psychological features into the agents. This aspect is an important one and has been extensively studied in the literature of CSCM. Lastly, an important feature for a system that aims for longevity and extensibility is support for the user (auto guide) to extend the system's functionalities.

ACKNOWLEDGMENTS
This work is supported by the MPhil program "Advanced Technologies in Informatics and Computers", hosted by the Department of Computer Science, International Hellenic University.

REFERENCES
[1] Sameera Abar, Georgios K. Theodoropoulos, Pierre Lemarinier, and Gregory M.P. O'Hare. 2017. Agent Based Modelling and Simulation tools: A review of the state-of-art software. Computer Science Review 24 (2017), 13–33. https://doi.org/10.1016/j.cosrev.2017.03.001
[2] Christian Becker-Asano, Felix Ruzzoli, Christoph Hölscher, and Bernhard Nebel. 2014. A multi-agent system based on unity 4 for virtual perception and wayfinding. Transportation Research Procedia 2 (2014), 452–455. https://doi.org/10.1016/j.trpro.2014.09.059
[3] Lars Braubach, Alexander Pokahr, and Winfried Lamersdorf. 2005. Jadex: A BDI-Agent System Combining Middleware and Reasoning. Software Agent-Based Applications, Platforms and Development Kits (2005), 143–168.
https: various RL algorithms and well known libraries, inherent GPU- //doi.org/10.1007/3-7643-7348-2_7 accelerated agent simulations, agents with various characteristics [4] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schul- and behaviors and, lastly, the ability to specify simulation end con- man, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. (2016), 1–4. arXiv:1606.01540 http://arxiv.org/abs/1606.01540 ditions. [5] R. L. Carstens and S. L. Ring. 1970. Pedestrian capacities of shelter entrances. Furthermore, although the system is in Alpha version, the pre- Traffic Engineering 41, 3 (1970), 38–43. sented experimental results are encouraging and promising. To [6] Mohcine Chraibi, Armin Seyfried, and Andreas Schadschneider. 2010. Generalized centrifugal-force model for pedestrian dynamics. Physical Review E - Statistical, sum-up, by using the Metis system one can design their own build- Nonlinear, and Soft Matter Physics 82, 4 (2010). https://doi.org/10.1103/PhysRevE. ing layout, place a variety of objects, agents and fires, towards the 82.046111 arXiv:1008.4297 [7] Lílian De Oliveira Carneiro, Joaquim Bento Cavalcante-Neto, Creto Augusto development of personal evacuation plan. Due to its simplicity the Vidal, and Teófilo Bezerra Dutra. 2013. Crowd evacuation using cellular automata: Metis system can be used by everyone, even from users without Simulation in a soccer stadium. Proceedings - 2013 15th Symposium on Virtual special programming skills. The aforementioned features of Metis and Augmented Reality, SVR 2013 (2013), 240–243. https://doi.org/10.1109/SVR. 2013.29 focus on a key concept of a dynamic and general system, especially [8] R. Fahy and G. Proulx. 1997. Human Behavior In The World Trade Center due to the dynamic crisis management. The user can start a crisis at Evacuation. Fire Safety Science 5 (1997), 713–724. https://doi.org/10.3801/IAFSS. 
different moments during the simulation, creating unique scenarios FSS.5-713 [9] Leon Festinger. 1954. A Theory of Social Comparison Processes. Human Relations and allowing them to observe the pedestrians’ reactions. 7, 2 (may 1954), 117–140. https://doi.org/10.1177/001872675400700202 From the results it is obvious that there is room for improvement. [10] G. Nigel. Gilbert and Klaus G. Troitzsch. 2005. Simulation for the social scientist. 295 pages. First and foremost, not all pedestrians found their way towards [11] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon the exit, which means a different training approach with different Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and RL algorithms has to be tested. The best one would be to train the Sergey Levine. 2018. Soft Actor-Critic Algorithms and Applications. (dec 2018). arXiv:1812.05905 http://arxiv.org/abs/1812.05905 agent for much longer, with a dynamic change of the environment [12] B. D. Hankin and R. A. Wright. 1958. Passenger Flow in Subways. OR 9, 2 (jun (random or via curriculum learning). This means that the placed 1958), 81. https://doi.org/10.2307/3006732 fires’ positions have to be changed, along with the exit door, the [13] Dirk Helbing, Péter Molnár, Illés J. Farkas, and Kai Bolay. 2001. Self-organizing pedestrian movement. Environment and Planning B: Planning and Design 28, 3 building’s layout and the agent’s attributes (speed and size), every (2001), 361–383. https://doi.org/10.1068/b2697 time the agent finishes an episode. In general, a better fine-tuning [14] L. A. Hoel. 1968. Pedestrian travel rates in central business districts. Traffic Engineering (1968), 10–13. of the algorithms could provide more accurate evacuations with [15] Arthur Juliani, Vincent-Pierre Berges, Esh Vckay, Yuan Gao, Hunter Henry, fewer losses. Moreover, allowing pedestrians to interact with each Marwan Mattar, and Danny Lange. 2018. 
Unity: A General Platform for Intelligent other (cooperative learning) will require the agents to be trained in Agents. (sep 2018). arXiv:1809.02627 http://arxiv.org/abs/1809.02627 [16] Chairi Kiourt, Anestis Koutsoudis, and George Pavlidis. 2016. DynaMus: A fully such way that they take into account the number of agents near dynamic 3D virtual museum framework. Journal of Cultural Heritage 22 (nov them or near an exit. In addition to that, future work on the system 2016), 984–991. https://doi.org/10.1016/j.culher.2016.06.007 [17] Vassilios I. Kountouriotis, Manolis Paterakis, and Stelios C. A. Thomopoulos. utility-based behavior models: Sochi Olympic Park Station use case. Procedia 2016. iCrowd: agent-based behavior modeling and crowd simulator, Ivan Kadar Computer Science 136 (2018), 453–462. https://doi.org/10.1016/j.procs.2018.08.266 (Ed.). 98420Q. https://doi.org/10.1117/12.2223109 [27] Patrick Taillandier, Benoit Gaudou, Arnaud Grignard, Quang-Nghi Huynh, Nico- [18] Sheng Yan Lim. 2015. Crowd Behavioural Simulation via Multi-Agent Reinforce- las Marilleau, Philippe Caillou, Damien Philippon, and Alexis Drogoul. 2019. ment Learning. Ph.D. Dissertation. Building, composing and experimenting complex spatial models with the GAMA [19] Francisco Martinez-Gil, Miguel Lozano, and Fernando Fernández. 2014. Strategies platform. GeoInformatica 23, 2 (apr 2019), 299–322. https://doi.org/10.1007/ for simulating pedestrian navigation with multiple reinforcement learning agents. s10707-018-00339-6 Autonomous Agents and Multi-Agent Systems 29, 1 (2014), 98–130. https://doi. [28] Daniel Thalmann. 2016. Crowd Simulation. In Encyclopedia of Computer Graphics org/10.1007/s10458-014-9252-6 and Games. Springer International Publishing, Cham, 1–8. https://doi.org/10. [20] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis 1007/978-3-319-08234-9_69-1 Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. 
Playing Atari with [29] Jason Tsai, Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein, Gal Deep Reinforcement Learning. (2013), 1–9. arXiv:1312.5602 http://arxiv.org/abs/ Kaminka, Stacy Marsella, Andrew Ogden, Inbal Rika, Ankur Sheel, Matthew E. 1312.5602 Taylor, Xuezhi Wang, Avishay Zilka, and Milind Tambe. 2011. ESCAPES - Evacu- [21] Rahul Narain, Abhinav Golas, Sean Curtis, and Ming C. Lin. 2009. Aggregate ation simulation with children, authorities, parents, emotions, and social com- Dynamics for Dense Crowd Simulation. ACM Transactions on Graphics 28, 5 parison. 10th International Conference on Autonomous Agents and Multiagent (2009), 1–8. https://doi.org/10.1145/1618452.1618468 Systems 2011, AAMAS 2011 1 (2011), 425–432. [22] Nuria Pelechano and Ali Malkawi. 2008. Evacuation simulation models: Chal- [30] Armel Ulrich Kemloh Wagoum, Mohcine Chraibi, Jonas Mehlich, Armin Seyfried, lenges in modeling high rise building evacuation with cellular automata ap- and Andreas Schadschneider. 2012. Efficient and validated simulation of crowds proaches. Automation in Construction 17, 4 (2008), 377–385. https://doi.org/10. for an evacuation assistant. Computer Animation and Virtual Worlds 23, 1 (feb 1016/j.autcon.2007.06.005 2012), 3–15. https://doi.org/10.1002/cav.1420 [23] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. [31] Ulrich Weidmann. 1993. Transporttechnik der Fussgänger. https://doi.org/10. 2017. Proximal Policy Optimization Algorithms. (jul 2017). arXiv:1707.06347 3929/ethz-a-010025751 http://arxiv.org/abs/1707.06347 [32] Michael Wooldridge. 2009. An Introduction to MultiAgent Systems, 2nd Edition. [24] Noor Shaker, Julian Togelius, and Mark J. Nelson. 2016. Procedural Content 484 pages. Generation in Games. Springer International Publishing, Cham. https://doi.org/ [33] Jiajian Xiao, Philipp Andelfinger, David Eckhoff, Wentong Cai, and Alois Knoll. 10.1007/978-3-319-42716-4 2019. 
A survey on agent-based simulation using hardware accelerators. Comput. [25] Jivitesh Sharma, Per-Arne Andersen, Ole-Chrisoffer Granmo, and Morten Good- Surveys 51, 6 (2019). https://doi.org/10.1145/3291048 win. 2019. Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire [34] Georgios N. Yannakakis and Julian Togelius. 2018. Artificial Intelligence and Evacuation Environment. (2019), 1–21. arXiv:1905.09673 http://arxiv.org/abs/ Games. Springer International Publishing, Cham. https://doi.org/10.1007/978-3- 1905.09673 319-63519-4 [26] Andrey Simonov, Aleksandr Lebin, Bogdan Shcherbak, Aleksandr Zagarskikh, and Andrey Karsakov. 2018. Multi-agent crowd simulation on large areas with