MiSim — A Lightweight and Extensible Simulator for a Scenario-Based Resilience Evaluation of Microservice Architectures

Lion Wagner¹, Sebastian Frank¹, Alireza Hakamian¹, and André van Hoorn²
¹ University of Stuttgart, Institute of Software Engineering, Universitätsstraße 38, 70569 Stuttgart, Germany
² University of Hamburg, Department of Informatics, Vogt-Kölln-Straße 30, 22527 Hamburg, Germany

SSP'21: Symposium on Software Performance, November 09–10, 2021, Leipzig, Germany
Contact: st148345@stud.uni-stuttgart.de (L. Wagner), sebastian.frank@iste.uni-stuttgart.de (S. Frank), mir-alireza.hakamian@iste.uni-stuttgart.de (A. Hakamian), andre.van.hoorn@uni-hamburg.de (A. van Hoorn); ORCID: 0000-0003-2567-6077 (A. van Hoorn)

Context and Problem

With the growing popularity of microservice-based architectures, the need for an effective resilience assessment of such systems has emerged. Resilience is often assessed in production using so-called chaos experiments. While they produce representative results, chaos experiments often (1) require a significant investment of time, (2) impact the customer experience because they stress the system under test, and (3) cannot easily be run in parallel because concurrent experiments interfere with each other.

Simulating chaos experiments is an alternative to running them in a production environment. Today, many simulators for distributed and service-oriented architectures are available. Popular examples include SimuLizar [1], DRACeo [2], BigHouse [3], µqSim [4], PacketStorm [5], iFogSim [6], and GreenCloud [7]. Most of them focus on either performance or efficiency by simulating or solving queueing models. However, none of them satisfies all of the following requirements: (1) support for common resilience patterns, (2) simulation of typical chaos injections such as killing a service instance, and (3) a lightweight design with low modeling and simulation overhead.

Objective

Therefore, we developed MiSim, a simulator specializing in the simulation of (1) resilience patterns and (2) chaos injections. In addition, the simulator supports scenario structures introduced by the Architecture Tradeoff Analysis Method (ATAM) [8] as input. Furthermore, MiSim was designed to have no external requirements, such as a Platform as a Service (PaaS), submodules, or libraries.

Method

We elicited 24 requirements for MiSim by interviewing stakeholders; in our case, the stakeholders are a group of researchers interested in simulating chaos experiments. We evaluated both the quality of the simulation engine and the usability of the simulator. For the engine quality, we performed (1) a code review, (2) a feature comparison with other simulators, (3) a performance analysis, and (4) an analysis of the simulation accuracy for real-world scenarios [9].

Result

The current version of MiSim supports 20 out of the 24 elicited requirements; four requirements were not implemented due to time restrictions of the project. For a more detailed overview of the supported features, see MiSim's GitHub repository (https://github.com/Cambio-Project/resilience-simulator). As required, MiSim supports (1) common resilience patterns (i.e., Circuit Breaker, Rate Limiter, Retry, Autoscaler, and Self-Restarting) and (2) chaos injections (i.e., instance/service killing and latency injections). Additionally, (3) there are no external dependencies, and the compiled simulator is only about 11 MB in size.
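To illustrate the kind of resilience mechanism MiSim simulates, the following sketch shows a minimal circuit breaker in plain Java. It is an illustrative example only, not MiSim's implementation or API; the names SimpleCircuitBreaker, call, failureThreshold, and openDurationMillis are hypothetical.

```java
import java.util.concurrent.Callable;

/**
 * Minimal circuit breaker sketch (illustrative only, not MiSim's API).
 * CLOSED: requests pass through and failures are counted.
 * OPEN: requests are rejected until the open duration has elapsed.
 * HALF_OPEN: a single trial request decides whether to close or re-open.
 */
public class SimpleCircuitBreaker {

    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;     // consecutive failures before opening
    private final long openDurationMillis;  // how long the breaker stays open

    private State state = State.CLOSED;
    private int failureCount = 0;
    private long openedAt = 0;

    public SimpleCircuitBreaker(int failureThreshold, long openDurationMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
    }

    public <T> T call(Callable<T> request) throws Exception {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openDurationMillis) {
                state = State.HALF_OPEN;  // allow a single trial request
            } else {
                throw new IllegalStateException("circuit open: request rejected");
            }
        }
        try {
            T result = request.call();
            state = State.CLOSED;         // success: reset the breaker
            failureCount = 0;
            return result;
        } catch (Exception e) {
            failureCount++;
            if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
                state = State.OPEN;       // too many failures: open the breaker
                openedAt = System.currentTimeMillis();
            }
            throw e;
        }
    }
}
```

In a discrete-event simulation such as MiSim, the wall-clock time used here would instead be driven by the simulation clock, and rejected or retried requests would feed into the collected metrics.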
Among others, the simulator accepts LIMBO models [10, 11] as workload definitions and an ATAM-oriented, scenario-based experiment format. Lastly, throughout the development process, multiple supporting tools for creating architecture models or scenarios were created in related projects [12, 13, 14].

Regarding the usability of the simulator, we held an online meeting with our stakeholders to collect their experiences with installing MiSim and running chaos experiments through it. The overall feedback was positive; the only minor complaint concerned the need to improve the code documentation.

The performance evaluation of MiSim revealed a high memory impact that strongly correlates with the number of simulated requests. This is mainly due to a faulty metric collection system. Additionally, the computational demand of the underlying simulation engine DESMO-J (http://desmoj.sourceforge.net/) is relatively high: over 90% of the computation time is spent on the (re-)scheduling of events. However, even for the simulation of complex experiments, the actual computation time remains reasonable. Lastly, simulating real-world scenarios confirmed that the implemented patterns behave as expected. It also showed that the calibration options and the accuracy of the simulation could be improved, since varying workloads, in particular, were sometimes simulated poorly.

Talk Outline and Additional Resources

In this talk, we will present the current state of MiSim. We will show an extract of its internal design and explain how other researchers and practitioners can extend the simulator. Further, we demonstrate how to extract architecture models from real traces. Additionally, we show examples of chaos experiments and cover our findings on performance and accuracy. For a preview, check out MiSim's GitHub repository (https://github.com/Cambio-Project/resilience-simulator).

Acknowledgments

This research is funded by the Baden-Württemberg Stiftung (Orcas project) and the German Federal Ministry of Education and Research (Software Campus 2.0 — Microproject: DiSpel).

References

[1] M. Becker, S. Becker, J. Meyer, SimuLizar: Design-time modeling and performance analysis of self-adaptive systems, in: S. Kowalewski, B. Rumpe (Eds.), Software Engineering 2013, Gesellschaft für Informatik e.V., Bonn, 2013, pp. 71–84.
[2] H. H. A. Valera, M. Dalmau, P. Roose, J. Larracoechea, C. Herzog, DRACeo: A smart simulator to deploy energy saving methods in microservices based networks, in: 2020 IEEE 29th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), IEEE, 2020, pp. 94–99.
[3] D. Meisner, J. Wu, T. F. Wenisch, BigHouse: A simulation infrastructure for data center systems, in: 2012 IEEE International Symposium on Performance Analysis of Systems & Software, IEEE, 2012, pp. 35–45.
[4] Y. Zhang, Y. Gan, C. Delimitrou, µqSim: Enabling accurate and scalable simulation for interactive microservices, in: 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, 2019, pp. 212–222.
[5] PacketStorm Communications, Network Simulation - PacketStorm, 2018.
[6] H. Gupta, A. Vahid Dastjerdi, S. K. Ghosh, R. Buyya, iFogSim: A toolkit for modeling and simulation of resource management techniques in the Internet of Things, edge and fog computing environments, Software: Practice and Experience 47 (2017) 1275–1296.
[7] D. Kliazovich, P. Bouvry, S. U. Khan, GreenCloud: A packet-level simulator of energy-aware cloud computing data centers, The Journal of Supercomputing 62 (2012) 1263–1283.
[8] R. Kazman, M. Klein, P. Clements, ATAM: Method for Architecture Evaluation, Technical Report, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, 2000.
[9] S. Frank, A. Hakamian, L. Wagner, D. Kesim, J. von Kistowski, A. van Hoorn, Scenario-based resilience evaluation and improvement of microservice architectures: An experience report, in: Companion of the 15th European Conference on Software Architecture (ECSA 2021), 2021. To appear.
[10] J. von Kistowski, N. R. Herbst, S. Kounev, Modeling variations in load intensity over time, in: LT '14, Association for Computing Machinery, New York, NY, USA, 2014, pp. 1–4.
[11] J. von Kistowski, N. Herbst, S. Kounev, LIMBO: A tool for modeling variable load intensities, in: Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering (ICPE '14), Association for Computing Machinery, New York, NY, USA, 2014, pp. 225–226.
[12] S. Beck, Simulation-based Evaluation of Resilience Antipatterns in Microservice Architectures, Bachelor's Thesis, University of Stuttgart, 2018.
[13] C. Zorn, Interactive Elicitation of Resilience Scenarios in Microservice Architectures, Master's Thesis, University of Stuttgart, 2021.
[14] N. Kammhoff, Algorithms for Efficient Chaos Experiment Selection in Microservice Architectures, 2019.