A Vision for Explainability of Coordinated and Conflicting Adaptions in Self-Adaptive Systems Sandro Speth1 , Sarah Stieß1 , and Steffen Becker1 Institute of Software Engineering, University of Stuttgart, Stuttgart, Germany {sandro.speth,sarah.stiess,steffen.becker}@iste.uni-stuttgart.de Abstract. In the microservice domain, self-adaptive systems exist that reconfigure themselves to adhere to their guaranteed quality of service in the face of a changing environment. Constant changes in the environment enforce continuous adaptations of the system. Especially, different and potentially conflicting adaptations might interact, making it challenging to explain the decision and rationale behind the overall reconfiguration. In this paper, we discuss different approaches for the explainability of self-adaptive systems. Furthermore, we propose our approach to achieve a good trade-off between explainability and its performance impact for the mandatory data gathering. The approach encompasses eliciting re- quirements regarding explanations and their representations and experi- menting on reference architectures for insights into the data required to fulfil the requirements. Keywords: Microservices · Self-adaption · Explainability. 1 Introduction Modern Cloud-native applications increasingly consist of self-adaptive microser- vice systems to better cope with constant changes in the environment and de- mands [10]. To achieve a better overall resilience, services adapt themselves through reconfigurations of their architecture, e.g., by scaling or by replacing entire failed services [1, 4]. Therefore, a self-adaptive system must monitor and analyze its current state, plan on which adaptions to take, if any required, and execute these actions without human intervention [6]. Due to the varying amount of users in the cloud, the workload changes con- stantly and enforces continuous adaptions, which may, either accidentally or on purpose, happen simultaneously and, therefore, influence or even conflict with each other. As an example, for two dependent services that are part of a larger architecture, scaling out the consuming service increases the incoming load of the consumed one, causing the consumed one to scale as well. Regarding conflicts, a service might define adaptation rules based on different metrics, e.g., response time and CPU load. In case the response time increases while the CPU load does not, e.g., if the increased response time is caused by waiting for another service, the response time rule triggers a scale out, followed by the CPU load rule trying to scale back in. In these examples, the behaviour deviates from the J. Manner, D. Lübke, S. Haarmann, S. Kolb, N. Herzberg, O. Kopp (Eds.): 14th ZEUS Workshop, ZEUS 2022, Bamberg, held virtually due to Covid-19 pandemic, Germany, 24–25 February 2022, published at http://ceur-ws.org Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Coordinated and Conflicting Adaptions in Self-Adaptive Systems 17 expected or is sub-optimal. To comprehend the behaviour and to improve and fix the system, a DevOps engineer must understand the rationale behind the performed self-adaptations [12], especially if they are frequently recurring [13]. However, the interactions between potentially conflicting adaptations are chal- lenging to understand, especially if the adaptation rules are more elaborated. Consequently, the need to explain the interactions between various adaptations arises. To create an explanation that DevOps engineers easily understand, we must identify which information the explanation should contain. Furthermore, for the sake of performance, we must find a reasonable trade-off between the amount of gathered data and the granularity of the explanation. This leads to our problem statements: Problem 1. What is mandatory information to explain the coordination of and the interactions between multiple, perhaps conflicting reconfigurations and their impact on the system’s overall adaptation behaviour? Problem 2. How can we obtain the components of such an explanation while keeping a trade-off between the quality of the explanation and the performance impact of the data gathering? 2 Related Work Explainability is becoming increasingly popular and essential in many research fields as it allows developers to understand systems more efficiently [5, 12]. In the context of cyber-physical systems, Bohlender et al. [3] characterise an expla- nation as a collection of information that has a target group and a subject and improves the target groups’ understanding of the subject [3]. As the usefulness of the explanation depends on the target group, this endorses the importance of our first problem. Klös et al. [8, 9] consider the explainability of self-learning self-adaptive sys- tems. Their system adapts based on timed adaption rules and improves them with a genetic learning algorithm. It records various information, such as which condition in the system or environment triggered the adaption, the adaptation’s expected effects and its actual effects, and feeds these information into a learn- ing algorithm [9, 8]. Furthermore, they state that the collected information may serve as explanations of the system’s adaptions or as a foundation to create fur- ther explanations for specific target groups [7]. In contrast to our problem, their initial focus is on explaining the self-learning aspect. In addition, they focus on single rules only instead of coordinated reconfigurations. Blumreiter et al. [2] propose the reference framework MAB-EX for self- explaining systems. Their framework consists of four steps: (1) Monitor, (2) Ana- lyze, (3) Build and (4) EXplain. Monitor and Analyze are analogous to the steps from the MAPE-K [6] loop. Build creates the explanation, and Explain trans- forms the explanation into a representation befitting the receiver and transmits it to the receiver [2]. The last step emphasises the importance of the target group. MAB-EX proposes two realisations for assisted driving systems [2]. In contrast to that, we focus on self-adaptive microservice systems. 18 Sandro Speth et al. 3 Proposed Approach In compliance with Bohlender et al. [3], we define DevOps engineers as the tar- get group for explanations of self-adaptations. Furthermore, we identified three subjects: (1) (non-)application of a single reconfiguration, (2) coordination of reconfigurations, and (3) influences and relations between reconfigurations. For our first problem statement, we already conducted an expert survey re- garding reconfiguration on a Kubernetes cluster and found out that DevOps engineers consider Kubernetes’ primarily textual representations and logs chal- lenging to understand and, therefore, preprocessed cognitive effective representa- tions are needed. Next, we plan to conduct an expert survey on DevOps engineers to identify requirements, mandatory information, and suitable representations, e.g. text or visual, interactive or static, which improve the DevOps engineers’ understanding of the self-adaptations. We expect that an explanation requires at least information about (1) the components which were adapted, (2) the config- uration of the components before the adaption, (3) the time of the adaption, and (4) the environmental change stimuli, e.g., the workload for the affected compo- nents triggering the adaption. Based on the elicited requirements, we decide on a fitting representation for the explanations. For example, explanations could be reported as cross-component issues [14] in Gropius [15], as issues are an already well-established natural platform to explain problems. This way, the explana- tions would be available in the developer’s IDE to reduce context-switches [16]. For the second problem statement, we need a reference architecture for self- adaptive systems to evaluate our solution approach. The system is required to execute not only single reconfigurations but multiples in coordination while pro- viding various metrics and data for the explanations. We plan to conduct a litera- ture survey to identify suitable reference architectures, starting with the list pro- vided by Taibi1 . To monitor environmental change-stimuli to simulate and gain required information to explain adaptions, we plan to instrument OpenAPM [11] solutions. Especially, the monitoring solution should provide data and insights about the system’s behaviour after a reconfiguration to assert the correct exe- cution of adaption. However, deciding on the monitored metrics, their level of detail, and how long to preserve the data depends on the requirements collected in problem 1. Finally, we plan to evaluate explanations created from our refer- ence architecture’s adaptions for their comprehensibility by performing expert surveys with DevOps engineers as representatives of our target group. 4 Conclusion Interactions between self-adaptations and potential conflicts between them are difficult to understand. Therefore, the need for explaining the rationale behind such adaptations arises. However, current approaches focus on explaining sin- gle adaptations only. Therefore, we propose our ideas of improving the DevOps engineers’ understanding of a self-adaptive system by explaining single system 1 https://github.com/davidetaibi/Microservices Project List Coordinated and Conflicting Adaptions in Self-Adaptive Systems 19 reconfiguration decisions as well as coordinated reconfiguration decisions and their influences on and relations with each other. Our ideas include (1) deter- mining the requirements for explanations in self-adaptive systems and (2) how to create a suitable explanation. References 1. Aderaldo, C.M., et al.: Kubow: An architecture-based self-adaptation service for cloud native applications. In: Proceedings of the 13th European Conference on Software Architecture - Volume 2. p. 42–45. ACM (2019) 2. Blumreiter, M., Greenyer, J., Chiyah Garcia, F.J., Klös, V., Schwammberger, M., Sommer, C., Vogelsang, A., Wortmann, A.: Towards self-explainable cyber-physical systems. In: 2019 ACM/IEEE 22nd International Conference on Model Driven En- gineering Languages and Systems Companion (MODELS-C). pp. 543–548 (2019) 3. Bohlender, D., Köhl, M.A.: Towards a characterization of explainable systems (2019) 4. Garlan, D., Cheng, S.W., Huang, A.C., Schmerl, B., Steenkiste, P.: Rainbow: architecture-based self-adaptation with reusable infrastructure. Computer 37(10), 46–54 (2004) 5. Greenyer, J., Lochau, M., Vogel, T.: Explainable software for cyber-physical sys- tems (ES4CPS): report from the GI dagstuhl seminar 19023, january 06-11 2019, schloss dagstuhl. CoRR abs/1904.11851 (2019), http://arxiv.org/abs/1904.11851 6. Kephart, J., Chess, D.: The vision of autonomic computing. Computer 36(1), 41–50 (2003) 7. Klös, V.: Explainable self-learning self-adaptive systems. In: Explainable Software for Cyber-Physical Systems (ES4CPS): Report from the GI Dagstuhl Seminar 19023, January 06-11 2019, Schloss Dagstuhl. pp. 46–47 (2019) 8. Klös, V., Göthel, T., Glesner, S.: Comprehensible and dependable self-learning self-adaptive systems. Journal of Systems Architecture 85-86, 28–42 (2018) 9. Klös, V., Göthel, T., Glesner, S.: Comprehensible decisions in complex self-adaptive systems. In: Software Engineering und Software Management 2018. pp. 215–216. Gesellschaft für Informatik, Bonn (2018) 10. Newman, S.: Building Microservices: Designing Fine-Grained Systems. O’Reilly, 2nd edn. (2021) 11. Novatec GmbH: Openapm: Landscape for apm tools, obervability tools and mon- itoring tools, https://openapm.io/landscape 12. Sadeghi, M., Klös, V., Vogelsang, A.: Cases for explainable software systems: Char- acteristics and examples. In: Proceedings of 2021 IEEE 29th International Require- ments Engineering Conference Workshops (REW). pp. 181–187. IEEE (2021) 13. Speth, S.: Semi-automated cross-component issue management and impact analy- sis. In: Proceedings of 2021 36th IEEE/ACM International Conference on Auto- mated Software Engineering (ASE). pp. 1090–1094. IEEE (2021) 14. Speth, S., Becker, S., Breitenbücher, U.: Cross-component issue metamodel and modelling language. In: Proceedings of the 11th International Conference on Cloud Computing and Services Science (CLOSER 2021). pp. 304–311. SciTePress (2021) 15. Speth, S., Breitenbücher, U., Becker, S.: Gropius — a tool for managing cross- component issues. In: Software Architecture. vol. 1269, pp. 82–94. Springer (2020) 16. Speth, S., Krieger, N., Breitenbücher, U., Becker, S.: Gropius-vsc: Ide support for cross-component issue management. In: Companion Proceedings of the 15th European Conference on Software Architecture. CEUR (2021)