=Paper=
{{Paper
|id=Vol-3612/IWESQ_2023_Paper_06
|storemode=property
|title=Reinforcement Learning-based Service Assurance of Microservice Systems
|pdfUrl=https://ceur-ws.org/Vol-3612/IWESQ_2023_Paper_06.pdf
|volume=Vol-3612
|authors=Xiaojian Liu,Yangyang Zhang,Wen Gu,Qiao Duan,Qingqing Ji
|dblpUrl=https://dblp.org/rec/conf/apsec/LiuZGDJ23
}}
==Reinforcement Learning-based Service Assurance of Microservice Systems==
Xiaojian Liu¹, Yangyang Zhang², Wen Gu¹, Qiao Duan¹ and Qingqing Ji³

¹ Beijing University of Technology, Beijing, China
² China Electronics Standardization Institute, Beijing, China
³ Chinese Academy of Sciences, Beijing, China

Abstract
As microservices architecture has steadily emerged as the prevailing direction in software system design, the assurance of services within microservice systems has garnered increasing attention. Intelligent service assurance within microservice systems offers a novel approach to addressing adaptation challenges in complex, risk-laden environments. This paper introduces the Reinforcement Learning Based Service Assurance Method for Microservice Systems (RL-SAMS), which incorporates the fundamental RL principle of "improving performance through experience" into service assurance activities. Through an intelligent service degradation mechanism, the continuity of services is ensured. Within the designed microservice system, two essential components are introduced: the Adapter Component (AC) and the RL Decision-making Component (RLDC). Each microservice is treated as an independent RL agent, resulting in a multi-agent RL decision-making architecture that balances "centralized learning and decentralized decision-making." This intelligent decision-making model is trained and refined by accumulating positive experiences through continuous trial and error. Experimental cases demonstrate that RL-SAMS outperforms the widely adopted Hystrix across various service risk scenarios, particularly excelling in the intelligent assurance of critical services.

Keywords
Reinforcement learning; Microservice system; Intelligent service assurance

5th International Workshop on Experience with SQuaRE Series and its Future Direction, December 04, 2023, Seoul, Korea
liuxj@bjut.edu.cn (X. Liu); zhangyy@cesi.cn (Y. Zhang)
ORCID: 0000-0002-0666-4102 (X. Liu); 0009-0006-4940-8527 (Y. Zhang)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

In 2014, Martin Fowler formally introduced the concept of "Microservices" through his blog post of the same name. This approach to software architecture breaks a software system down into many small services, each running independently in its own process. Compared with traditional monolithic systems, microservice architectures offer several notable advantages, including independent deployment, effortless scalability, and decentralization. An increasing number of network applications have migrated to microservice architecture, with notable examples including Amazon, Netflix, Twitter, SoundCloud, and PayPal. To give a sense of scale, a single page on Amazon can trigger approximately 100 to 150 microservice calls, while the Netflix system handles roughly 5 billion microservice interactions per day [1]. Microservice architecture has thus progressively emerged as the predominant direction for software system architecture [2][3][4].

The autonomy and collaborative interaction among microservices offer clear advantages but, at the same time, present significant service reliability risks. On the one hand, this autonomy entails separate operation, maintenance, and independent decision-making, which can lead to a focus on local interests at the expense of global considerations, sometimes even resulting in conflicting service assurance efforts among microservices. On the other hand, the intricate business interactions among microservices often amplify local failures into cascading failures, triggering an "avalanche effect"; in such cases, problem resolution becomes difficult because the root cause remains elusive.
The key to addressing these service assurance challenges lies in establishing an effective group decision-making mechanism within the microservice system, one that empowers each microservice to comprehend the bigger picture and make decisions on behalf of the entire system. This paper explores a reinforcement learning-based service assurance decision-making method tailored for microservice systems. Each microservice is conceptualized as an independent reinforcement learning agent. Through continuous interaction with the service environment and the operation and maintenance environment, the fundamental concept of "enhancing performance through experiential learning" is woven into microservice assurance. This equips the decision-making system with the capacity to intelligently differentiate between assurance targets and to flexibly protect critical elements.

Section 2 summarizes related research, with emphasis on the current state of microservice assurance technology and reinforcement learning methods. Section 3 presents an overview of the RL-SAMS method along with its key components. Section 4 demonstrates the effectiveness of RL-SAMS through preliminary experimental results. Finally, Section 5 summarizes the contributions of this paper and outlines directions for future research.

2. Related Works

Technologies related to microservice assurance include service degradation [5][6], service fault tolerance [7], service elastic scaling [8][9], and service rate limiting [10]. Santos et al. [6] proposed an online service degradation strategy based on quality of service (QoS), which aims to minimize request congestion caused by a lack of system resources. Combining architecture analysis and sensitivity analysis, Wang et al. [7] proposed a fault-tolerance strategy algorithm based on a reliability criticality measure. Coulson et al. [9] designed a prototype for automatic scaling of microservices based on supervised learning. Firmani et al. [10] put forward an API call rate limit selection strategy to prevent unauthorized users from achieving an excessively high SLA. Most existing research on microservice assurance focuses on the local situation of the respective microservices and cannot comprehensively consider the guarantee of service expectations from the users' perspective. A key open problem is how to establish a service assurance system for global decision-making without breaking the original distributed and independent structure of microservices.
Existing research on reinforcement learning-enabled software adaptive control can be roughly divided into two directions. (1) Strategy generation and evolution. Wang et al. [11] used reinforcement learning to solve the problem of dynamic service composition in self-adaptive systems. Wang et al. [12] used reinforcement learning, combined with Markov models and Gaussian processes, to establish a multi-agent game model for self-adaptive service composition. Rao et al. [13] proposed a distributed learning mechanism for resource allocation in cloud environments. Kim and Park [14] proposed a framework-based online planning method for self-management, which enables a software system to change and improve its plan through online reinforcement learning. Amoui et al. [15] used reinforcement learning in the planning process to support action selection and clarified why, how, and when reinforcement learning can benefit autonomous software systems. (2) System and environment modelling. Zhao et al. [16] proposed a learning framework that integrates online and offline work based on reinforcement learning and case sets. Belhaj et al. [17] put forward a framework named "autonomic container", which endows applications with runtime adaptive action capability based on reinforcement learning. Using model-based reinforcement learning, Ho and Lee [18] modelled the environment state with a Markov process, applied to the planning and continuous optimization of adaptive software systems. Tesauro et al. [19] utilized reinforcement learning to solve the problem of autonomic resource allocation.

Regarding multi-agent reinforcement learning, representative studies in recent years include MADDPG (Multi-Agent Deep Deterministic Policy Gradient) [20] and COMA (Counterfactual Multi-Agent actor-critic) [21], both of which build on the classic Actor-Critic architecture. At present, multi-agent reinforcement learning is one of the most actively researched directions in reinforcement learning.

In summary, while various technologies and effective measures have been developed for microservice system assurance from different angles, most of them primarily address localized issues and decision-making within their own domains. As a result, they often fall short of comprehensively addressing the decision-making requirements for the assurance of the overall system. The open challenge is to merge the decision-making traits inherent to microservice architecture with the achievements of reinforcement learning methods in adaptive control, so as to empower each microservice with a global perspective and intelligent decision-making capability. This remains at the forefront of ongoing research efforts.
3. RL-SAMS Methodology

The overall architecture of RL-SAMS is illustrated in Figure 1. Building upon the Microservice System Component (MC), we introduce the Adapter Component (AC) and the RL Decision-making Component (RLDC). Within the MC, each microservice is enhanced by incorporating an AC, which adds a state monitoring module (SMM) and a dynamic configuration module (DCM), both of which provide interfaces for interaction with the RLDC. To keep the illustration straightforward, Figure 1 simplifies the interdependence among multiple microservices. The RLDC establishes a mechanism characterized by "centralized learning and decentralized decision-making."

The fundamental concept of "enhancing performance through experiential learning" is embedded into microservice assurance through the ongoing interactive learning of multiple agents, taking into account the effects of system operation and maintenance, user expectations, and other state factors.

Figure 1: Architecture of RL-SAMS

3.1. Adapter Component

The core function of the AC is to provide an interface through which the RLDC can perceive the running state of the microservice system and control, in a timely manner, the configuration and execution of various assurance actions. Its main functional modules are the state monitoring module (SMM) and the dynamic configuration module (DCM).

1. State monitoring module (SMM). The content of state monitoring depends on the actual requirements, such as request volume, success rate, and response time, and can also include business-specific parameters, exception codes, and so on. The Spring Cloud framework provides the "/metrics", "/health", "/trace" and other endpoints for regular microservice state monitoring; Section 4 activates these endpoints to achieve simple state monitoring and demonstrate the effectiveness of RL-SAMS. Customized SMMs and interfaces are also compatible with the mechanism proposed in this paper.

2. Dynamic configuration module (DCM). To achieve runtime-oriented dynamic assurance, the RLDC must be able to configure and execute assurance actions dynamically, without restarting the microservice. We establish a configuration center server to centrally manage the configuration files of each microservice; according to its decision results, the RLDC controls the content of each microservice's configuration file and triggers the configuration update, thereby realizing service assurance, as shown in Figure 2.

Figure 2: Interaction between AC and RLDC
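To make the AC and RLDC interaction concrete, here is a minimal Python sketch of one RLDC decision cycle. The service addresses, metric keys, and config-center endpoint are hypothetical stand-ins rather than the paper's actual interfaces, and the threshold rule at the end is only a placeholder for the learned policy of Section 3.2.

```python
# Sketch of one RLDC decision cycle against the AC interfaces described above.
# All URLs, metric keys, and the /assurance/config endpoint are assumptions.
import requests

SERVICES = {
    "Core_client": "http://core-client:8080",
    "Non_core_client": "http://non-core-client:8080",
}
CONFIG_CENTER = "http://config-center:8888"  # hypothetical config-center address


def read_state(base_url: str) -> dict:
    """SMM side: poll the monitoring endpoints exposed by each microservice."""
    health = requests.get(f"{base_url}/health", timeout=2).json()
    metrics = requests.get(f"{base_url}/metrics", timeout=2).json()
    return {
        "status": health.get("status"),
        "request_volume": metrics.get("counter.requests", 0),
        "response_time": metrics.get("gauge.response", 0.0),
    }


def apply_action(service: str, degrade: bool) -> None:
    """DCM side: push the degrade on/off decision to the config center,
    which the microservice picks up at runtime without a restart."""
    payload = {"service": service, "degrade": "on" if degrade else "off"}
    requests.post(f"{CONFIG_CENTER}/assurance/config", json=payload, timeout=2)


# One decision cycle: perceive the global state, then act per microservice.
# The threshold below is a placeholder for the RLDC policy of Section 3.2.
global_state = {name: read_state(url) for name, url in SERVICES.items()}
for name in SERVICES:
    apply_action(name, degrade=(global_state[name]["response_time"] > 1.0))
```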
3.2. RL Decision-making Component

In the RLDC, each microservice with decision-making ability is modelled as an independent agent under a "centralized training, decentralized execution" scheme. In the training stage, the learning of each agent is performed using global states so that the strategies of the other agents are taken into account; in the execution stage, each agent makes decisions based only on its own state perception. In addition, an experience replay pool is set up, and the experience replay mechanism is used to mitigate the correlation between training samples and their non-stationary distribution. Each state transition is recorded as a state-action pair together with the corresponding reward and next state:

(s_1, s_2, …, s_n; a_1, a_2, …, a_n; r; s_1′, s_2′, …, s_n′)

where s_i is the current state of each microservice, a_i is the assurance action selected by each microservice, r is the reward value, such as the degree to which the expectations of various users are satisfied after the assurance action is performed, and s_i′ is the next state of each microservice. The framework and process for two microservices are shown in Figure 3. Each microservice corresponds to an independent "action decision" module, and all microservices share one "value decision" module.

Figure 3: Framework of RLDC

An "action decision" module contains two strategy networks with the same structure, the target strategy μ_i′ and the evaluation strategy μ_i, which perform assurance decision-making based on the local microservice state:

1. The target strategy μ_i′ takes the next state s_i′ of the local microservice as input and outputs the corresponding assurance action a_i′: a_i′ = μ_i′(s_i′ | θ^μ_target). The target strategy does not train actively; it is periodically updated with the parameters of the continuously learning evaluation strategy μ_i, which increases the stability of the learning process. θ^μ_target is the parameter of the target strategy network.

2. The evaluation strategy μ_i takes the current state s_i of the local microservice as input and outputs the corresponding assurance action a_i: a_i = μ_i(s_i | θ^μ_eval). The evaluation strategy is trained continuously based on the Q-value feedback from the "value decision" module. θ^μ_eval is the parameter of the evaluation strategy network.

Although decision-making is decentralized, the microservices are closely related in business logic, so the service effect of each microservice is largely evaluated as a whole. Therefore, in contrast to MADDPG, which designs a critic module for each agent, this paper designs a single critic module (the "value decision" module) shared by all microservices, which outputs the Q-value of each microservice according to a comprehensive reward function. The "value decision" module contains two neural networks with the same structure, the value decision target network Net_target_critic and the value decision evaluation network Net_evaluation_critic, which output the Q-value of each microservice's assurance action based on the global state of the microservice system:

1. Net_target_critic takes the next states of the microservice system (s_1′, s_2′, …, s_n′) and the corresponding actions (a_1′, a_2′, …, a_n′) as input, and outputs the Q-value corresponding to the next state of each microservice: Q_i′(s_i′, a_i′ | θ^Q_target), where θ^Q_target is the parameter of Net_target_critic. Net_target_critic does not train actively; it is periodically updated with the continuously learned parameters of Net_evaluation_critic to increase the stability of the learning process.

2. Net_evaluation_critic takes the current states of the microservice system (s_1, s_2, …, s_n) and the corresponding actions (a_1, a_2, …, a_n) as input, and outputs the Q-value corresponding to the current state of each microservice: Q_i(s_i, a_i | θ^Q_eval), where θ^Q_eval is the parameter of Net_evaluation_critic. Net_evaluation_critic periodically draws a batch of m state transition records at random from the experience replay pool for training; training continuously reduces the difference between the estimated Q-value and the target Q-value. The loss function is defined as

L(θ^Q_eval) = (1/m) Σ ( r + γ · Q_i′(s_i′, a_i′ | θ^Q_target) − Q_i(s_i, a_i | θ^Q_eval) )²

where γ ∈ [0, 1] is the discount factor; the larger γ is, the more the learning process emphasizes long-term rewards. The evaluation strategy μ_i of each microservice updates its parameters along the sampled policy gradient (J1 and J2 in Figure 3):

∇J ≈ (1/m) Σ ∇_θ μ_i(s_i | θ^μ_eval) · ∇_a Q_i(s_i, a_i | θ^Q_eval)
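As a concrete illustration of this structure, the following is a minimal TensorFlow sketch, assuming two agents, small illustrative state and action dimensions, and arbitrary layer sizes and learning rates (none of these values come from the paper). It builds one evaluation/target strategy pair per microservice and a single shared evaluation/target critic, and implements one training step using the loss L(θ^Q_eval) and the sampled policy gradient given above; sync_targets() performs the periodic hard copy into the target networks.

```python
# Minimal sketch of the shared-critic, per-agent-actor scheme described above.
# Dimensions, layer sizes, and learning rates are illustrative assumptions.
import tensorflow as tf

N_AGENTS, STATE_DIM, ACTION_DIM = 2, 4, 1
GAMMA = 0.9  # discount factor, as in Section 4.2

def make_actor():
    # evaluation/target strategy mu_i(s_i | theta_mu): local state -> degrade probability
    return tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(STATE_DIM,)),
        tf.keras.layers.Dense(ACTION_DIM, activation="sigmoid"),
    ])

def make_critic():
    # shared critic Q(s_1..s_n, a_1..a_n | theta_Q): one Q-value per microservice
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu",
                              input_shape=(N_AGENTS * (STATE_DIM + ACTION_DIM),)),
        tf.keras.layers.Dense(N_AGENTS),
    ])

actors_eval = [make_actor() for _ in range(N_AGENTS)]
actors_tgt  = [make_actor() for _ in range(N_AGENTS)]
critic_eval, critic_tgt = make_critic(), make_critic()
opt_critic = tf.keras.optimizers.RMSprop(1e-3)
opt_actors = [tf.keras.optimizers.RMSprop(1e-4) for _ in range(N_AGENTS)]

def flat(s, a):
    # concatenate all states and actions into the critic's global input
    batch = tf.shape(s)[0]
    return tf.concat([tf.reshape(s, [batch, -1]), tf.reshape(a, [batch, -1])], axis=1)

def train_step(s, a, r, s_next):
    """s, a, s_next: [batch, N_AGENTS, dim]; r: [batch, 1] comprehensive reward."""
    s, a, r, s_next = [tf.cast(x, tf.float32) for x in (s, a, r, s_next)]
    a_next = tf.stack([actors_tgt[i](s_next[:, i]) for i in range(N_AGENTS)], axis=1)
    y = r + GAMMA * critic_tgt(flat(s_next, a_next))          # target Q-value
    with tf.GradientTape() as tape:                           # critic loss L(theta_Q)
        critic_loss = tf.reduce_mean(tf.square(y - critic_eval(flat(s, a))))
    grads = tape.gradient(critic_loss, critic_eval.trainable_variables)
    opt_critic.apply_gradients(zip(grads, critic_eval.trainable_variables))
    for i in range(N_AGENTS):                                 # policy gradient for each mu_i
        with tf.GradientTape() as tape:
            a_i = actors_eval[i](s[:, i])
            a_all = tf.concat([a[:, :i], tf.expand_dims(a_i, 1), a[:, i + 1:]], axis=1)
            actor_loss = -tf.reduce_mean(critic_eval(flat(s, a_all))[:, i])
        grads = tape.gradient(actor_loss, actors_eval[i].trainable_variables)
        opt_actors[i].apply_gradients(zip(grads, actors_eval[i].trainable_variables))

def sync_targets():
    # periodic hard update of the target networks, as described in the text
    critic_tgt.set_weights(critic_eval.get_weights())
    for i in range(N_AGENTS):
        actors_tgt[i].set_weights(actors_eval[i].get_weights())
```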
4. Experiments

4.1. Experimental Scene

To verify the effectiveness of RL-SAMS, we build a user-information-querying system consisting of five microservices on "VMware Workstation 16 Pro", as shown in Figure 4. The system includes three business microservices, one configuration center microservice, and one registry center microservice. Each microservice is developed on the "Spring Cloud" framework [22] and deployed on an independent VMware virtual machine. Each virtual machine is configured with 1 GB of memory, one processor, a 20 GB (SCSI) hard disk, and Ubuntu 16.04 as the operating system.

Figure 4: Experimental Scene

The three business microservices are:

1. Two client microservices, Core_client and Non_core_client, which receive requests for querying user information and call the provider_user microservice to return the result to the requesting user. There is no difference in business logic between the two; one of them is designated the core business microservice solely to verify that RL-SAMS can give priority to core business.

2. One provider_user microservice, responsible for background business processing. It receives user-information query requests and returns the query results. To simulate a performance bottleneck, the provider_user microservice is set to sleep for one second before executing the information query.

We simulate highly concurrent business requests with the performance testing framework "Locust". In the experiment, we deploy three pressure simulation modules, one for each business microservice: Core_client, Non_core_client, and provider_user. The three modules use different pressure cycles to simulate different pressure sources on the microservice system, in order to verify the core-business-priority assurance capability of RL-SAMS under different pressure sources.
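As an illustration, one of these pressure-simulation modules might look like the following Locust sketch; the host address and endpoint path are assumptions, and the number of concurrent users for a given scenario is chosen when Locust is launched.

```python
# Minimal Locust pressure module against one client microservice.
# Host and endpoint are hypothetical; user counts are set at launch time.
from locust import HttpUser, task, constant

class CoreClientPressure(HttpUser):
    host = "http://core-client:8080"   # assumed address of Core_client
    wait_time = constant(1)            # one request per simulated user per second

    @task
    def query_user_info(self):
        # each request goes Core_client -> provider_user and returns the result
        self.client.get("/user/1", name="query_user_info")
```

A scenario such as HJC-HCC-LNC could then be approximated by running this module with 200 users against Core_client and a second copy with 50 users against Non_core_client, for example with something like `locust -f core_pressure.py --headless --users 200 --spawn-rate 20`.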
4.2. Experimental Design

The experiment takes as the action space whether each of the two request microservices performs service degradation, a_core ∈ {on, off} and a_non_core ∈ {on, off}, and compares the average reward value over all heartbeat monitoring requests of the two client microservices within 15 s after each assurance action. a_core = on means that the service degradation mechanism is enabled to ensure service continuity; a_core = off means the opposite. The reward function is defined as

R = (Σ r_CC) / Core_requests + (Σ r_NC) / Non_core_requests

where

r_CC = 4 for normal_service, 1 for degraded_service, −3 for service_failure
r_NC = 1 for normal_service, 0 for degraded_service, −1 for service_failure

Core_requests and Non_core_requests are the total numbers of microservice-state heartbeat monitoring requests sent randomly in the corresponding period, and Σ r_CC and Σ r_NC are the sums of the heartbeat monitoring request rewards for Core_client and Non_core_client, respectively. The three responses are defined as follows:

• normal_service: the correct request result is returned within the specified time;
• degraded_service: the microservice is degraded; in this experiment, a default value is returned without actual processing;
• service_failure: the request times out or returns an error.

Different reward values are assigned to Core_client and Non_core_client to encourage business-critical service assurance.
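Written out as code, the reward computation above amounts to the following short Python sketch; the outcome labels are the three response types just defined, and the helper simply averages the per-request rewards observed in the 15 s window for each client.

```python
# Direct transcription of the reward function defined above.
R_CC = {"normal_service": 4, "degraded_service": 1, "service_failure": -3}
R_NC = {"normal_service": 1, "degraded_service": 0, "service_failure": -1}

def reward(core_outcomes, non_core_outcomes):
    """core_outcomes / non_core_outcomes: outcome labels of the heartbeat
    monitoring requests observed within 15 s after an assurance action."""
    r_cc = sum(R_CC[o] for o in core_outcomes) / len(core_outcomes)
    r_nc = sum(R_NC[o] for o in non_core_outcomes) / len(non_core_outcomes)
    return r_cc + r_nc

# Core fully normal, non-core degraded: 4 + 0 = 4.0,
# matching the expected R for HJC-LCC-HNC in Table 1.
print(reward(["normal_service"] * 10, ["degraded_service"] * 10))
```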
For the reinforcement learning setup, two-layer Net_target_critic and Net_evaluation_critic networks are constructed with TensorFlow; Net_evaluation_critic copies its parameters to Net_target_critic every 200 learning steps. The neural networks are optimized with the RMSprop optimizer. The discount factor γ is set to 0.9 and the exploration parameter ε to 0.8. The capacity of the experience replay pool is 200; every 5 steps, Net_evaluation_critic randomly selects 32 state transition records from the pool as training samples and simultaneously trains the two behavioural decision evaluation strategies.

Table 1: Comparative experiment scenarios (expected optimal action and reward)

| Service risk scenario | Core_client concurrent users | Non_core_client concurrent users | Expected action | R_CC | R_NC | R |
| HJC-HCC-LNC | 200 | 50 | [a_core = on, a_non_core = off] | 1 | 1 | 2 |
| HJC-LCC-HNC | 50 | 200 | [a_core = off, a_non_core = on] | 4 | 0 | 4 |
| LJC-LCC-LNC | 50 | 50 | [a_core = off, a_non_core = off] | 4 | 1 | 5 |
| HJC-LCC-LNC | 100 | 100 | [a_core = off, a_non_core = on] | 4 | 0 | 4 |
| HJC-HCC-HNC | 200 | 200 | [a_core = on, a_non_core = off] | 1 | 0 | 1 |

4.3. Comparative Experiment

4.3.1. Effectiveness Analysis

The experiment takes the widely used Hystrix [23] as the baseline method and compares the assurance effect of the Hystrix service circuit breaker mechanism and RL-SAMS in the five service risk scenarios shown in Table 1. In addition, the service effect without any assurance method, labelled "Blank" in Figure 5, is compared as another baseline to verify the successful implementation of Hystrix and RL-SAMS. Table 1 lists the five service risk scenarios together with the expected optimal decision action and average reward. The name of each service risk scenario combines three fields, X1JC-X2CC-X3NC, corresponding to different concurrent pressure models: X1JC is the joint concurrency field, indicating whether the requests of Core_client and Non_core_client together saturate performance; X2CC and X3NC are the independent concurrency fields, indicating whether the requests of Core_client or Non_core_client alone, respectively, saturate performance. H means high concurrent pressure and L means low concurrent pressure. Preliminary experiments indicate that around 150 concurrent users are enough to put the microservices in this experiment under high concurrency pressure.

Figure 5: Comparative Experiment

The average reward value of the heartbeat monitoring requests for the three service assurance methods in the five service risk scenarios is shown in Figure 5. In all HJC scenarios: (1) The "Blank" method causes the response time of all requests to time out; according to the reward function, the average reward value is −4. (2) With the "Hystrix" method, high independent concurrency on either side (HCC or HNC) activates the circuit breakers of both request microservices, so the average reward value is 1; because of Hystrix's retransmission mechanism, it fluctuates in the range 1 ± 0.2. (3) Comparing HJC-HCC-LNC, HJC-LCC-HNC, and HJC-LCC-LNC verifies that the decision model trained by RL-SAMS intelligently and selectively degrades microservices according to the source of pressure. In HJC-HCC-LNC, since HCC causes HJC, the requests of Core_client cannot be assured, so it is best to degrade Core_client to assure the requests of Non_core_client. In HJC-LCC-HNC, since HNC causes HJC, it is best to degrade Non_core_client to assure the requests of Core_client. In HJC-LCC-LNC, the requests of Core_client and Non_core_client together cause HJC, so according to the reward function it is again best to degrade and sacrifice Non_core_client to assure Core_client. The experiment verifies that RL-SAMS can not only select the assurance action effectively, but also distinguish which object to degrade according to the source of the service risk, thereby realizing intelligent and elastic assurance of the microservice system.

4.3.2. Model Accuracy and Training Process Analysis

During model training, the two Locust modules that generate requests for the client microservices continuously simulate concurrent request pressure with a random cycle duration of 1800 seconds. To cover the five types of service risk scenarios while keeping the RL state space small enough to shorten the learning cycle, the number of concurrent users is drawn randomly from [0, 50, 100, 150, 200]. Logs record the state of each step and the assurance action selected during training. Taking the service risk scenario HJC-LCC-HNC as an example, Figure 6 presents the proportion of assurance actions at each stage of training. Because the concurrent request pressure is simulated randomly, HJC-LCC-HNC does not occur continuously; the Periods in Figure 6 are obtained by extracting all assurance action selection records at which HJC-LCC-HNC occurred throughout the entire training process, sorting them chronologically, and computing the proportion of each assurance action over every 100 consecutive records. Each record is the decision of whether to degrade the Core_client and Non_core_client microservices and thus cut off their concurrent requests.

Figure 6: RL process in HJC-LCC-HNC

As shown in Figure 6, in Period 1 the agents of the two request microservices decide almost at random whether to activate degradation. Since both client microservices experience low concurrent pressure, they both tend not to activate degradation in Period 2, resulting in an increase in the proportion of [a_core = off, a_non_core = off]. Under the influence of the "value decision" module, Core_client and Non_core_client receive the maximum reward values with [a_core = off, a_non_core = on], and the corresponding Q-values are also the highest. Therefore, as training progresses, the proportion of [a_core = off, a_non_core = on] increases; after Period 6 it exceeds 90% and stabilizes, reaching 98% in Period 8. In other words, in the service risk scenario HJC-LCC-HNC, RL-SAMS can, with a probability of 98% × 98% ≈ 96%, ensure normal service of Core_client by degrading only the concurrent requests of Non_core_client. The accuracy in the other service risk scenarios is similar.
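The per-Period bookkeeping behind Figure 6 can be sketched as follows; the log-record format is an assumption, but the steps mirror the description above: filter the training log to one risk scenario, keep chronological order, and compute the proportion of each joint action over consecutive blocks of 100 records.

```python
# Sketch of the action-proportion analysis used for Figure 6.
# The record format is assumed; records must already be in chronological order.
from collections import Counter

def action_proportions(records, scenario="HJC-LCC-HNC", period=100):
    """records: dicts like
    {"scenario": "HJC-LCC-HNC", "a_core": "off", "a_non_core": "on"}."""
    selected = [r for r in records if r["scenario"] == scenario]
    out = []
    for start in range(0, len(selected) - period + 1, period):
        chunk = selected[start:start + period]
        counts = Counter((r["a_core"], r["a_non_core"]) for r in chunk)
        out.append({action: n / period for action, n in counts.items()})
    return out  # one proportion dict per Period
```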
5. Conclusion

This paper introduces a reinforcement learning-based decision-making method for microservice systems. It incorporates the core concept of "enhancing performance through experiential learning" into service assurance within the microservice architecture, and its flexible assurance capability targeting critical components opens up new approaches to intelligent service assurance and maintenance. Analysis and validation through case experiments show that RL-SAMS performs well across various service risk scenarios, particularly in its ability to intelligently differentiate key assurance elements and proactively ensure the continuity of core business operations.

While this paper introduces reinforcement learning methods into service assurance activities within microservice systems, many aspects still require further research and exploration. These include:

• Efficient Learning with Expanding State and Action Spaces: Reinforcement learning is fundamentally about accumulating experiential knowledge to maximize rewards and minimize losses. As the state and action spaces grow, the cost of model training and learning increases rapidly. Methods for accumulating positive experiences more efficiently and improving convergence rates need to be investigated.

• Decentralized Training and Centralized Learning: The approach taken in this paper involves centralized training and learning. In real-world settings where microservices come from different providers, however, there may be obstacles to sharing operational data. How to limit data sharing while still enabling decentralized training of individual microservices and centralized learning of experiences is a pressing challenge.

• Integration with Log Analysis and Risk Prediction: Combining reinforcement learning with log analysis and risk prediction to leverage prior knowledge and accelerate learning efficiency is worth investigating. Integrating reinforcement learning with existing systems for proactive risk management and incident response can enhance the overall effectiveness of service assurance activities.

These areas of research and improvement will contribute to the further development and refinement of reinforcement learning methods in the context of microservices and service assurance.
References

[1] Xiang Zhou, Xin Peng, Tao Xie, Jun Sun, Chenjie Xu, Chao Ji, and Wenyun Zhao. Poster: Benchmarking microservice systems for software engineering research. In 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion), pages 323–324. IEEE, 2018.
[2] Holger Knoche and Wilhelm Hasselbring. Using microservices for legacy software modernization. IEEE Software, 35(3):44–49, 2018.
[3] Florian Rademacher, Jonas Sorgalla, and Sabine Sachweh. Challenges of domain-driven microservice design: A model-driven perspective. IEEE Software, 35(3):36–43, 2018.
[4] Claus Pahl, Antonio Brogi, Jacopo Soldani, and Pooyan Jamshidi. Cloud container technologies: A state-of-the-art review. IEEE Transactions on Cloud Computing, 7(3):677–692, 2017.
[5] Zhizhen Zhong, Jipu Li, Nan Hua, Gustavo B. Figueiredo, Yanhe Li, Xiaoping Zheng, and Biswanath Mukherjee. On QoS-assured degraded provisioning in service-differentiated multi-layer elastic optical networks. In 2016 IEEE Global Communications Conference (GLOBECOM), pages 1–5. IEEE, 2016.
[6] Alex S. Santos, Andre K. Horota, Zhizhen Zhong, Juliana de Santi, Gustavo B. Figueiredo, Massimo Tornatore, and Biswanath Mukherjee. An online strategy for service degradation with proportional QoS in elastic optical networks. In 2018 IEEE International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[7] Lei Wang. Architecture-based reliability-sensitive criticality measure for fault-tolerance cloud applications. IEEE Transactions on Parallel and Distributed Systems, 30(11):2408–2421, 2019.
[8] Chenhao Qu, Rodrigo N. Calheiros, and Rajkumar Buyya. Auto-scaling web applications in clouds: A taxonomy and survey. ACM Computing Surveys (CSUR), 51(4):1–33, 2018.
[9] Nathan Cruz Coulson, Stelios Sotiriadis, and Nik Bessis. Adaptive microservice scaling for elastic applications. IEEE Internet of Things Journal, 7(5):4195–4202, 2020.
[10] Donatella Firmani, Francesco Leotta, and Massimo Mecella. On computing throttling rate limits in web APIs through statistical inference. In 2019 IEEE International Conference on Web Services (ICWS), pages 418–425. IEEE, 2019.
[11] Hongbing Wang, Xiaojun Wang, Xingguo Hu, Xingzhi Zhang, and Mingzhu Gu. A multi-agent reinforcement learning approach to dynamic service composition. Information Sciences, 363:96–119, 2016.
[12] Hongbing Wang, Qin Wu, Xin Chen, Qi Yu, Zibin Zheng, and Athman Bouguettaya. Adaptive and dynamic service composition via multiagent reinforcement learning. In 2014 IEEE International Conference on Web Services, pages 447–454. IEEE, 2014.
[13] Jia Rao, Xiangping Bu, Kun Wang, and Cheng-Zhong Xu. Self-adaptive provisioning of virtualized resources in cloud computing. In Proceedings of the ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pages 129–130, 2011.
[14] Dongsun Kim and Sooyong Park. Reinforcement learning-based dynamic adaptation planning method for architecture-based self-managed software. In 2009 ICSE Workshop on Software Engineering for Adaptive and Self-Managing Systems, pages 76–85. IEEE, 2009.
[15] Mehdi Amoui, Mazeiar Salehie, Siavash Mirarab, and Ladan Tahvildari. Adaptive action selection in autonomic software using reinforcement learning. In Fourth International Conference on Autonomic and Autonomous Systems (ICAS'08), pages 175–181. IEEE, 2008.
[16] Tianqi Zhao, Wei Zhang, Haiyan Zhao, and Zhi Jin. A reinforcement learning-based framework for the generation and evolution of adaptation rules. In 2017 IEEE International Conference on Autonomic Computing (ICAC), pages 103–112. IEEE, 2017.
[17] Nabila Belhaj, Djamel Belaïd, and Hamid Mukhtar. Framework for building self-adaptive component applications based on reinforcement learning. In 2018 IEEE International Conference on Services Computing (SCC), pages 17–24. IEEE, 2018.
[18] Han Nguyen Ho and Eunseok Lee. Model-based reinforcement learning approach for planning in self-adaptive software system. In Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication, pages 1–8, 2015.
[19] Gerald Tesauro, Nicholas K. Jong, Rajarshi Das, and Mohamed N. Bennani. A hybrid reinforcement learning approach to autonomic resource allocation. In 2006 IEEE International Conference on Autonomic Computing, pages 65–73. IEEE, 2006.
[20] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30, 2017.
[21] Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[22] I. Cosmina. Spring microservices with Spring Cloud. In Pivotal Certified Professional Spring Developer Exam: A Study Guide, pages 435–459, 2017.
[23] H. Molchanov and A. Zhmaiev. Circuit breaker in systems based on microservices architecture. 2018.