<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reinforcement Learning-based Service Assurance of Microservice Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiaojian Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yangyang Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wen Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qiao Duan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qingqing Ji</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beijing University of Technology</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>China Electronics Standardization Institute</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Chinese Academy of Sciences</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>34</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>As microservice architecture has become the prevailing direction in software system design, service assurance in microservice systems has attracted increasing attention. Intelligent service assurance offers a new way to address adaptation challenges in complex, risk-laden environments. This paper introduces the Reinforcement Learning (RL) Based Service Assurance Method for Microservice Systems (RL-SAMS), which incorporates the fundamental RL principle of "improving performance through experience" into service assurance activities. Service continuity is ensured through an intelligent service degradation mechanism. Two components are added to the microservice system: the Adapter Component (AC) and the RL Decision-making Component (RLDC). Each microservice is treated as an independent RL agent, resulting in a multi-agent RL decision-making architecture with "centralized learning and decentralized decision-making." The decision-making model is trained through continuous trial and error, accumulating positive experience. Experimental cases show that RL-SAMS outperforms the widely adopted Hystrix across various service risk scenarios, and in particular excels at intelligently assuring critical services.</p>
      </abstract>
      <kwd-group>
        <kwd>Reinforcement learning</kwd>
        <kwd>Microservice system</kwd>
        <kwd>Intelligent service assurance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In 2014, Martin Fowler formally introduced the
concept of "Microservices" in a blog post of the same
name. This approach to software architecture breaks a
software system down into many small services, each
running independently in its own process. Compared with
traditional monolithic systems, microservice
architectures offer several notable advantages,
including independent deployment, easy scalability,
and decentralization. A growing number of network
applications have moved to a microservice architecture;
notable examples include Amazon, Netflix, Twitter,
SoundCloud, and PayPal. To illustrate the scale, a single
page on Amazon can trigger approximately 100 to 150
microservice calls, while the Netflix system handles
about 5 billion microservice interactions per day [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Microservice architecture has
progressively become the predominant direction for
software system architecture [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>5th International Workshop on Experience with SQuaRE Series and
its Future Direction, December 04, 2023, Seoul, Korea.
liuxj@bjut.edu.cn (X. Liu); zhangyy@cesi.cn (Y. Zhang)</p>
      <p>The autonomy and collaborative interaction
among microservices bring advantages but also
significant service reliability risks.
On one hand, autonomy entails separate
operations, maintenance, and independent
decision-making. This can lead to a focus on local interests at the
expense of global considerations, and sometimes even to
conflicting service assurance efforts
among microservices. On the other hand, the intricate
business interactions among microservices often
amplify "local failures" into "cascading failures,"
triggering an "avalanche effect." In such cases, the
problem is hard to resolve because the root cause is
difficult to locate.</p>
      <p>The key to addressing these service assurance
challenges lies in establishing an effective group
decision-making mechanism within the microservices
system. This mechanism empowers each microservice
with the ability to comprehend the bigger picture and
make decisions for the entire system. This paper,
utilizing a reinforcement learning approach, explores
a service assurance decision-making method tailored
for microservices systems. Each microservice is
conceptualized as an independent reinforcement
learning agent. Through continuous interactions with
the service environment and the operational and
maintenance environment, the fundamental concept
of "enhancing performance through experiential
learning" is woven into the fabric of microservice
assurance. This equips the decision-making system
with the capacity to intelligently differentiate between
assurance targets and to flexibly provide assurance for
critical elements.</p>
      <p>Section 2 of the paper provides a summary of
related research, with a particular emphasis on the
current state of research in microservice assurance
technology and reinforcement learning methods. In
Section 3, we present an overview of the RL-SAMS
method along with an introduction to its key
components. Section 4 showcases the effectiveness of
the RL-SAMS method through preliminary experimental
results. Finally, in Section 5, we summarize the
contributions of this paper and outline potential
directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        Technologies related to microservice assurance
include service degradation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], service
fault tolerance [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], service elastic scaling [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and service rate limiting
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Santos et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed a strategy for online
service degradation based on quality of service (QoS),
which aims to minimize request congestion caused by a lack
of system resources. Combining architecture analysis
and sensitivity analysis, Wang et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
proposed a fault-tolerance strategy algorithm based on
reliability criticality measurement. Coulson et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
designed a prototype of an automatic scaling system for
microservices based on supervised learning. Firmani et
al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] put forward an API call rate limit selection
strategy in order to prevent unauthorized users from
achieving ultra-high SLA. Most existing research
on microservice assurance focuses on the local situation
of individual microservices and therefore cannot
comprehensively consider the guarantee of service
expectations from the perspective of users. One of the
key problems to be solved is how to establish
a service assurance system for global
decision-making without breaking the original distributed and
independent framework of microservices.
      </p>
      <p>
        Existing research on reinforcement
learning-enabled software adaptive control can be roughly
divided into two groups. (1) Strategy generation and evolution
research. Wang et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] used a reinforcement learning
method to solve the problem of dynamic service
configuration in an integrated adaptive system. Wang
et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] used a reinforcement learning method,
combined with a Markov model and a Gaussian process, to
establish a multi-agent game model, which aims to
solve the problem of self-adaptive combination of
services. Rao et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] proposed a distributed
learning mechanism to solve the problem of resource
allocation in the cloud environment. Dongsun et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
proposed a framework-based online planning method
for self-management, which enables the software
system to change and improve its plan through online
RL. Amoui et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] used RL in the planning process
to support action selection, and clarified why, how and
when RL can benefit autonomous software systems. (2)
System and environmental modelling research. Zhao
et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] proposed a learning framework that
integrates online and offline work based on
reinforcement learning and case sets. Belhaj et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
put forward a framework named "autonomic
container", which endows applications with run-time
adaptive action capability based on the RL method. Using a
model-based reinforcement learning method, Ho et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] used a Markov process to model the
environment state, which is applied to the planning
and continuous optimization of adaptive software
systems. Tesauro et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] used a reinforcement
learning method to solve the problem of service
ranking.
      </p>
      <p>
        Regarding multi-agent RL, the representative
studies in recent years include MADDPG (Multi-Agent
Deep Deterministic Policy Gradient) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and COMA
(Counterfactual Multi-Agent actor-critic) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], both of
which are based on classic Actor-Critic architecture. At
present, multi-agent RL is one of the most focused and
widely researched directions in reinforcement
learning methods.
      </p>
      <p>In summary, while various technologies and effective
measures have been developed for microservice
system assurance from different angles, most of them
primarily address localized issues and
decision-making within their own domains. As a result, they
often fall short of addressing the
decision-making requirements of assurance for the overall
system. The challenge now lies in combining the
decision-making traits inherent to microservice
architecture with the research achievements of
reinforcement learning methods in adaptive control.
The objective is to give each microservice a
global perspective and intelligent decision-making
capabilities. This remains at the forefront of ongoing
research efforts.</p>
    </sec>
    <sec id="sec-3">
      <title>3. RL-SAMS Methodology</title>
      <p>The overall architecture of RL-SAMS is
illustrated in Figure 1. Building upon the Microservice
System Component (MC), we introduce the
Adapter Component (AC) and the RL Decision-making
Component (RLDC). Within the MC, each microservice
is enhanced with the AC, which adds a state monitoring
module (SMM) and a dynamic configuration module (DCM);
both provide interfaces for interaction
with the RLDC. To keep the illustration
straightforward, Figure 1 simplifies the
interdependence among multiple microservices. The
RLDC establishes a mechanism characterized by
"centralized learning and decentralized
decision-making."</p>
      <p>The fundamental concept of "enhancing
performance through experiential learning" is
embedded into microservice assurance. This
integration is achieved through the ongoing
interactive learning of multiple agents, taking into
account the effects of system operation and
maintenance, user expectations, and various other
state factors.</p>
      <sec id="sec-3-1">
        <title>3.1. Adapter Component</title>
        <p>The core function of the AC is to provide an interactive
interface through which the RLDC perceives the running service
state of the microservice system and controls, in a timely
manner, the configuration and execution of the various types
of assurance actions. The main functional modules
are a state monitoring module (SMM) and a
dynamic configuration module (DCM).</p>
        <p>1. State monitoring module (SMM). The content
of state monitoring depends on the actual
requirements: it can cover request volume, correct
rate, response time, etc., as well as
specific business parameters, exception codes,
etc. The Spring Cloud framework provides the
"/metrics", "/health", and "/trace" endpoints,
among other interfaces, for
regular microservice state monitoring. The experiment
in Section 4 activates these endpoints to
achieve simple state monitoring and
demonstrate the effectiveness of RL-SAMS.
Customized SMMs and interfaces are also
suitable for the mechanism proposed in this
paper.</p>
        <p>2. Dynamic configuration module (DCM). To
achieve runtime-oriented dynamic assurance, the
RLDC must be able to
dynamically configure and execute assurance
actions without restarting the microservice. We
establish a configuration center server to
centrally manage the configuration files of each
microservice. According to the decision result, the
RLDC controls the content of each microservice
configuration file and triggers the
microservice configuration update, so
as to realize service assurance, as shown
in Figure 2.</p>
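        <p>The DCM flow described above (decision result, then configuration file content, then configuration update) can be sketched as follows. This is only an illustrative sketch: the property key <monospace>assurance.degrade.enabled</monospace>, the one-file-per-service layout, and both function names are hypothetical and are not taken from the paper or from Spring Cloud.</p>
        <preformat>
```python
from typing import Dict, List

# Hypothetical property key; the paper does not name the actual configuration keys.
DEGRADE_KEY = "assurance.degrade.enabled"


def render_properties(decisions: Dict[str, str]) -> Dict[str, str]:
    """Map each microservice's decided action ("on"/"off") to properties-file text."""
    files = {}
    for service, action in sorted(decisions.items()):
        if action not in ("on", "off"):
            raise ValueError(f"unknown action {action!r} for {service}")
        enabled = "true" if action == "on" else "false"
        # Assumption: one properties file per microservice in the configuration center.
        files[f"{service}.properties"] = f"{DEGRADE_KEY}={enabled}\n"
    return files


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return the files whose content changed and therefore need a configuration update."""
    return sorted(name for name in new if old.get(name) != new[name])
```
        </preformat>
        <p>Only the files returned by <monospace>changed_files</monospace> would need a configuration update, which keeps untouched microservices from being refreshed on every decision.</p>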
      </sec>
      <sec id="sec-3-2">
        <title>3.2. RL Decision-making Component</title>
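        <p>In the RLDC, each microservice with decision-making
ability is modelled as an independent agent for
centralized training and decentralized execution. That
is, in the training stage, each agent learns
using global states so as to consider the strategies of
the other agents; in the execution stage, each agent
makes decisions based only on its own state perception. In
addition, an experience replay pool is set up, and the
experience replay mechanism is used to address the
correlation between training samples and the
non-stationary probability distribution of training samples.
Each state transition is recorded as a state-action pair
together with the corresponding reward and next state, as
follows:</p>
        <disp-formula><tex-math>(s_1, s_2, \dots, s_n;\; a_1, a_2, \dots, a_n;\; r;\; s_1', s_2', \dots, s_n')</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>s_i</tex-math></inline-formula> is the current state
of each microservice, <inline-formula><tex-math>a_i</tex-math></inline-formula> is the assurance action selected by each
microservice, <inline-formula><tex-math>r</tex-math></inline-formula> is the reward value, such as the degree of
satisfaction of various users' expectations after each
assurance action is performed, and <inline-formula><tex-math>s_i'</tex-math></inline-formula> is the next state of
each microservice. The framework and process for
two microservices are shown in Figure 3. Each
microservice corresponds to an independent "action
decision" module and a shared "value decision"
module. There are two strategy networks with the same
structure in one "action decision" module, the target
strategy <inline-formula><tex-math>\mu'</tex-math></inline-formula> and the evaluation strategy <inline-formula><tex-math>\mu</tex-math></inline-formula>, which are
used for assurance decision-making based on the local
microservice state (the microservice index is omitted below):</p>
        <p>1. The target strategy <inline-formula><tex-math>\mu'</tex-math></inline-formula> takes the next state <inline-formula><tex-math>s'</tex-math></inline-formula> of the
local microservice as input, and outputs the
assurance action <inline-formula><tex-math>a'</tex-math></inline-formula> corresponding to <inline-formula><tex-math>s'</tex-math></inline-formula>:</p>
        <disp-formula><tex-math>a' = \mu'(s' \mid \theta^{\mu'})</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\theta^{\mu'}</tex-math></inline-formula> is the parameter of <inline-formula><tex-math>\mu'</tex-math></inline-formula>. The target strategy <inline-formula><tex-math>\mu'</tex-math></inline-formula> is not actively trained; it is
periodically updated with the parameters of the
continuously learning evaluation strategy,
thereby increasing the stability of the learning
process.</p>
        <p>2. The evaluation strategy <inline-formula><tex-math>\mu</tex-math></inline-formula> takes the current
state <inline-formula><tex-math>s</tex-math></inline-formula> of the local microservice as input, and
outputs the assurance action <inline-formula><tex-math>a</tex-math></inline-formula> corresponding to <inline-formula><tex-math>s</tex-math></inline-formula>:</p>
        <disp-formula><tex-math>a = \mu(s \mid \theta^{\mu})</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\theta^{\mu}</tex-math></inline-formula> is the parameter of <inline-formula><tex-math>\mu</tex-math></inline-formula>. The evaluation
strategy is continuously
trained based on the feedback of the
Q-value from the "value decision" module.</p>
        <p>Although decision-making is decentralized, the
microservices are closely related in business logic, so
assurance actions need a
comprehensive evaluation. Therefore, compared with
MADDPG, which designs a critic module for each agent,
this paper designs a shared critic module (i.e., the "value
decision" module) for all microservices, which outputs
the corresponding Q-value of each microservice. The
"value decision" module contains a target network <inline-formula><tex-math>Q'</tex-math></inline-formula> and an
evaluation network <inline-formula><tex-math>Q</tex-math></inline-formula>, which are
used to output the Q-value of each microservice
assurance action based on the global state of the
microservice system:</p>
        <p>1. The value decision target network <inline-formula><tex-math>Q'</tex-math></inline-formula> takes the next state
<inline-formula><tex-math>s' = (s_1', s_2', \dots, s_n')</tex-math></inline-formula> of the microservice system and the
corresponding <inline-formula><tex-math>a' = (a_1', a_2', \dots, a_n')</tex-math></inline-formula> as input, and
outputs the Q-value <inline-formula><tex-math>Q'(s', a' \mid \theta^{Q'})</tex-math></inline-formula> corresponding to the next state
of each microservice, where <inline-formula><tex-math>\theta^{Q'}</tex-math></inline-formula> is the parameter of <inline-formula><tex-math>Q'</tex-math></inline-formula>. Like the target strategy, <inline-formula><tex-math>Q'</tex-math></inline-formula> is
updated periodically from the evaluation network to increase the stability of
the learning process.</p>
        <p>2. The value decision evaluation network <inline-formula><tex-math>Q</tex-math></inline-formula> takes the current
state <inline-formula><tex-math>s = (s_1, s_2, \dots, s_n)</tex-math></inline-formula> of the microservice system
and the corresponding <inline-formula><tex-math>a = (a_1, a_2, \dots, a_n)</tex-math></inline-formula> as input,
and outputs the Q-value <inline-formula><tex-math>Q(s, a \mid \theta^{Q})</tex-math></inline-formula> corresponding to the
current state of each microservice,
where <inline-formula><tex-math>\theta^{Q}</tex-math></inline-formula> is the parameter of <inline-formula><tex-math>Q</tex-math></inline-formula>. The loss function of <inline-formula><tex-math>Q</tex-math></inline-formula> is
defined as:</p>
        <disp-formula><tex-math>L(\theta^{Q}) = \frac{1}{N} \sum \left( r + \gamma \, Q'(s', a' \mid \theta^{Q'}) - Q(s, a \mid \theta^{Q}) \right)^{2}</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>N</tex-math></inline-formula> is the number of sampled state transition records and <inline-formula><tex-math>\gamma \in [0,1]</tex-math></inline-formula> is the discount factor, referred to in this paper as the learning rate. The
larger <inline-formula><tex-math>\gamma</tex-math></inline-formula> is, the more emphasis is placed on long-term
rewards in the learning process. The evaluation
strategy <inline-formula><tex-math>\mu_i</tex-math></inline-formula> of each microservice, with local state <inline-formula><tex-math>s_i</tex-math></inline-formula> and action <inline-formula><tex-math>a_i</tex-math></inline-formula>, updates its
parameters along the gradient (J1 and
J2 in Figure 3):</p>
        <disp-formula><tex-math>\nabla_{\theta^{\mu_i}} J \approx \frac{1}{N} \sum \nabla_{\theta^{\mu_i}} \mu_i(s_i \mid \theta^{\mu_i}) \cdot \nabla_{a_i} Q(s, a \mid \theta^{Q})</tex-math></disp-formula>
        <sec id="sec-3-2-1">
          <title>4. Experiments</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>4.1. Experimental scene</title>
          <p>
            In order to verify the effectiveness of RL-SAMS, we
build a user-information-querying system consisting of
five microservices with "VMware Workstation 16 Pro",
as shown in Figure 4. The system includes three
business microservices, one configuration center
microservice and one registry center microservice.
Each microservice is developed based on the "Spring
Cloud" framework [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] and deployed on an independent VMware virtual
machine. The configuration of each virtual machine is as follows:
memory 1 GB, 1 processor, hard disk (SCSI)
20 GB, operating system Ubuntu 16.04.
          </p>
          <p>The three business microservices are:</p>
          <p>1. Two client microservices, Core_client and
Non_core_client, which receive
requests for querying user information and call
the back-end microservice to return the
result to the requesting user. There is no difference
in business logic between the two microservices;
to verify that RL-SAMS is able to
guarantee core business priority, one of the two
client microservices is selected as the core
business microservice.</p>
          <p>2. One back-end microservice,
responsible for background business processing.
The microservice receives user information query
requests and returns the query results.</p>
          <p>In order to simulate the performance bottleneck of each
microservice, set the […]. Simulation modules are set up for the three business function
microservices: […], […], and […]. We set three simulation modules with
different pressure cycles to simulate different
pressure sources of the microservice system, in order to verify
the core business priority assurance capability of
RL-SAMS in the face of different pressure sources.</p>
        </sec>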
      </sec>
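      <p>The shared critic update of Section 3.2 can be illustrated with a small, framework-free sketch. It assumes a mean squared temporal-difference error over a sampled batch and, for brevity, a critic that returns a single scalar Q-value rather than one Q-value per microservice; the function and parameter names are hypothetical and do not come from the paper.</p>
      <preformat>
```python
from typing import Callable, List, Sequence, Tuple

# One joint transition: (states, actions, reward, next_states), as stored in the replay pool.
Transition = Tuple[Sequence[float], Sequence[float], float, Sequence[float]]


def critic_loss(
    batch: List[Transition],
    q_eval: Callable[[Sequence[float], Sequence[float]], float],
    q_target: Callable[[Sequence[float], Sequence[float]], float],
    target_policies: List[Callable[[float], float]],
    gamma: float = 0.9,
) -> float:
    """Mean squared TD error of the shared critic over a sampled batch."""
    total = 0.0
    for states, actions, reward, next_states in batch:
        # Each agent's target strategy sees only its own next state (decentralized decision).
        next_actions = [mu(s) for mu, s in zip(target_policies, next_states)]
        # The shared critic sees the global next state and all next actions (centralized learning).
        td_target = reward + gamma * q_target(next_states, next_actions)
        td_error = td_target - q_eval(states, actions)
        total += td_error ** 2
    return total / len(batch)
```
      </preformat>
      <p>In a full implementation, <monospace>q_eval</monospace>, <monospace>q_target</monospace> and the target strategies would be neural networks, and the loss would be minimized with respect to the parameters of <monospace>q_eval</monospace> only.</p>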
      <sec id="sec-3-3">
        <title>4.2. Experimental Design</title>
        <p>The experiment takes whether each of the two request
microservices performs service degradation as the action space,
<inline-formula><tex-math>a_{core} \in \{on, off\}</tex-math></inline-formula>, <inline-formula><tex-math>a_{non\_core} \in \{on, off\}</tex-math></inline-formula>, and compares
the average reward value of all heartbeat monitoring
requests for the two client microservices within 15 s after
each assurance action. An action value of on means that the
service degradation mechanism is enabled to ensure […].
The reward is computed from <inline-formula><tex-math>\sum CC</tex-math></inline-formula> and <inline-formula><tex-math>\sum NC</tex-math></inline-formula>, the
sums of the heartbeat monitoring request rewards for
Core_client and Non_core_client, respectively. The reward of a
heartbeat monitoring request depends on its outcome, for example whether the
request result is returned within the specified time or whether the request is
degraded; in this experiment, a degraded request
returns a default value without actual
processing.</p>
        <p>The model is implemented based on TensorFlow. The target
networks are updated with the parameters of the evaluation networks
every 200 learning steps. The optimization of the neural networks
adopts the RMSprop optimizer. The learning rate <inline-formula><tex-math>\gamma</tex-math></inline-formula> is set to 0.9,
and the exploration strategy parameter is set to 0.8. The
capacity of the experience replay pool is 200;
state transition records are sampled from the experience replay
pool every 5 steps as training samples for learning, and
the two behavioral RL […] are trained simultaneously.</p>
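        <p>The training cadence described in this section (a replay pool of capacity 200, learning every 5 steps, and a target update every 200 learning steps) can be sketched as below. The batch size and all function names are assumptions for illustration, since the text does not state them, and reading "every 200 learning" as a target-network update interval is itself an interpretation.</p>
        <preformat>
```python
import random
from collections import deque

REPLAY_CAPACITY = 200      # capacity of the experience replay pool
LEARN_EVERY = 5            # sample and learn every 5 environment steps
TARGET_SYNC_EVERY = 200    # copy evaluation parameters to the target every 200 learning steps
BATCH_SIZE = 32            # assumption: the batch size is not stated in the text


class ReplayPool:
    """Fixed-capacity pool; the oldest transition is evicted once the pool is full."""

    def __init__(self, capacity=REPLAY_CAPACITY):
        self._items = deque(maxlen=capacity)

    def add(self, transition):
        self._items.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the temporal correlation between consecutive records.
        return rng.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self):
        return len(self._items)


def run_schedule(total_steps, pool, learn, sync_target, make_transition):
    """Drive the learn/sync cadence; returns (learning_steps, target_syncs)."""
    learning_steps = 0
    target_syncs = 0
    for step in range(1, total_steps + 1):
        pool.add(make_transition(step))
        if step % LEARN_EVERY == 0:
            learn(pool.sample(BATCH_SIZE))
            learning_steps += 1
            if learning_steps % TARGET_SYNC_EVERY == 0:
                sync_target()
                target_syncs += 1
    return learning_steps, target_syncs
```
        </preformat>
        <p>With these settings the target networks are refreshed once every 1000 environment steps (5 steps per learning step times 200 learning steps), so they lag well behind the evaluation networks, which is what stabilizes the TD target.</p>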
      </sec>
      <sec id="sec-3-4">
        <title>4.3. Comparative Experiment</title>
      </sec>
      <sec id="sec-3-5">
        <title>4.3.1. Effectiveness Analysis</title>
        <p>
          Experiment takes the widely used Hystrix[
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] as
baseline method, and compares assurance effect
between the Hytrix service circuit breaker mechanism
and RL-SAMS in five service risk scenarios shown in
Table 1. In addition, the service effect without any
assurance method, named "Blank" in Figure 5, will be
compared as another baseline to verify the successful
implementation of Hystrix and RL-SAMS.Table 1 shows
five different service risk scenarios and expected
optimal decision action and average reward. The name
of service risk scenarios is combined by three fields,
 1 − 2 − 3 , corresponding different concurrent
pressure models.  1 is joint concurrent field,
meaning if requests from both  _ and
 _ _ together will achieve
performance saturation.  2 and  3 is independent
concurrent fields, meaning if requests from
 _ or  _ _ respectively
will achieve performance saturation. H means high
concurrent pressure. L means low concurrent
pressure. The preliminary experiments indicate that
around 150 concurrent users can subject the
microservices in this experiment to high concurrency
pressure.
  −  −  , since   causes   ,  _ is
impossible to assurance. So, it is best to degrade its
service to assurance  _ _ ; In
  −  −  , since   causes   , it is best to degrade
 _ _ to assurance  _ ; In
  −  −  ,  _ and  _ _
together cause   , it is also best to degrade and
sacrifice  _ _  to assurance
 _ , according to reward function.
        </p>
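        <p>As an illustration of the scenario naming, the sketch below derives a name such as HJC-LCC-HNC from the two concurrent-user counts. It assumes that the joint field is judged on the summed load of both clients against the same threshold of about 150 users; the paper does not state how joint saturation is measured, so this rule and the function names are hypothetical. Only the three scenarios whose best action is discussed in the text are listed.</p>
        <preformat>
```python
HIGH_PRESSURE_USERS = 150  # from the text: about 150 concurrent users is high pressure


def level(users, threshold=HIGH_PRESSURE_USERS):
    """Return "H" for high concurrent pressure, "L" otherwise."""
    return "H" if users >= threshold else "L"


def scenario_name(core_users, non_core_users):
    # Assumption: joint pressure (JC) is judged on the summed load of both clients.
    jc = level(core_users + non_core_users)
    cc = level(core_users)
    nc = level(non_core_users)
    return f"{jc}JC-{cc}CC-{nc}NC"


# Microservice to degrade in the three scenarios discussed in the text.
DEGRADE_TARGET = {
    "HJC-HCC-LNC": "Core_client",
    "HJC-LCC-HNC": "Non_core_client",
    "HJC-LCC-LNC": "Non_core_client",
}
```
        </preformat>
        <p>For example, 50 users on Core_client and 150 on Non_core_client map to HJC-LCC-HNC, for which the text recommends degrading Non_core_client.</p>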
        <p>The experiment verifies that RL-SAMS can not only
effectively select the assurance action, but also
distinguish the objects to degrade according to the
source of the service risk, so as to realize intelligent
and elastic microservice system assurance.</p>
      </sec>
      <sec id="sec-3-7">
        <title>4.3.2. Model Accuracy and Training Process Analysis</title>
        <p>During the model training process, two Locust
modules, one for each request microservice,
continuously simulate concurrent request pressure
with a random cycle duration of 1800 seconds.
To cover the five types of service risk scenarios while
controlling the RL state space to shorten the
learning cycle, the number of concurrent users
is drawn at random from {0, 50, 100, 150, 200}. Logs record the state at
each step and the assurance action selected
during the model training process. Taking service risk
scenario HJC-LCC-HNC as an example, Figure 6 presents
the proportion of assurance actions at each stage of
training. Because the concurrent request pressure is
simulated randomly, HJC-LCC-HNC does not
occur continuously. For Figure 6, all assurance action selection
records made while HJC-LCC-HNC occurs are extracted from the entire
training process and sorted
chronologically, and every 100 records form one
Period, for which the proportion of each assurance action is
calculated. In each record, the agents decide whether to degrade the Core_client
and Non_core_client microservices to break their
concurrent requests. As shown in Figure
6, in Period 1 the agents of the two request
microservices decide almost randomly whether to
activate degradation. Since both client
microservices experience low concurrent pressure,
they both tend not to activate degradation
in Period 2, resulting in an increase in the proportion
of <inline-formula><tex-math>[a_{core} = off, a_{non\_core} = off]</tex-math></inline-formula>. Under the influence of
the "value decision" module, Core_client and
Non_core_client receive the maximum reward
with <inline-formula><tex-math>[a_{core} = off, a_{non\_core} = on]</tex-math></inline-formula>, and the
corresponding Q-values are also the highest.
Therefore, as training progresses, the proportion of
<inline-formula><tex-math>[a_{core} = off, a_{non\_core} = on]</tex-math></inline-formula> increases. After Period 6,
this proportion
exceeds 90% and stabilizes, reaching 98% in Period 8.
In other words, in service risk scenario HJC-LCC-HNC,
RL-SAMS can, with a probability of 98% × 98% ≈ 96%,
ensure the normal service of the Core_client by
degrading only the concurrent requests of the
Non_core_client. The accuracy in other
service risk scenarios is similar.</p>
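        <p>The per-Period statistic behind Figure 6 can be reproduced with a few lines. The sketch below assumes the log has already been filtered to one scenario and sorted chronologically, with each record stored as an (a_core, a_non_core) tuple; the record format and function name are illustrative assumptions, not the paper's logging format.</p>
        <preformat>
```python
from collections import Counter

PERIOD_SIZE = 100  # every 100 chronologically sorted records form one Period


def action_proportions(records, period_size=PERIOD_SIZE):
    """records: list of (a_core, a_non_core) tuples in chronological order.

    Returns one dict per complete Period, mapping each joint action to its share.
    A trailing, incomplete Period is ignored.
    """
    periods = []
    for start in range(0, len(records) - period_size + 1, period_size):
        window = records[start:start + period_size]
        counts = Counter(window)
        periods.append({joint: n / period_size for joint, n in counts.items()})
    return periods
```
        </preformat>
        <p>A Period in which 98 of 100 records are ("off", "on") yields a share of 0.98 for that joint action, matching the way the 98% figure for Period 8 is described.</p>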
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>This paper introduces an innovative decision-making
method for microservice systems, leveraging
reinforcement learning principles. It seamlessly
incorporates the core concept of "enhancing
performance through experiential learning" into
service assurance processes within the microservices
architecture. The flexible assurance capability
targeting critical assurance components paves the way
for novel approaches to intelligent service assurance
and maintenance. Through a thorough analysis and
validation via case experiments, RL-SAMS
demonstrates its prowess across various service risk
scenarios, particularly excelling in its ability to
intelligently differentiate key assurance elements and
proactively ensure the continuity of core business
operations.</p>
      <p>While this paper has introduced reinforcement
learning methods into service assurance activities
within microservice systems, there are still many
aspects that require further research and exploration.
These include:</p>
      <p>• Efficient Learning with Expanding State and
Action Spaces: Reinforcement learning is
fundamentally about accumulating experiential
knowledge to maximize rewards and minimize
losses. As the state and action spaces grow, the
cost of model training and learning also increases
rapidly. It will be necessary to investigate and
improve methods for accumulating positive
experiences more efficiently and enhancing
convergence rates.</p>
      <p>• Decentralized Training and Centralized Learning:
The approach taken in this paper involves
centralized training and learning. However, in
real-world scenarios where microservices come
from different providers, there may be obstacles
to sharing operational data. Addressing how to
limit data sharing while enabling decentralized
training for individual microservices and
centralized learning of experiences is a pressing
challenge.</p>
      <p>• Integration with Log Analysis and Risk Prediction:
Exploring how to combine reinforcement learning
with log analysis and risk prediction to leverage
prior knowledge and accelerate learning
is an area worth investigating.
Integrating reinforcement learning with existing
systems for proactive risk management and
incident response can enhance the overall
effectiveness of service assurance activities.</p>
      <p>These areas of research and improvement will
contribute to the further development and refinement
of reinforcement learning methods in the context of
microservices and service assurance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Xiang</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Xin Peng, Tao Xie, Jun Sun, Chenjie Xu,
          <string-name>
            <given-names>Chao</given-names>
            <surname>Ji</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Wenyun</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Poster: Benchmarking microservice systems for software engineering research</article-title>
          .
          <source>In 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion)</source>
          , pages
          <fpage>323</fpage>
          -
          <lpage>324</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Holger</given-names>
            <surname>Knoche</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wilhelm</given-names>
            <surname>Hasselbring</surname>
          </string-name>
          .
          <article-title>Using microservices for legacy software modernization</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>35</volume>
          (
          <issue>3</issue>
          ):
          <fpage>44</fpage>
          -
          <lpage>49</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Florian</given-names>
            <surname>Rademacher</surname>
          </string-name>
          , Jonas Sorgalla, and
          <string-name>
            <given-names>Sabine</given-names>
            <surname>Sachweh</surname>
          </string-name>
          .
          <article-title>Challenges of domain-driven microservice design: A model-driven perspective</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>35</volume>
          (
          <issue>3</issue>
          ):
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Claus</given-names>
            <surname>Pahl</surname>
          </string-name>
          , Antonio Brogi, Jacopo Soldani, and
          <string-name>
            <given-names>Pooyan</given-names>
            <surname>Jamshidi</surname>
          </string-name>
          .
          <article-title>Cloud container technologies: a state-of-the-art review</article-title>
          .
          <source>IEEE Transactions on Cloud Computing</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>677</fpage>
          -
          <lpage>692</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Zhizhen</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jipu</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nan</given-names>
            <surname>Hua</surname>
          </string-name>
          , Gustavo B Figueiredo,
          <string-name>
            <given-names>Yanhe</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaoping</given-names>
            <surname>Zheng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Biswanath</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          .
          <article-title>On QoS-assured degraded provisioning in service-differentiated multilayer elastic optical networks</article-title>
          .
          <source>In 2016 IEEE Global Communications Conference (GLOBECOM)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . IEEE,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Alex S</given-names>
            <surname>Santos</surname>
          </string-name>
          , Andre K Horota, Zhizhen Zhong, Juliana De Santi, Gustavo B Figueiredo,
          <string-name>
            <given-names>Massimo</given-names>
            <surname>Tornatore</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Biswanath</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          .
          <article-title>An online strategy for service degradation with proportional QoS in elastic optical networks</article-title>
          .
          <source>In 2018 IEEE International Conference on Communications (ICC)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Lei</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Architecture-based reliability-sensitive criticality measure for fault-tolerance cloud applications</article-title>
          .
          <source>IEEE Transactions on Parallel and Distributed Systems</source>
          ,
          <volume>30</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2408</fpage>
          -
          <lpage>2421</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Chenhao</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rodrigo N</given-names>
            <surname>Calheiros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rajkumar</given-names>
            <surname>Buyya</surname>
          </string-name>
          .
          <article-title>Auto-scaling web applications in clouds: A taxonomy and survey</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>51</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Nathan</given-names>
            <surname>Cruz Coulson</surname>
          </string-name>
          , Stelios Sotiriadis, and
          <string-name>
            <given-names>Nik</given-names>
            <surname>Bessis</surname>
          </string-name>
          .
          <article-title>Adaptive microservice scaling for elastic applications</article-title>
          .
          <source>IEEE Internet of Things Journal</source>
          ,
          <volume>7</volume>
          (
          <issue>5</issue>
          ):
          <fpage>4195</fpage>
          -
          <lpage>4202</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Donatella</given-names>
            <surname>Firmani</surname>
          </string-name>
          , Francesco Leotta, and
          <string-name>
            <given-names>Massimo</given-names>
            <surname>Mecella</surname>
          </string-name>
          .
          <article-title>On computing throttling rate limits in web APIs through statistical inference</article-title>
          .
          <source>In 2019 IEEE International Conference on Web Services (ICWS)</source>
          , pages
          <fpage>418</fpage>
          -
          <lpage>425</lpage>
          . IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Hongbing</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Wang</surname>
          </string-name>
          , Xingguo Hu, Xingzhi Zhang, and
          <string-name>
            <given-names>Mingzhu</given-names>
            <surname>Gu</surname>
          </string-name>
          .
          <article-title>A multi-agent reinforcement learning approach to dynamic service composition</article-title>
          .
          <source>Information Sciences</source>
          ,
          <volume>363</volume>
          :
          <fpage>96</fpage>
          -
          <lpage>119</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Hongbing</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qin</given-names>
            <surname>Wu</surname>
          </string-name>
          , Xin Chen, Qi Yu, Zibin Zheng, and
          <string-name>
            <given-names>Athman</given-names>
            <surname>Bouguettaya</surname>
          </string-name>
          .
          <article-title>Adaptive and dynamic service composition via multiagent reinforcement learning</article-title>
          .
          <source>In 2014 IEEE International Conference on Web Services</source>
          , pages
          <fpage>447</fpage>
          -
          <lpage>454</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Jia</given-names>
            <surname>Rao</surname>
          </string-name>
          , Xiangping Bu,
          <string-name>
            <given-names>Kun</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Cheng-Zhong</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>Self-adaptive provisioning of virtualized resources in cloud computing</article-title>
          .
          <source>In Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Dongsun</given-names>
            <surname>Kim</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sooyong</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Reinforcement learning-based dynamic adaptation planning method for architecture-based self-managed software</article-title>
          .
          <source>In 2009 ICSE Workshop on Software Engineering for Adaptive and Self-Managing Systems</source>
          , pages
          <fpage>76</fpage>
          -
          <lpage>85</lpage>
          . IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Mehdi</given-names>
            <surname>Amoui</surname>
          </string-name>
          , Mazeiar Salehie, Siavash Mirarab, and
          <string-name>
            <given-names>Ladan</given-names>
            <surname>Tahvildari</surname>
          </string-name>
          .
          <article-title>Adaptive action selection in autonomic software using reinforcement learning</article-title>
          .
          <source>In Fourth International Conference on Autonomic and Autonomous Systems (ICAS'08)</source>
          , pages
          <fpage>175</fpage>
          -
          <lpage>181</lpage>
          . IEEE,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Tianqi</given-names>
            <surname>Zhao</surname>
          </string-name>
          , Wei Zhang,
          <string-name>
            <given-names>Haiyan</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zhi</given-names>
            <surname>Jin</surname>
          </string-name>
          .
          <article-title>A reinforcement learning-based framework for the generation and evolution of adaptation rules</article-title>
          .
          <source>In 2017 IEEE International Conference on Autonomic Computing (ICAC)</source>
          , pages
          <fpage>103</fpage>
          -
          <lpage>112</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Nabila</given-names>
            <surname>Belhaj</surname>
          </string-name>
          , Djamel Belaïd, and
          <string-name>
            <given-names>Hamid</given-names>
            <surname>Mukhtar</surname>
          </string-name>
          .
          <article-title>Framework for building self-adaptive component applications based on reinforcement learning</article-title>
          .
          <source>In 2018 IEEE International Conference on Services Computing (SCC)</source>
          , pages
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          . IEEE,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Han Nguyen</given-names>
            <surname>Ho</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eunseok</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Model-based reinforcement learning approach for planning in self-adaptive software system</article-title>
          .
          <source>In Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Gerald</given-names>
            <surname>Tesauro</surname>
          </string-name>
          , Nicholas K Jong,
          <string-name>
            <given-names>Rajarshi</given-names>
            <surname>Das</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mohamed N</given-names>
            <surname>Bennani</surname>
          </string-name>
          .
          <article-title>A hybrid reinforcement learning approach to autonomic resource allocation</article-title>
          .
          <source>In 2006 IEEE International Conference on Autonomic Computing</source>
          , pages
          <fpage>65</fpage>
          -
          <lpage>73</lpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Lowe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yi I</given-names>
            <surname>Wu</surname>
          </string-name>
          , Aviv Tamar, Jean Harb, Pieter Abbeel, and
          <string-name>
            <given-names>Igor</given-names>
            <surname>Mordatch</surname>
          </string-name>
          .
          <article-title>Multi-agent actor-critic for mixed cooperative-competitive environments</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <volume>30</volume>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Jakob</given-names>
            <surname>Foerster</surname>
          </string-name>
          , Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and
          <string-name>
            <given-names>Shimon</given-names>
            <surname>Whiteson</surname>
          </string-name>
          .
          <article-title>Counterfactual multi-agent policy gradients</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Cosmina</surname>
            <given-names>I</given-names>
          </string-name>
          .
          <article-title>Spring microservices with Spring Cloud</article-title>
          .
          <source>In Pivotal Certified Professional Spring Developer Exam: A Study Guide</source>
          , pages
          <fpage>435</fpage>
          -
          <lpage>459</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Molchanov</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhmaiev</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Circuit breaker in systems based on microservices architecture</article-title>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>