A Proactive Formal Approach For Microservice-based Applications Auto-Scaling

Souheir Merkouche 1, Chafia Bouanaka 1
1 LIRE Laboratory, University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria

Abstract
With the emergence of the cloud and containers, microservices have become widely adopted to develop large-scale applications, since deploying on the cloud provides developers with a virtually unlimited amount of resources. However, an uncontrolled usage of these resources leads to unnecessary costs or a non-performant system. Therefore, several research efforts have been devoted to efficient resource auto-scaling, resulting in a variety of policies. Most of the existing policies follow a reactive approach that relies on the current state of the system to adapt it. In contrast, proactive approaches estimate future resource usage to adapt the system before it reaches a non-performant state; yet ensuring proactivity usually requires complex and expensive methodologies such as reinforcement learning. In this work, we propose a proactive resource auto-scaling approach. We use the concept of weak and strong dependencies to anticipate the future state of the system. To formally model the proposed approach, we combine high-level PNs and plausible PNs. Plausible PNs are well suited for decision-making: when several adaptation plans are available, they allow identifying a compromise plan when the auto-scaling concerns different qualities of the system.

Keywords
Microservice architectures, auto-scaling, containers, formal methods, Petri Nets

1. Introduction
Microservice architectures organize applications as a set of loosely-coupled components that evolve independently; new versions can be added and work side by side with the old ones, hence reducing the application downtime for maintenance and upgrades. Nowadays, microservice architectures are widely adopted, due to the emergence of cloud computing and container technology, where the cloud provides the user with unlimited pay-per-use resources to run the application, and containers grant a fast and easy deployment of the microservices. However, well-established policies are needed to manage the allocation/deallocation of resources, in order to avoid additional and unnecessary costs on the one hand, and to meet the performance targets of the microservices on the other hand. Auto-scaling consists of increasing/decreasing the amount of allocated resources in response to the workload applied to the microservices, for an efficient resource utilization, a cost reduction and the sustainment of the application's quality of service (QoS). Many approaches have been proposed to answer these needs, and most of them are reactive, i.e. they monitor the microservices and adapt them according to the current workload [5] [12] [13]. However, reactive auto-scaling remains a long process, and proactive approaches are needed to avoid violating the system's QoS. Existing proactive approaches [4] require expert systems and knowledge to achieve auto-scaling, thus limiting their applicability [18]. In this paper, we propose a proactive auto-scaling approach in which we consider the architectural level of the system and make use of the dependency relations between microservices to establish proactive auto-scaling policies.
To this end, we adopt the architecture proposed in [14] for self-adaptive systems; this architecture provides a separation between the operating system and the adaptation logic. Unlike existing approaches, our work does not involve any complex methodology; it relies on the strong and weak dependencies concept introduced in [1], which characterizes the microservices connected to a given microservice: strong dependencies must be provided before creating the microservice instances, while weak dependencies must be provided before the end of the given microservice's deployment. We use this definition to determine the future state of the dependencies from the current state of a given microservice. Application auto-scaling is a fault-sensitive operation: a wrong scaling can lead to a loss of money (resources rented without being used) or to a reduction of the quality of service offered by the application (lack of rented resources). Therefore, we adopt formal verification to ensure that the model is reliable, thus avoiding performance degradation due to a faulty resizing. To that end, we combine High-Level Petri Nets (HLPN) with Plausible Petri Nets (PPN), where the PPNs are used to compute a compromise plan among several adaptation plans. We highlight that concurrency in our resizing model concerns the qualities of microservices and not the microservices themselves.
The paper is structured as follows: in section 2, we recall basic concepts of both HLPNs and PPNs. In section 3, we present some of the existing auto-scaling approaches. Afterwards, in section 4, we propose a proactive auto-scaling approach for microservice-based applications: we introduce the adopted architecture, present the metrics monitored to achieve auto-scaling in microservice-based applications, and then propose the policies implemented to achieve this adaptation. In section 5, we detail the PN-based model of the approach and use the email processing pipeline as a case study. Finally, section 6 rounds up the paper with a summary of the presented work and ongoing directions.

2. Background
Petri nets were initially proposed to model the behavior of dynamic systems with discrete events; they have since undergone several evolutions and given rise to several variants. We present in what follows the two types of Petri nets used in the present work: High-Level and Plausible Petri nets.

2.1. High-level Petri nets
A high-level Petri net (HLPN for short) [6] can be defined in several ways. In our work, we adopt the definition proposed in [7].
Definition 1 [7]: A HLPN is a directed bipartite graph H = (P, T, A, D, Type, M0), where:
⚫ P and T are non-empty finite sets of elements called places and transitions, respectively, such that P ∩ T = ∅;
⚫ A ⊆ (P×T) ∪ (T×P) is the set of arcs connecting places to transitions, and transitions to places;
⚫ D is a non-empty finite set of non-empty domains, where each element of D is called a type;
⚫ Type: P ∪ T → D is a function used to assign types to places and transitions;
⚫ M0 ∈ μPLACE is a multiset called the initial marking of H, where μPLACE is the set of multisets over the set PLACE = {(p, g) : p ∈ P, g ∈ Type(p)}.

In a HLPN, places are typed (see Figure 1) in order to define the collection of tokens that can be held in each place. The collection of all the tokens associated with all the places represents the net marking. In addition, arcs can be associated with expressions that contain constants, variables, or function images (e.g., x, y, f(x)). To evaluate an arc expression, values are assigned to its variables; the resulting items must belong to the type of the arc's place. Finally, Boolean expressions called guards are associated with transitions (e.g., x > y). These features make it possible to model various data and information about the system and its different variables, so that the qualities of the modeled system can be measured. A transition can fire only if it is enabled. A transition is enabled with respect to a net marking and a particular mode. A transition mode is defined by assigning values to all the variables appearing in the transition's guard and in the annotations of the connected arcs. After assigning values, the input arc expressions are evaluated; the results are tokens having the same type as the input places. The transition is enabled for the current mode if the marking of its input places is greater than or equal to the multiset of the resulting tokens. The firing rule is simple: whenever a transition fires, tokens corresponding to the multiset of the evaluated input-arc expressions are subtracted from the input places, and tokens corresponding to the evaluation of the output arcs are added to the output places.

Figure 1: A HLPN example [7].

2.2. Plausible Petri nets
Plausible Petri nets (PPNs) [8] [9] are a hybrid variant of PNs composed of two types of places and transitions, namely symbolic and numerical, in order to describe both the discrete and the continuous behaviors of a system. In the symbolic subnet, discrete behavior is described using regular tokens, while in the numerical subnet, continuous or numerical behavior is described with tokens that carry states of information about the state variables [10], where a state of information about a given variable is a probability density function (PDF) over its state space [10]. In a self-adaptive system, PPNs can be used to compute the plausibility of the different possible adaptation plans, so that the most appropriate plan can be chosen.

Definition 2 [11]: A PPN is defined as a 9-tuple (P, T, F, W, D, 𝒳, 𝓘, 𝓔, M0), where:
⚫ P is the set of places, partitioned into two disjoint subsets P^N and P^S for numerical places and symbolic places, respectively.
⚫ T is the set of transitions, partitioned into two subsets T^N and T^S for numerical transitions and symbolic transitions, respectively. Unlike places, a transition can belong to both T^N and T^S; in this case it is referred to as a mixed transition.
⚫ F is the set of arcs that connect transitions to places, and places to transitions.
⚫ W is the set of non-negative weights applied to the arcs in F; it is partitioned into two disjoint subsets W^N and W^S for the arcs connected to numerical places and those connected to symbolic places, respectively.
⚫ D is the set of switching delays for both symbolic and mixed transitions.
⚫ 𝒳 is the state space of the stochastic state variable x.
⚫ 𝓘 is the set of density functions (states of information) associated with numerical places and transitions.
⚫ 𝓔 is the set of equations representing the dynamics of the state variable x.
⚫ M0 is the initial marking of the net, given by two vectors, for numerical and symbolic places, respectively.

A PPN can thus be divided into two subnets, namely the symbolic subnet and the numerical subnet. Unlike the ordinary evolution of the symbolic subnet, the numerical subnet evolution relies on an ad-hoc information flow based on the conjunction and disjunction of states of information [8][9]. A numerical transition can fire when the conjunction between the states of information of all its input places and the state of information of the transition is possible. For a mixed transition, the conditions of both symbolic and numerical transitions must be satisfied. The effect of firing a transition on the numerical places is a state of information consisting of the disjunction of the previous state of information and the information produced by firing the transition (the conjunction of the states of information within the transition and its input places). Figure 2 illustrates a simple PPN (part (a)) and its firing rules (part (b)). For more details on PPNs, the reader is referred to [11].

Figure 2: PPN example (a) and firing rules (b) [11].

3. Related Work
With the expanding importance of auto-scaling, many approaches have been proposed for an efficient auto-scaling of microservice-based applications. Most of the presented works are reactive approaches: they adapt the system according to its current state. Other approaches are proactive: they adapt the system based on a prediction of its future state obtained by analyzing historical data. Existing approaches generally use threshold-based policies (e.g., [19], [20], [21]), queuing theory (e.g., [22], [23], [24]) or machine learning techniques (e.g., [25], [26]). In a threshold-based solution, static thresholds are defined for resource usage and adaptation actions are planned according to them, allowing the definition of simple but efficient auto-scaling policies. The authors in [21] use reinforcement learning for a dynamic adaptation of the thresholds. Threshold-based policies are also used in proactive approaches such as [19], in which the authors predict the CPU utilization of the microservices and then adapt them using a threshold-based policy. In a queuing theory-based solution, each microservice is modelled as a queue of requests to predict its performance under different workload conditions. The authors in [23] modelled a microservice-based application as a Layered Queuing Network for dynamic scaling. However, queuing theory solutions are considered limited because they need to be recomputed whenever the workload changes. Most proactive approaches (e.g., [25], [26]) are machine learning-based solutions using reinforcement learning (RL). RL allows making dynamic scaling decisions after learning phases following a trial-and-error approach. However, RL techniques suffer from a long learning process and require time to converge to an optimal policy.
In the present work, we propose a proactive auto-scaling approach with a threshold-based policy. The strong and weak dependencies concept [1] allows us to reach proactivity without resorting to complex and expensive methodologies (e.g., machine learning). Contrary to existing proactive approaches, our model does not need any learning process, since it relies on the microservice's dependencies to predict the future state of the microservices.

4. A proactive auto-scaling approach
Auto-scaling approaches can be classified into proactive and reactive approaches. In a reactive approach, the auto-scaler increases/decreases the resources allocated to a given microservice based on the current state of the workload; this is achieved by monitoring system metrics such as the input data rate and resource usage. Despite its efficiency, this approach remains a long process due to the significant time required to compute and schedule the adaptation. Conversely, in a proactive approach, the microservice is resized based on a prediction of the future evolution of the workload and the system state, usually obtained through machine learning. Machine learning is a heavy and greedy process in terms of cost; moreover, predictions cannot be one hundred percent accurate, which may lead to a waste of resources and unnecessary cost. [15] and [16] are examples of the few works that have considered proactive approaches. In our work, we present a proactive approach in which we consider the architectural level of the system to achieve proactivity, by taking into account the dependencies of each microservice when scaling it; i.e., we use the monitoring result of one microservice to adapt it according to the workload applied to it, estimate the future workload of its dependencies, and anticipate an adaptation for them. In this section, we present the architecture adopted to model our approach, then the supervised metrics and the auto-scaling policy.

Figure 3: Layered self-adaptive system architecture.

4.1. Modeling Architecture
We mainly adopt the architecture presented in [14] and adapt it to meet our needs. As illustrated in Figure 3, this architecture is composed of a base-level layer referring to the managed subsystem, and a high-level layer representing the managing subsystem. Additionally, an emulator and a set of API primitives are defined to connect the two layers. The rationale behind this architecture is that the system process is modelled separately from its managing process: the system process, referred to as the managed subsystem, is modelled in the base-level layer, while the managing logic is modelled in the high-level layer by means of a set of MAPE-K loops, where each loop models the adaptation process of one metric of the system. These loops use the emulator to obtain the current state of the system, and the API primitives to edit it. This perfectly fits our need to separate the application logic from the auto-scaling process. Additionally, it allows modelling the auto-scaling process independently from the application architecture, its composing microservices and their distributed locations, while the managing subsystem architecture enables the modeling of multiple auto-scaling policies according to multiple objectives.

4.2. Supervised Metrics
The auto-scaling process consists of monitoring the microservices and adapting them upon quality violations; these are measurable through a set of metrics that are directly monitored on the Cloud infrastructure and particularly on the containers.
For each microservice, the MAPE loop's monitor collects the following metrics:
• CPU usage metric: represents the CPU usage rate of the microservice, obtained as the sum of the CPU usage of the microservice's replicas divided by the total CPU allocated to this microservice.
• Memory usage metric: similar to the CPU usage rate, except that memory usage is considered.
• Input data rate metric: the input data rate is monitored at the microservice level and represents the rate of the users' requests addressed to the considered microservice.

4.3. Auto-Scaling Policy
To realize a proactive auto-scaling, we define a set of MAPE loops, each of which is responsible for determining an adaptation plan that brings the system back to its performant state while maintaining one of the supervised metrics. The MAPE loops share one Monitor component that monitors the microservices' metrics and, when one of them is violated, triggers an adaptation process in the corresponding loop. After all the adaptation plans have been obtained, an Extended Plan component is defined to compute a compromise between them, which is then executed by one shared Executor. The authors in [1] defined the concept of strong and weak dependencies of a microservice, where the strong dependencies are microservices that must be deployed before the creation of the microservice instance, and the weak dependencies are the ones that must be fulfilled before the end of its deployment. Since auto-scaling consists of increasing/decreasing the number of instances of a microservice, new instances of its dependencies are needed to manage them. Therefore, auto-scaling a microservice provides us with a prediction of the necessity of resizing its dependencies. The auto-scaling policy adopted in our approach is as follows:
• The shared Monitor collects the metrics of each microservice from the environment layer and verifies whether an adaptation is needed.
• When an adaptation is needed for a given microservice, i.e. when one of the microservice's metrics is violated, the Analyze and Plan components of the corresponding MAPE loop define the adaptation plan from the viewpoint of the violated metric.
• The Extended Plan computes a compromise adaptation plan if several metrics were violated, a proactive adaptation plan for each strong dependency, and a recommended adaptation plan for each weak dependency.
• The Execute element applies the adaptation plan on the microservice and the proactive adaptation plans on the strong dependencies, then notifies the weak dependencies of their recommended adaptation plans.
Figure 4 illustrates this process.

Figure 4: The Proactive Auto-Scaling Process.

The total amount of CPU and Memory allocated to the application's microservices always ranges within an interval of maximal and minimal thresholds (preset at the deployment phase), in order to avoid under-utilization/over-utilization of resources and thus reduce cost and avoid latency, respectively. The microservices' input data rate is another supervised metric, with a maximal threshold representing the rate that the allocated resources can handle without introducing latency in the application, and a minimal threshold to supervise the efficiency of usage of the allocated resources. A minimal sketch of this threshold check is given below.
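The threshold check performed by the shared Monitor can be pictured with the following minimal Python sketch. It is an illustration only: the class, function and metric names, as well as the numerical values, are hypothetical and are not part of the formal Petri net model.

```python
# Illustrative sketch of the shared Monitor's threshold check; all names are
# hypothetical and do not belong to the formal Petri net model.

from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class Deployment:
    """Deployment of one microservice, carrying its preset metric thresholds."""
    ms_id: str
    replicas: int
    thresholds: Dict[str, Tuple[float, float]]   # metric -> (min, max), preset at deployment

def violated_metrics(deployment: Deployment, metrics: Dict[str, float]) -> Dict[str, float]:
    """Return the metrics lying outside their [min, max] interval; each one would
    trigger the corresponding MAPE loop of the managing subsystem."""
    return {name: value
            for name, value in metrics.items()
            if not deployment.thresholds[name][0] <= value <= deployment.thresholds[name][1]}

# Hypothetical example: the CPU usage exceeds its maximal threshold.
ms = Deployment("MessageParser", replicas=2,
                thresholds={"idr": (10, 100), "cpu": (0.2, 0.8), "memory": (0.2, 0.8)})
print(violated_metrics(ms, {"idr": 55, "cpu": 0.93, "memory": 0.41}))   # {'cpu': 0.93}
```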
The adaptation plan: An adaptation plan corresponds to the number of replicas to be added or removed so that the microservice can manage the applied workload while preserving its QoS. When violations are perceived on one or several metrics, the corresponding loop increases/decreases the current number of replicas and recomputes the value of the metric until it falls back into the defined interval. The new value is computed using the following equation:

newVmetric = (oldVmetric × k) / k′        (1)

where oldVmetric is the current value of the violated metric, k is the current number of replicas and k′ is the new number of replicas, initially set to k. Once the adapted metric value is reached, the loop returns the corresponding number of replicas as an adaptation plan; the Extended Plan then computes a compromise adaptation plan from all the plans as follows:

The compromise adaptation plan (CAP): The importance and priority degrees of the defined metrics vary from one microservice to another; therefore, a weighting function is associated with each microservice, defining a percentage for each supervised metric according to its importance degree for that microservice. After receiving all the adaptation plans of a given microservice, the Extended Plan uses the following function to define the compromise adaptation plan:

CAP = ∑ i=1..nbQ (Pi × Qi)        (2)

where Pi represents the adaptation plan proposed by the metric loop i, Qi the priority degree of this metric and nbQ the number of metric loops.

The proactive adaptation plan: For each instance of a microservice, a given number of instances of each strong dependency is needed. The Extended Plan uses this information to compute the proactive adaptation plan of each strong dependency i as follows:

PAPi = Ni × CAPj        (3)

where Ni is the number of instances of the microservice i needed to run one instance of the microservice j, and CAPj is the compromise adaptation plan computed for j.

The recommended adaptation plan: it is computed in the same manner as the proactive adaptation plan, but instead of being applied on the weak dependencies, it is sent to them, specifically to their violated metric loops. When receiving it, each loop computes the metric value associated with the plan to verify that the plan does not violate the metric; if it does, the loop computes a new plan that preserves it. To formally apply this policy in our model, we use a PPN in the Plan component of the loops, enabling the computation of the defined values. In the next section, we present the formal model of this approach, which is based on HLPNs and PPNs. For a clear modeling of the approach, we apply it to the email processing pipeline as a case study.
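Before moving to the formal model, the plan computations of equations (1), (2) and (3) can be illustrated with a short Python sketch. It assumes the violated metric values, the priority degrees Qi and the instance ratios Ni are already available; all names, microservice identifiers and numerical values are hypothetical, and the plans are expressed here as target replica counts.

```python
# Illustrative sketch of equations (1)-(3); all names and figures are hypothetical.

def adapt_metric(old_value, k, low, high):
    """Equation (1): newV = oldV * k / k'. Vary the replica count k' until
    the recomputed metric value falls back into [low, high]."""
    k_prime, new_value = k, old_value
    while new_value > high:                          # scale up
        k_prime += 1
        new_value = old_value * k / k_prime
    while new_value < low and k_prime > 1:           # scale down, without overshooting
        candidate = old_value * k / (k_prime - 1)
        if candidate > high:
            break
        k_prime -= 1
        new_value = candidate
    return k_prime                                   # the plan: target replica count

def compromise_plan(plans, weights):
    """Equation (2): CAP = sum of Pi * Qi over the nbQ metric loops."""
    return round(sum(p * q for p, q in zip(plans, weights)))

def proactive_plans(cap_j, instance_ratios):
    """Equation (3): PAPi = Ni * CAPj for each strong dependency i, where Ni
    instances of i are needed per instance of the adapted microservice j."""
    return {dep: n_i * cap_j for dep, n_i in instance_ratios.items()}

# Hypothetical example: the CPU and input-data-rate loops of one microservice.
cpu_plan = adapt_metric(old_value=0.93, k=2, low=0.2, high=0.8)   # -> 3 replicas
idr_plan = adapt_metric(old_value=140,  k=2, low=10,  high=100)   # -> 3 replicas
cap = compromise_plan([cpu_plan, idr_plan], weights=[0.6, 0.4])   # -> 3
print(cap, proactive_plans(cap, {"MessageAnalyzer": 1}))          # 3 {'MessageAnalyzer': 3}
```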
5. A Formal model for proactive auto-scaling
Microservice auto-scaling is a critical process in terms of cost and performance: an improper scaling can affect the system's performance or its cost. Therefore, a formal approach is needed to model the auto-scaling process and validate it before its implementation. Hence, the approach presented in section 4 for a proactive auto-scaling of microservice-based applications is formally modelled by means of HLPNs and PPNs, in order to construct a proactive compromise auto-scaling, and illustrated through the example of the email processing pipeline. We describe in what follows the PN-based model and its constituents, then we apply it to the email processing pipeline system.

5.1. The Email Pipeline Processing System
The email processing system described in [17] is composed of 12 microservices working together to analyze an email, as shown in Figure 5. First, the Message Receiver receives the emails and forwards them to the Message Parser, which extracts data from the emails and forwards each part to the proper microservice to process it. The processing passes through other microservices to finally return all the results to the Message Analyzer (we refer the reader to [1] for a detailed description of each microservice).

Figure 5: Microservice architecture of the email processing pipeline [1].

In such an application, processing each user request takes time, so it is necessary to supervise the response time of each microservice by monitoring the input data rate. On the other hand, due to the high workload applied to this kind of application, and to avoid waste of resources and extra costs, thresholds have to be defined for resource usage (CPU and Memory usage). In the rest of this section, we model this system by means of the proposed approach.

5.1.1. The base-level layer
The base-level layer represents the managed subsystem, which we model by means of a HLPN. In the managed subsystem, we are not concerned with the microservices' behavior; we actually model the microservices and the connections between them. The microservices are represented by the places of the HLPN model, and their deployments are associated with those places, while the connections are modeled by the transitions and the arcs connecting the places. Places in the managed subsystem are complex places: their tokens encode information representing their current state (number of replicas, amount of associated resources, strong and weak dependencies), also known as the deployment of the corresponding microservice; when an adaptation plan is executed, the deployment information is updated.

Figure 6: HLPN-based model of the email processing pipeline.

The aim of this representation is to allow adding other managing subsystems to the model that can work in parallel with the auto-scaling managing subsystem; for example, a scheduler model can be added as another managing subsystem to allocate nodes for the new replicas and deploy them, and it can be informed of the adaptations made by the auto-scaling managing subsystem through the places' states. What allows us to consider complex places is the encoding of this Petri net on the emulator, where places, as well as all the other components of the Petri net, are represented by tokens; each place is then transformed into a complex token manipulated and edited by the managing subsystem. Figure 6 illustrates the HLPN model of the managed subsystem of the email processing pipeline system.

5.1.2. The environment layer
This layer refers to the physical state of the system; it contains the metrics collected from the containers running the system's microservice instances. The environment layer also contains the set of available nodes where new instances can be deployed. It is modeled by means of a HLPN composed of a set of places, each one representing a monitored metric of the system and connected to the managing subsystem's monitor; on the other hand, those places are connected to all the nodes where the system's microservice instances are deployed, as shown in Figure 7.

Figure 7: HLPN model of the environment layer.

The transitions get_idr, get_CPU and get_Memory transfer tokens from the places nodei to the places idr, CPU and Memory, respectively. The transferred tokens are composed of an identifier of the corresponding microservice and the current value of the metric for one instance (idr for the input data rate, CPU for the current usage of the CPU, and Memory for the current usage of the memory).
After collecting all the metrics, the next transitions assemble the tokens corresponding to instances of the same microservice into a single token containing the global value of the metric for that microservice. These tokens are forwarded to the managing subsystem's monitor by the transition metrics_port. The place available_nodes contains tokens representing the set of nodes available for deploying new instances; each token is composed of the node identifier and its CPU and memory capacity. When a new instance is deployed on a given node, the corresponding token is consumed by the transition deploy; whenever the instance is uninstalled, the node becomes available once again and its token is put back into the place available_nodes by the transition free. This part is not connected to the managing subsystem, since auto-scaling only resizes the microservices without deploying the new instances; a scheduler managing subsystem can make use of this place to find proper nodes for the new instances and schedule them.

5.1.3. The high-level layer
This layer includes three components cooperating for an efficient auto-scaling of the system:

1. The emulator: An emulator is a HLPN that can encode and emulate any HLPN's behavior. The emulator's set of places coincides with the set of constituents of the HLPN (places, transitions, inputs, etc.) that is encoded into the marking of the emulator [7]. The emulator initially contains a transition fire that manipulates the encoded net; whenever this transition fires, it causes the encoded net to be executed or changed. The place places is connected to the transition fire, so that when it fires, the tokens representing the places are obtained by the managing subsystem's monitor, in order to associate them with the metric tokens obtained from the environment. The transition executes fires independently from the encoded Petri net, to let the managing subsystem edit the places' tokens. In fact, the flow of tokens in the managed subsystem is not considered for the auto-scaling process, which is only concerned with the places' state. Figure 8 illustrates our emulator model, where the places of the managed subsystem are encoded as complex tokens representing the deployment of a microservice; they are composed of:
• The microservice identifier.
• The current number of replicas.
• The amount of allocated CPU and memory.
• The amount of CPU and memory needed for one replica.
• The metric thresholds (max_idr, min_idr, max_cpu, min_cpu, max_memory, min_memory).
• The sets of strong and weak dependencies.

Figure 8: The emulator HLPN model.

2. The API primitives: The API primitives were defined in [7] as a set of transitions that allow the managing subsystem to sample and edit the state and the structure of the managed subsystem. The API primitives are used to read and write the base level; they allow adding new components to the net or removing existing ones (places, transitions, arcs, etc.), and they represent sensors and actuators. For our model, we use the primitive get_Tokens, and we define the primitives set_Tokens and edit_Tokens:
(a) get_Tokens(x) := p(x): this primitive is connected to the place places of the emulator; when it fires, it obtains the token x from places.
(b) set_Tokens(x) := p(x): also connected to places, it is used to put a token x back into places.
(c) edit_Tokens(x.i) := j: this primitive updates the field i of the token x with the value j.
These primitives are used by the managing subsystem to read/edit the tokens representing the places of the managed subsystem, as pictured in the sketch below.
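As an informal analogy (not part of the PN model itself), the complex tokens and the effect of the three primitives can be pictured with the following Python sketch; the field and function spellings, beyond the token fields listed above, are assumptions made for illustration.

```python
# Python analogy of the emulator's complex tokens and of the API primitives;
# names are assumptions for illustration only, not the PN model.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PlaceToken:
    """Encodes one place of the managed subsystem, i.e. one microservice deployment."""
    ms_id: str
    replicas: int
    cpu_alloc: float                              # total CPU currently allocated
    mem_alloc: float                              # total memory currently allocated
    cpu_per_replica: float                        # resources needed by one replica
    mem_per_replica: float
    thresholds: Dict[str, Tuple[float, float]]    # max/min idr, cpu and memory
    strong_deps: List[str] = field(default_factory=list)
    weak_deps: List[str] = field(default_factory=list)

places: Dict[str, PlaceToken] = {}                # stands for the emulator place `places`

def get_tokens(ms_id: str) -> PlaceToken:         # get_Tokens(x): take a token out of `places`
    return places.pop(ms_id)

def set_tokens(token: PlaceToken) -> None:        # set_Tokens(x): put a token back into `places`
    places[token.ms_id] = token

def edit_tokens(token: PlaceToken, i: str, j) -> PlaceToken:   # edit_Tokens(x.i) := j
    setattr(token, i, j)                          # update field i of token x with value j
    return token

# Example: the Execute element applies a compromise plan of 3 replicas to MessageParser.
set_tokens(PlaceToken("MessageParser", 2, 1.0, 2.0, 0.5, 1.0,
                      {"idr": (10, 100), "cpu": (0.2, 0.8), "memory": (0.2, 0.8)},
                      strong_deps=["MessageAnalyzer"]))
set_tokens(edit_tokens(get_tokens("MessageParser"), "replicas", 3))
```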
3. The Managing Subsystem: The managing subsystem models the auto-scaling policy by means of a set of MAPE loops sharing a Monitor, an Extended Plan and an Executor, where each loop analyzes and computes an adaptation plan for the auto-scaling of the microservices. For the email processing pipeline example, we consider three supervised metrics, namely the input data rate, the CPU usage and the Memory usage. Therefore, we present a HLPN-PPN managing subsystem composed of three MAPE loops structured as follows:

Monitor: obtains the metric value tokens from the environment and uses the API primitive get_Tokens to obtain the microservice tokens from the emulator place places; it associates each metric token with the corresponding microservice token, then verifies whether the metric values are violated by comparing them with the microservice's thresholds (M for the maximal value of the metric, N for its minimal value). The violated metrics and the associated microservices are then transferred to the loops' analyzers, while the others are returned to the emulator using the API primitive set_Tokens. Figure 9 illustrates this Petri net.

Figure 9: The Monitor HLPN model.
Figure 10: The metric Analyze-Plan HLPN and PPN model.

Metric Analyze and Plan: obtains tokens composed of the metric value and the corresponding microservice, determines the auto-scaling type (Up/Down), according to which it either augments or reduces the number of replicas, then computes the new metric value through the plausible transition new_metric, using equation (1) presented in section 4; it repeats the process until the metric value is adapted and the plan is fired by the transition adapted_metric. Figure 10 illustrates a generic model of this HLPN combined with a PPN.

Extended Plan: after receiving the adaptation plans from the metric loops, the transition combine_plans fires a token containing a vector combining all the plans and the corresponding microservice; then the plausible transition compute_compromise computes the compromise plan using equation (2) presented in section 4. Meanwhile, the transitions compute_Proactive and compute_Recommended obtain, respectively, the strong and weak dependencies of the microservice from the place ms_dependencies and compute the corresponding adaptation plans using equation (3) of section 4; they then fire the strong dependencies' adaptation plans to the places proactive_plans_*, and the weak dependencies' adaptation plans to the places recommended_plans_*. This HLPN combined with a PPN is illustrated in Figure 11.

Figure 11: The Extended Plan HLPN and PPN model.

Execute: after all the plans have been computed, the Execute element applies the compromise plan, using the API primitive edit_Tokens to edit the number of replicas, and returns the token to the emulator using set_Tokens. It does the same for the proactive plans, after obtaining the corresponding microservice tokens from the emulator using get_Tokens. For the recommended plans, the Execute element uses the same process, where edit_Tokens is used to update the token with the recommended plan. This HLPN is illustrated in Figure 12.

Figure 12: The Execute HLPN model.

6. Conclusion
In this work, we proposed a formal auto-scaling model that proactively auto-scales microservices, while still using reactivity to define the primary scaling plan.
A layered architecture was adopted to ensure separation of concerns between the application's process and the auto-scaling; it is then formally modeled using HLPNs combined with PPNs, a Petri net extension that allows us to compute a compromise between several adaptation plans concerning the different supervised metrics of the application, thus enabling auto-scaling of microservice-based applications from different points of view. As future work, we aim to verify and validate the presented approach to prove the efficiency of auto-scaling when considering the architectural level of the system. For this purpose, we need a PN modeling tool with the ability to model plausible functions. We also plan to model the scheduling and deployment processes to obtain a full orchestration tool for microservice-based applications.

7. References
[1] Bravetti M., Giallorenzo S., Mauro J., Talevi I., Zavattaro G. (2019) Optimal and Automated Deployment for Microservices. In: Hähnle R., van der Aalst W. (eds) Fundamental Approaches to Software Engineering. FASE 2019. Lecture Notes in Computer Science, vol 11424. Springer, Cham. https://doi.org/10.1007/978-3-030-16722-6_21
[2] Q. Zhang, L. Liu, C. Pu, Q. Dou, L. Wu and W. Zhou, "A Comparative Study of Containers and Virtual Machines in Big Data Environment," 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), 2018, pp. 178-185, doi: 10.1109/CLOUD.2018.00030.
[3] Lorido-Botran, T., Miguel-Alonso, J. & Lozano, J.A. A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments. J Grid Computing 12, 559–592 (2014). https://doi.org/10.1007/s10723-014-9314-7
[4] Rossi, Fabiana & Cardellini, Valeria & Lo Presti, Francesco. (2020). Hierarchical Scaling of Microservices in Kubernetes. pp. 28-37. doi: 10.1109/ACSOS49614.2020.00023.
[5] Alexander K, Hanif M, Lee C, Kim E, Helal S (2020) Cost-aware orchestration of applications over heterogeneous clouds. PLoS ONE 15(2): e0228086. https://doi.org/10.1371/journal.pone.0228086
[6] W. Reisig. 1985. Petri Nets: An Introduction. Springer-Verlag New York, Inc., New York, NY, USA.
[7] Camilli, M., Bellettini, C., & Capra, L. (2018). A high-level Petri net-based formal model of distributed self-adaptive systems. Proceedings of the 12th European Conference on Software Architecture: Companion Proceedings - ECSA '18.
[8] Chiachío, M., Chiachío, J., Prescott, D., & Andrews, J. (2016). An information theoretic approach for knowledge representation using Petri nets. In Proceedings of the Future Technologies Conference 2016, San Francisco, 6–7 December, pp. 165–172. IEEE.
[9] Chiachío, M., Chiachío, J., Prescott, D., & Andrews, J. (2018). A new paradigm for uncertain knowledge representation by Plausible Petri nets. Information Sciences, 453, 323–345.
[10] Rus, G., Chiachío, J., & Chiachío, M. (2016). Logical inference for inverse problems. Inverse Problems in Science and Engineering, 24(3), 448–464.
[11] Chiachío, M., Chiachío, J., Prescott, D., & Andrews, J. (2018). Plausible Petri nets as self-adaptive expert systems: A tool for infrastructure asset monitoring. Computer-Aided Civil and Infrastructure Engineering.
[12] Tsagkaropoulos, A., Verginadis, Y., Papageorgiou, N. et al. Severity: a QoS-aware approach to cloud application elasticity. J Cloud Comp 10, 45 (2021). https://doi.org/10.1186/s13677-021-00255-5
[13] A. A. D. P. Souza and M. A. S. Netto, "Using Application Data for SLA-Aware Auto-scaling in Cloud Environments," 2015 IEEE 23rd International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2015, pp. 252-255, doi: 10.1109/MASCOTS.2015.15.
[14] Rogério de Lemos, Holger Giese, Hausi A. Müller, and Mary Shaw (Eds.). 2013. Software Engineering for Self-Adaptive Systems II - International Seminar, Dagstuhl Castle, Germany, October 24-29, 2010, Revised Selected and Invited Papers. Lecture Notes in Computer Science, Vol. 7475.
[15] A. Bauer, V. Lesch, L. Versluis, A. Ilyushkin, N. Herbst, and S. Kounev, "Chamulteon: Coordinated auto-scaling of micro-services," in Proc. of IEEE ICDCS '19, 2019, pp. 2015–2025.
[16] M. Imdoukh, I. Ahmad, and M. Alfailakawi, "Machine learning based auto-scaling for containerized applications," Neural Computing and Applications, vol. 32, pp. 9745–9760, 2019.
[17] Ken Fromm. Thinking Serverless! How New Approaches Address Modern Data Processing Needs. https://read.acloud.guru/thinking-serverless-how-new-approaches-address-modern-data-processing-needs-part-1-af6a158a3af1. Accessed May 2020.
[18] M. Wajahat, A. Gandhi, A. Karve and A. Kochut, "Using machine learning for black-box autoscaling," 2016 Seventh International Green and Sustainable Computing Conference (IGSC), 2016, pp. 1-8, doi: 10.1109/IGCC.2016.7892598.
[19] A. Bauer, V. Lesch, L. Versluis, A. Ilyushkin, N. Herbst, and S. Kounev, "Chamulteon: Coordinated auto-scaling of micro-services," in Proc. of IEEE ICDCS '19, 2019, pp. 2015–2025.
[20] H. Khazaei, R. Ravichandiran, B. Park, H. Bannazadeh, A. Tizghadam, and A. Leon-Garcia, "Elascale: Autoscaling and monitoring as a service," in Proc. of CASCON '17, 2017, pp. 234–240.
[21] E. Di Nitto, L. Florio, and D. A. Tamburri, "Autonomic decentralized microservices: The Gru approach and its evaluation," in Microservices: Science and Engineering. Cham: Springer, 2020, pp. 209–248.
[22] Y. Mao, J. Oak, A. Pompili, D. Beer, T. Han, and P. Hu, "DRAPS: Dynamic and resource-aware placement scheme for Docker containers in a heterogeneous cluster," in Proc. of IEEE IPCCC '17, 2017, pp. 1–8.
[23] A. U. Gias, G. Casale, and M. Woodside, "ATOM: Model-driven autoscaling for microservices," in Proc. of IEEE ICDCS '19, 2019, pp. 1994–2004.
[24] F. Rossi, V. Cardellini and F. L. Presti, "Hierarchical Scaling of Microservices in Kubernetes," 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), 2020, pp. 28-37, doi: 10.1109/ACSOS49614.2020.00023.
[25] S. Horovitz and Y. Arian, "Efficient cloud auto-scaling with SLA objective using Q-learning," in Proc. of IEEE FiCloud '18, Aug 2018, pp. 85–92.
[26] F. Rossi, M. Nardelli, and V. Cardellini, "Horizontal and vertical scaling of container-based applications using reinforcement learning," in Proc. of IEEE CLOUD '19, July 2019, pp. 329–338.