=Paper=
{{Paper
|id=Vol-1561/paper7
|storemode=property
|title=Performance Modeling of Cloud-based Web Systems to Estimate Response Time Distribution
|pdfUrl=https://ceur-ws.org/Vol-1561/paper7.pdf
|volume=Vol-1561
|dblpUrl=https://dblp.org/rec/conf/indiaSE/ChettiarDD16
}}
==Performance Modeling of Cloud-based Web Systems to Estimate Response Time Distribution==
Performance Modeling of Cloud-based Web Systems to Estimate Response Time Distribution

Dayle Chettiar (Ryerson University, Toronto, Ontario, Canada) dayle.chettiar@ryerson.ca
Arindam Das (York University, Toronto, Ontario, Canada) raj.das.ca@gmail.com
Olivia Das (Ryerson University, Toronto, Ontario, Canada) odas@ee.ryerson.ca

ABSTRACT

Performance analysis of distributed systems with tiered software architecture has popularly used the mean response time as its metric. As a metric, however, the response-time percentile is of greater importance, since it is more desirable to reduce the variability of a system's response time than to minimize its mean. Analytical approximations for the response time distribution do exist, but these analytical solutions capture only the steady-state (long-run) behaviour of the system. Today's tiered cloud-based systems, on the other hand, are so complex that they never reach steady state. Consequently, analyzing their transient (short-term) behaviour becomes far more important than analyzing their steady-state behaviour. Transient analysis is nevertheless difficult to accomplish analytically, due to the enormous state space of such systems. In this work, we analyze the transient behaviour of a 3-tier cloud-based system using discrete event simulation. We model the system as an open queueing network and estimate the response time distribution through the simulation. The results show that in a 3-tier system, a configuration with a large number of virtual machines (VMs) does not necessarily perform better than a configuration with a smaller number of VMs. The results further show that different system configurations containing the same number of VMs yield different performance depending on the replication level of the software components running in the different tiers. We demonstrate that our model can serve as part of a decision support system for dynamic VM provisioning. Our model can be used to determine whether a given number of VMs can meet the desired service-level objectives (SLOs) specified in terms of a response time percentile.

CCS Concepts
• Software and its engineering➝Software performance

Keywords
Performance; Discrete-event simulation; Open Queueing Network; Response time distribution.

Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
Workshop on Software Architectures for Adaptive Autonomous Systems (SAAAS 2016) - colocated with ISEC 2016, Goa, India, Feb 18, 2016

1. INTRODUCTION

Increasingly, tiered web applications are being deployed in clouds, since cloud computing allows for dynamic scaling of computational resources as required on a pay-per-use basis. This relieves application service providers from buying and maintaining data centers, thereby reducing operational cost. However, such deployment poses challenges for automated performance management of these applications, because the management system now needs to decide on the amount of computational resources (VMs) to be acquired or released dynamically for a change in system workload, while ensuring that the SLOs are not violated. One way to make such a decision is to use system performance models repeatedly to evaluate various what-if scenarios [3].

System performance models can be used to predict different performance measures. The most commonly used performance measure has been the mean response time (RT). However, Broadwell [5] has argued that the response-time percentile is a metric of greater importance than the mean RT, since it is more desirable to reduce the variability of a system's response time than to minimize its mean.

In this work, we use an open queueing network as our system performance model, because we assume that cloud-based systems have a large number of users who are transient in their use of websites. Consequently, the web application behaves more like an open system, as suggested by Harchol-Balter [9].

For open queueing networks, computing the exact response time distribution analytically is difficult, since it may involve an infinite number of system states. Several approximations to the response time distribution do exist [1, 4, 6, 10, 11]; however, these analytical solutions address only the long-run, or steady-state, behaviour of the system. Cloud-based web systems are so complex and dynamic that they never reach steady state. Consequently, analyzing their transient behaviour becomes far more important than analyzing their steady-state behaviour [2]. In this work, we are interested in analyzing the transient behavior of the system. In spite of the importance of transient analysis, it is difficult to achieve analytically due to the enormous state space of these systems. We therefore resort to discrete event simulation for our analysis.

The goal of this paper is to develop a simulation model to analyze the transient behavior of a 3-tier cloud-based web system. Our model predicts the response time distribution for a given system workload. We model the system as an open queueing network with only feed-forward arcs. The system workload is represented by the arrival rate, i.e. the number of job arrivals per unit time. We assume that the software server at each tier is replicated into one or more copies, each copy running on a separate virtual machine (VM); thus, the queueing network consists of a variable number of VMs in three tiers. Although our simulation model may be computationally expensive compared to an analytical counterpart, it is more general in terms of service time and inter-arrival time distributions.

For our hypothetical 3-tier system, over- or under-utilization of VMs could occur if an application service provider did not purchase the right number of VMs from the cloud provider for the different tiers of the system. For example, when the number of VMs purchased is too few to handle a given workload, most of the requests will not be processed within the required response time threshold. On the contrary, if too many VMs are purchased to handle relatively few requests, the VMs will be under-utilized, which could waste computational resources. Hence, the challenge is to find a configuration with the appropriate number of VMs in each tier that ensures a required response-time percentile stays within a given threshold.

The key contribution of our work is twofold. First, our work presents a model that predicts not only the mean response time but also the response time distribution; the model is general enough to accommodate non-Markovian inter-arrival and service-time distributions. Second, our work demonstrates how the model can serve as part of a decision support system to find the appropriate configuration that ensures a given SLO (in terms of a response-time percentile) is met.

The rest of the paper is organized as follows. Section 2 describes the 3-tier software architecture. Section 3 describes the open queueing network performance model for this architecture. Section 4 analyzes the queueing model and discusses the results. Finally, Section 5 concludes the work.

2. 3-TIER SOFTWARE ARCHITECTURE

Figure 1 shows the software architecture of our hypothetical 3-tier cloud-based system, which we analyze in this work. The architecture consists of three tiers. One or more Web servers run in the first tier (tier-1), one or more application servers (App servers) run in the second tier (tier-2), and one or more database servers (DB servers) run in the third tier (tier-3). The users access the application at the web servers. We assume that at any given tier, one or more VMs can be provisioned, each running a single instance of a server relevant to that tier. For our modeling purposes, we assume that the workload is equally distributed among the servers at any given tier; we indicate this in Figure 1 using the phrase "balanced load".

Figure 1. Software architecture of our hypothetical 3-tier cloud-based system: users and their web browsers send requests through tier-1 (Web Server VMs), tier-2 (App Server VMs) and tier-3 (DB Server VMs), with balanced load among the replicas at each tier.

We assume that a service request is processed exactly once (in a server) at each tier. After completion of processing at the third tier, the response is returned to the user. We further assume that a request incurs a waiting time in a server's queue, if the server is busy, before being processed; the request then incurs a service time while being processed in the server.

The request is first sent to a Web server for processing. If the Web server is busy, the request waits in the server's queue before being processed. The request is then redirected to an App server in the second tier. If the App server is busy, the request again waits in the server's queue before being processed. Next, the request is redirected to a DB server in the third tier. As before, the request waits in the server's queue if the server is busy. Once the processing of the request is finished at the DB server, the response is sent back to the user.

3. SYSTEM PERFORMANCE MODEL

The 3-tier software architecture of Figure 1 is modeled as an open queueing network (see Figure 2). In Figure 2, each layer of queueing stations represents the collection of servers (each running on its own VM) that execute requests at the corresponding tier. We assume that the replicas of the servers in a given tier have identical service time distributions and that the arrivals are split uniformly among them. Let λ denote the arrival rate of user requests at tier-1; if there are 3 server replicas in tier-1, then the arrival rate at each replica is λ/3. We assume that μ1 is the service rate of each Web server replica at tier-1, μ2 is the service rate of each App server replica at tier-2, and μ3 is the service rate of each DB server replica at tier-3.

Figure 2. The 3-tier software architecture of Figure 1 depicted as an open queueing network model: requests flow through the Web server (tier-1), App server (tier-2) and DB server (tier-3) stations, and the response time spans all three tiers.

As shown in Figure 2, the response time of a request is the time between the arrival of the request at a tier-1 server and the completion of the request at a tier-3 server. This time includes the waiting times in the queues of the relevant servers at the different tiers and the service times of those servers. Let RTi denote the response time of the i-th request.

We assume that the SLO is specified in terms of a response time percentile. An example SLO is "The response time should be less than or equal to 0.3 seconds with probability 0.95", meaning that 95% of the requests should complete within 0.3 seconds. Here, 0.3 seconds is the response time threshold, which we denote by τ.

We have simulated the open queueing network of Figure 2 using SimPy, a Python-based discrete event simulation framework. Let N denote the total number of requests completed in one simulation run. During every simulation run, we record the response time RTi of each request. At the end of each run, we count the requests whose response time is less than or equal to the threshold τ: for request i, let Xi = 1 if RTi ≤ τ, and Xi = 0 otherwise. Let the random variable RT denote the response time.
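As a concrete illustration of such a simulation run (a minimal sketch using only the Python standard library in place of SimPy; all names here are our own, not the authors' code), the feed-forward network can be simulated per request with the Lindley recursion at each single-server FCFS replica: a request starts service at max(its arrival time at the tier, the time the chosen replica becomes free). For simplicity, requests are advanced in tier-1 arrival order, which approximates FCFS queueing at the downstream replicas.

```python
import random

def simulate(config, lam, mus, n_requests, seed=0):
    """Simulate the 3-tier feed-forward open queueing network.

    config: (C1, C2, C3), the number of server replicas (VMs) per tier.
    lam: Poisson arrival rate; mus: (mu1, mu2, mu3) exponential service rates.
    Arrivals are split uniformly ("balanced load") among a tier's replicas.
    Returns the list of per-request response times RT_i.
    """
    rng = random.Random(seed)
    # Time at which each replica of each tier next becomes free.
    free_at = [[0.0] * c for c in config]
    t = 0.0  # arrival time of the current request at tier-1
    response_times = []
    for _ in range(n_requests):
        t += rng.expovariate(lam)                 # Poisson arrival process
        done = t                                  # request enters tier-1 at t
        for tier, (c, mu) in enumerate(zip(config, mus)):
            k = rng.randrange(c)                  # uniform routing to a replica
            start = max(done, free_at[tier][k])   # wait if the server is busy
            done = start + rng.expovariate(mu)    # exponential service time
            free_at[tier][k] = done
        response_times.append(done - t)           # RT_i = completion - arrival
    return response_times

rts = simulate((3, 2, 2), lam=100.0, mus=(60.0, 70.0, 80.0), n_requests=1000)
```

Because every waiting and service interval is generated explicitly, swapping `expovariate` for any other distribution yields the non-Markovian generality claimed for the model.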
We estimate the response time distribution as:

    P(RT ≤ τ) = (1/N) · Σ_{i=1}^{N} Xi

Further, we estimate the mean response time as:

    Mean RT = (1/N) · Σ_{i=1}^{N} RTi

4. SIMULATION RESULTS

In this section, we analyze the queueing network of Figure 2 for different configurations of VMs. We denote a VM configuration as (C1, C2, C3), where C1, C2 and C3 denote the number of VMs in tier-1, tier-2 and tier-3 respectively. We assume that a VM in a tier runs a server replica relevant to that tier. In sub-section 4.1, we take an example VM configuration, (3,2,1); using this configuration, we estimate the response-time distribution and analyze the system's transient behavior over five different time periods. We assume that the requests arrive according to a Poisson process, that the service times of the servers are exponentially distributed, and that the system starts with empty queues at every server. In sub-section 4.2, we demonstrate how our model can be used to evaluate various what-if scenarios in order to decide on a configuration that would meet a given SLO.

Table 1 shows the model parameters and their values for our simulation, adopted from the work of Gullhav et al. [8]. We assume the arrival rate to be 100 requests/sec. We further assume that the tier-3 servers are faster than the tier-2 servers, and the tier-2 servers are faster than the tier-1 servers; the service rates are set accordingly (see Table 1).

Table 1. Model Parameters
    Arrival rate, λ               100 requests/sec
    Service rate at tier-1, μ1     60 requests/sec
    Service rate at tier-2, μ2     70 requests/sec
    Service rate at tier-3, μ3     80 requests/sec

4.1 Finding Response-time Distribution and Percentiles

To illustrate the predictions of the response time distribution and percentiles, we consider the configuration (3,2,1). This configuration reflects the scenario where the database layer becomes the performance bottleneck, owing to requirements of transactional access and atomicity [7]. We analyze this configuration for five different time periods: 60 sec, 120 sec, 180 sec, 240 sec and 300 sec, with the model parameters given in Table 1.

A plot of the five resulting response time distributions is shown in Figure 3 for the configuration (3,2,1). In this configuration, there is only one server at tier-3, which processes 80 requests per second. Since the arrival rate is 100 requests/sec, requests queue up at the tier-3 server as time increases.
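The two estimators above translate directly into code. The following sketch (our own illustration; function names and the sample values are hypothetical) computes the empirical distribution value P(RT ≤ τ), the mean RT, and an empirical percentile from a list of recorded response times:

```python
import math

def rt_distribution(response_times, tau):
    """Estimate P(RT <= tau) as (1/N) * sum of the indicators X_i."""
    n = len(response_times)
    return sum(1 for rt in response_times if rt <= tau) / n

def mean_rt(response_times):
    """Estimate the mean response time as (1/N) * sum of the RT_i."""
    return sum(response_times) / len(response_times)

def rt_percentile(response_times, p):
    """Smallest recorded threshold t with empirical P(RT <= t) >= p."""
    ordered = sorted(response_times)
    return ordered[math.ceil(p * len(ordered)) - 1]

sample = [0.1, 0.2, 0.2, 0.3, 0.9]          # hypothetical recorded RT_i values
print(rt_distribution(sample, 0.3))          # -> 0.8 (4 of 5 requests meet tau)
print(mean_rt(sample))                       # -> 0.34
print(rt_percentile(sample, 0.95))           # -> 0.9
```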
Consequently, as time passes, more and more requests fail to meet a given threshold. If we consider an SLO specifying that "The response time should be below 15 seconds with probability 95%", then this configuration will meet the SLO for only one minute; subsequently, it will not be able to meet the SLO at all.

Figure 4 summarizes some important statistics about the five response time distributions. It shows the mean response time and three different response time percentiles (90th, 95th and 99th) with the passage of time. We find that over a time period of 5 minutes, the configuration (3,2,1) is able to meet a response time threshold of 57 seconds with probability 0.95.

Figure 3. Transient analysis of configuration (3,2,1): response time distributions for five different time periods of 60, 120, 180, 240 and 300 seconds.

Figure 4. Mean Response Time (Mean RT) and the 90th, 95th and 99th percentiles of RT plotted against the passage of time for the VM configuration (3,2,1). The system starts with empty queues at every server.

4.2 What-if Analysis in Decision Support

In this section, we illustrate how our model is used to evaluate different what-if scenarios to decide on a VM configuration that would meet a specified SLO.

The cloud computing paradigm allows for dynamic scaling of computational resources as required on a pay-per-use basis. Let us assume that our cost budget allows us to buy a maximum of 3 VMs for each tier, and that every VM costs the same amount. Therefore, instead of a static configuration, our aim is to change the configuration dynamically depending on the workload, so as to be cost effective; our goal is thus to buy the minimum number of VMs that will meet the SLO for a given workload.

Since we are allowed a maximum of 3 VMs per tier, there are 27 potential VM configurations that could be analyzed. We analyze the response time per request for processing 1000 requests, using the model parameter values provided in Table 1, over the response time threshold range of 0.1 to 0.7 seconds. We consider only those configurations for which the probability of meeting a given threshold (in the range 0.1 to 0.7 sec) is 0.55 or higher. We find that out of the 27 different configurations, only 8 meet this requirement. Figure 5 shows the response time distributions for the eight configurations (2,2,2), (2,2,3), (2,3,2), (2,3,3), (3,2,2), (3,2,3), (3,3,2) and (3,3,3).

Figure 5. Response time distributions of eight different VM configurations. The system starts with empty queues at every server; 1000 requests are processed.

Let us consider the SLO "The response time should be below 0.3 seconds with probability 0.95". The question we want to answer is: which configuration is the best one to meet this SLO? From the figure we see that the configurations (2,3,2), (2,3,3), (3,2,2), (3,2,3), (3,3,2) and (3,3,3) satisfy the SLO. Among these configurations, (2,3,2) and (3,2,2) have the smallest number of VMs (7 VMs each). Since (2,3,2) meets the response time threshold with probability 0.969 whereas (3,2,2) meets it with probability 0.987, the best configuration that meets the SLO is (3,2,2).

Table 2 summarizes some important statistics about the response time distributions of the eight VM configurations. It shows the mean response time and the 95th and 99th response time percentiles. This table demonstrates that percentile measures are of greater importance than mean values.
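The selection step just described (filter the configurations that meet the SLO probability, then prefer the fewest VMs, breaking ties by the highest probability) can be sketched as follows. This is our own hypothetical illustration: the two probabilities are the values reported above for τ = 0.3 s, and entries for the other SLO-satisfying configurations would be filled in the same way from simulation runs.

```python
slo_prob = 0.95  # required probability of meeting the threshold tau = 0.3 s

# P(RT <= tau) per configuration; only the two values reported in the text
# are shown here, the rest would come from further simulation runs.
meets_tau = {
    (2, 3, 2): 0.969,
    (3, 2, 2): 0.987,
}

# Rank SLO-satisfying configurations: fewest total VMs first, then
# highest probability (negated so that min() prefers larger values).
candidates = [(sum(cfg), -p, cfg) for cfg, p in meets_tau.items() if p >= slo_prob]
best = min(candidates)[2]
print(best)  # -> (3, 2, 2): 7 VMs, and the higher probability of the 7-VM options
```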
As an example, let us consider an SLO specified in terms of the mean RT: "The mean RT should be below 0.1 seconds". From Table 2, we find that two configurations, (3,3,2) and (3,3,3), satisfy this SLO. On the contrary, we observe in Figure 5 that a response time of 0.1 seconds is met by only about 60% of the requests for (3,3,2), and only about 69% of the requests for (3,3,3); these percentages are far below the 95% to 99% norm. This suggests that an SLO specified in terms of response time percentiles is more reliable than an SLO given in terms of response time averages.

Table 2. Mean Response Time (RT), 95th and 99th percentile of RT for eight configurations
    VM Configuration | Mean RT (sec) | RT 95th percentile (sec) | RT 99th percentile (sec)
    (2,2,2)          | 0.185         | 0.38                     | 0.46
    (2,2,3)          | 0.157         | 0.33                     | 0.42
    (2,3,2)          | 0.136         | 0.28                     | 0.34
    (2,3,3)          | 0.126         | 0.26                     | 0.31
    (3,2,2)          | 0.116         | 0.23                     | 0.31
    (3,2,3)          | 0.102         | 0.21                     | 0.30
    (3,3,2)          | 0.097         | 0.20                     | 0.28
    (3,3,3)          | 0.087         | 0.19                     | 0.26

Next we demonstrate that different system configurations containing the same number of VMs yield different performance depending on the VM replication level in the different tiers. Figure 6 shows the response time distributions for the three VM configurations (2,2,3), (2,3,2) and (3,2,2), all of which have 7 VMs. With the assumed model parameter values of Table 1, this figure shows that the configuration (3,2,2) performs better than (2,3,2), which in turn performs better than (2,2,3). We likewise compared sets of configurations, each set consisting of configurations with the same number of VMs, for 4, 5, 6 and 8 VMs. We conclude that, among configurations with the same number of VMs, VM replication in tier-1 yields better results than replication in the lower tiers.

Figure 6. An example comparison of configurations with the same number of VMs: comparing three VM configurations, each having 7 VMs.

Next we demonstrate that a configuration with a larger number of VMs does not necessarily perform better than a configuration with a smaller number of VMs. Let us consider two VM configurations: (2,3,3) with 8 VMs, and (3,2,2) with 7 VMs. With the assumed parameter values of Table 1, Figure 7 shows that the configuration (3,2,2) performs better than (2,3,3) for response time thresholds of 0.3 seconds or below. Such tier-based analyses can help a service provider refrain from unnecessarily spending money on excess VMs, since that may not be a worthwhile investment.

Figure 7. A larger number of VMs does not necessarily lead to better performance: comparison of the configuration (2,3,3), containing 8 VMs, with the configuration (3,2,2), containing 7 VMs.

5. CONCLUSIONS

We have developed a simulation model to analyze the transient behavior of a 3-tier cloud-based web system. Our model predicts not only the mean response time but also the response time percentiles, and it is general enough to accommodate non-Markovian inter-arrival and service-time distributions. We have demonstrated how our model can serve as part of a decision support system for the VM planning process. Given all the VM plans satisfying the SLO requirements, we acknowledge that it is not a straightforward task to figure out the optimal plan. We recommend that research be undertaken to investigate whether our model can be used jointly with an optimization engine to select the best VM plan.

6. ACKNOWLEDGMENTS

We would like to thank NSERC Canada for their financial support through the NSERC Discovery Grant of Olivia Das.

7. REFERENCES

[1] Abate, J., Choudhury, G.L. and Whitt, W. 1996. Exponential approximations for tail probabilities in queues II: sojourn time and workload. Operations Research. 44, 5 (1996), 758–763.

[2] Angius, A., Horváth, A. and Wolf, V. 2013. Approximate transient analysis of queuing networks by quasi product forms. Analytical and Stochastic Modeling Techniques and Applications. Springer. 22–36.

[3] Ardagna, D., Casale, G., Ciavotta, M., Pérez, J.F. and Wang, W. 2014. Quality-of-service in cloud computing: modeling techniques and their applications. Journal of Internet Services and Applications. 5, 1 (2014), 1–17.

[4] Au-Yeung, S.W., Dingle, N.J. and Knottenbelt, W.J. 2004. Efficient approximation of response time densities and quantiles in stochastic models. ACM SIGSOFT Software Engineering Notes (2004), 151–155.

[5] Broadwell, P.M. 2004. Response time as a performability metric for online services. Report No. UCB//CSD-04-1324. Computer Science Division (EECS), University of California, Berkeley.

[6] Grottke, M., Apte, V., Trivedi, K.S. and Woolet, S. 2011. Response time distributions in networks of queues. Queueing Networks. Springer. 587–641.

[7] Grozev, N. and Buyya, R. 2014. Multi-cloud provisioning and load distribution for three-tier applications. ACM Transactions on Autonomous and Adaptive Systems (TAAS). 9, 3 (2014), 13.

[8] Gullhav, A.N., Nygreen, B. and Heegaard, P.E. 2013. Approximating the response time distribution of fault-tolerant multi-tier cloud services. Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing (2013), 287–291.

[9] Harchol-Balter, M. 2013. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press.

[10] Van Houdt, B. and Blondia, C. 2005. Approximated transient queue length and waiting time distributions via steady state analysis. Stochastic Models. 21, 2-3 (2005), 725–744.

[11] Van Velthoven, J., Van Houdt, B. and Blondia, C. 2005. Response time distribution in a D-MAP/PH/1 queue with general customer impatience. Stochastic Models. 21, 2-3 (2005), 745–765.