=Paper=
{{Paper
|id=Vol-1561/paper7
|storemode=property
|title=Performance Modeling of Cloud-based Web Systems to Estimate Response Time Distribution
|pdfUrl=https://ceur-ws.org/Vol-1561/paper7.pdf
|volume=Vol-1561
|dblpUrl=https://dblp.org/rec/conf/indiaSE/ChettiarDD16
}}
==Performance Modeling of Cloud-based Web Systems to Estimate Response Time Distribution==
Performance Modeling of Cloud-based Web Systems to Estimate Response Time Distribution

Dayle Chettiar (Ryerson University, Toronto, Ontario, Canada) dayle.chettiar@ryerson.ca
Arindam Das (York University, Toronto, Ontario, Canada) raj.das.ca@gmail.com
Olivia Das (Ryerson University, Toronto, Ontario, Canada) odas@ee.ryerson.ca

ABSTRACT

Performance analysis of distributed systems with tiered software architecture has popularly used the mean response time as its metric. As a metric, however, the response-time percentile is of greater importance, since it is more desirable to reduce the variability of a system's response time than to minimize its mean. Analytical approximations for the response time distribution do exist, but these analytical solutions capture only the steady-state (long-run) behaviour of the system. Today's tiered cloud-based systems, on the other hand, are so complex that they never reach steady state. Consequently, analyzing their transient (short-term) behaviour becomes far more important than analyzing their steady-state behaviour. Transient analysis is nevertheless difficult to accomplish analytically, due to the enormous state space of such systems. In this work, we analyze the transient behaviour of a 3-tier cloud-based system using discrete event simulation. We model the system as an open queueing network and estimate the response time distribution through the simulation. The results show that in a 3-tier system, a configuration with a large number of virtual machines (VMs) does not necessarily perform better than a configuration with a smaller number of VMs. The results further show that different system configurations containing the same number of VMs yield different performance depending on the replication level of the software components running in the different tiers. We demonstrate that our model can serve as part of a decision support system for dynamic VM provisioning. Our model can be used to determine whether a given number of VMs can meet the desired service-level objectives (SLOs) specified in terms of a response time percentile.

CCS Concepts
• Software and its engineering➝Software performance

Keywords
Performance; Discrete-event simulation; Open Queueing Network; Response time distribution.

Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
Workshop on Software Architectures for Adaptive Autonomous Systems (SAAAS 2016) - colocated with ISEC 2016, Goa, India, Feb 18, 2016

1. INTRODUCTION

Increasingly, tiered web applications are being deployed in clouds, since cloud computing allows for dynamic scaling of computational resources as required on a pay-per-use basis. This relieves application service providers from buying and maintaining data centers, thereby reducing operational cost. However, such deployment poses challenges for automated performance management of these applications, because the management system now needs to decide on the amount of computational resources (VMs) to be acquired or released dynamically for a change in system workload, while ensuring that the SLOs are not violated. One way to make such a decision is to use system performance models repeatedly to evaluate various what-if scenarios [3].

System performance models can be used to predict different performance measures. The most commonly used performance measure has been the mean response time (RT). However, Broadwell [5] has argued that the response-time percentile is a metric of greater importance than the mean RT, since it is more desirable to reduce the variability of a system's response time than to minimize its mean.

In this work, we use an open queueing network as our system performance model, because we assume that cloud-based systems have a large number of users who are transient in their use of websites. Consequently, the web application behaves more like an open system, as suggested by Harchol-Balter [9].

For open queueing networks, computing the exact response time distribution analytically is difficult, since it may involve an infinite number of system states. Several approximations to the response time distribution do exist [1, 4, 6, 10, 11]; however, these analytical solutions address only the long-run, or steady-state, behaviour of the system. Cloud-based web systems are so complex and dynamic that they never reach steady state. Consequently, analyzing their transient behaviour becomes far more important than analyzing their steady-state behaviour [2]. In this work, we are interested in analyzing the transient behavior of the system. In spite of the importance of transient analysis, it is difficult to achieve analytically due to the enormous state space of these systems. We therefore resort to discrete event simulation for our analysis.

The goal of this paper is to develop a simulation model to analyze the transient behavior of a 3-tier cloud-based web system. Our model predicts the response time distribution for a given system workload. We model the system as an open queueing network with only feed-forward arcs. The system workload is represented by the arrival rate, i.e. the number of job arrivals per unit time. We assume that the software server at each tier is replicated into one or more copies, each copy running on a separate virtual machine (VM); thus, the queueing network consists of a variable number of VMs in three tiers. Although our simulation model may be computationally expensive compared to an analytical counterpart, it is more general in terms of service time and inter-arrival time distributions.

For our hypothetical 3-tier system, over- or under-utilization of VMs could occur if an application service provider did not purchase the right number of VMs from the cloud provider for the different tiers of the system. For example, when the number of VMs purchased is too few to handle a given workload, most of the requests will not be processed within the required response time threshold. On the contrary, if too many VMs are purchased to handle relatively few requests, the VMs will be under-utilized, which could waste computational resources. Hence, the challenge is to find a configuration with the appropriate number of VMs in each tier that ensures a required response-time percentile stays within a given threshold.

The key contribution of our work is twofold. First, our work presents a model that predicts not only the mean response time but also the response time distribution; the model is general enough to accommodate non-Markovian inter-arrival and service-time distributions. Second, our work demonstrates how the model can serve as part of a decision support system to find the appropriate configuration that ensures a given SLO (in terms of a response-time percentile) is met.

The rest of the paper is organized as follows. Section 2 describes the 3-tier software architecture. Section 3 describes the open queueing network performance model for this architecture. Section 4 analyzes the queueing model and discusses the results. Finally, Section 5 concludes the work.

2. 3-TIER SOFTWARE ARCHITECTURE

Figure 1 shows the software architecture of our hypothetical 3-tier cloud-based system, which we analyze in this work. The architecture consists of three tiers. One or more Web servers run in the first tier (tier-1), one or more application servers (App servers) run in the second tier (tier-2), and one or more database servers (DB servers) run in the third tier (tier-3). The users access the application at the web servers. We assume that at any given tier, one or more VMs can be provisioned, each running a single instance of a server relevant to that tier. For our modeling purposes, we assume that the workload is equally distributed among the servers at any given tier; we indicate this in Figure 1 using the phrase "balanced load".

Figure 1. Software architecture of our hypothetical 3-tier cloud-based system: users and their web browsers send requests through tier-1 (Web Server VMs), tier-2 (App Server VMs) and tier-3 (DB Server VMs), with balanced load among the replicas at each tier.

We assume that a service request is processed exactly once (in a server) at each tier. After completion of processing at the third tier, the response is returned to the user. We further assume that a request incurs a waiting time in a server's queue, if the server is busy, before being processed; the request then incurs a service time while being processed in the server.

The request is first sent to a Web server for processing. If the Web server is busy, the request waits in the server's queue before being processed. The request is then redirected to an App server in the second tier. If the App server is busy, the request again waits in the server's queue before being processed. Next, the request is redirected to a DB server in the third tier. As before, the request waits in the server's queue if the server is busy. Once the processing of the request is finished at the DB server, the response is sent back to the user.

3. SYSTEM PERFORMANCE MODEL

The 3-tier software architecture of Figure 1 is modeled as an open queueing network (see Figure 2). In Figure 2, each layer of queueing stations represents the collection of servers (each running on its own VM) that execute requests at the corresponding tier. We assume that the replicas of the servers in a given tier have identical service time distributions and that the arrivals are split uniformly among them. Let λ denote the arrival rate of user requests at tier-1; if there are 3 server replicas in tier-1, then the arrival rate at each replica is λ/3. We assume that μ1 is the service rate of each Web server replica at tier-1, μ2 is the service rate of each App server replica at tier-2, and μ3 is the service rate of each DB server replica at tier-3.

Figure 2. The 3-tier software architecture of Figure 1 depicted as an open queueing network model: requests flow through the Web server (tier-1), App server (tier-2) and DB server (tier-3) stations, and the response time spans all three tiers.

As shown in Figure 2, the response time of a request is the time between the arrival of the request at a tier-1 server and the completion of the request at a tier-3 server. This time includes the waiting times in the queues of the relevant servers at the different tiers and the service times of those servers. Let RTi denote the response time of the i-th request.

We assume that the SLO is specified in terms of a response time percentile. An example SLO is "The response time should be less than or equal to 0.3 seconds with probability 0.95", meaning that 95% of the requests should complete within 0.3 seconds. Here, 0.3 seconds is the response time threshold, which we denote by τ.

We have simulated the open queueing network of Figure 2 using SimPy, a Python-based discrete event simulation framework. Let N denote the total number of requests completed in one simulation run. During every simulation run, we record the response time RTi of each request. At the end of each run, we count the requests whose response time is less than or equal to the threshold τ: for request i, let Xi = 1 if RTi ≤ τ, and Xi = 0 otherwise. Let the random variable RT denote the response time.
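As a concrete illustration of such a simulation run (a minimal sketch using only the Python standard library in place of SimPy; all names here are our own, not the authors' code), the feed-forward network can be simulated per request with the Lindley recursion at each single-server FCFS replica: a request starts service at max(its arrival time at the tier, the time the chosen replica becomes free). For simplicity, requests are advanced in tier-1 arrival order, which approximates FCFS queueing at the downstream replicas.

```python
import random

def simulate(config, lam, mus, n_requests, seed=0):
    """Simulate the 3-tier feed-forward open queueing network.

    config: (C1, C2, C3), the number of server replicas (VMs) per tier.
    lam: Poisson arrival rate; mus: (mu1, mu2, mu3) exponential service rates.
    Arrivals are split uniformly ("balanced load") among a tier's replicas.
    Returns the list of per-request response times RT_i.
    """
    rng = random.Random(seed)
    # Time at which each replica of each tier next becomes free.
    free_at = [[0.0] * c for c in config]
    t = 0.0  # arrival time of the current request at tier-1
    response_times = []
    for _ in range(n_requests):
        t += rng.expovariate(lam)                 # Poisson arrival process
        done = t                                  # request enters tier-1 at t
        for tier, (c, mu) in enumerate(zip(config, mus)):
            k = rng.randrange(c)                  # uniform routing to a replica
            start = max(done, free_at[tier][k])   # wait if the server is busy
            done = start + rng.expovariate(mu)    # exponential service time
            free_at[tier][k] = done
        response_times.append(done - t)           # RT_i = completion - arrival
    return response_times

rts = simulate((3, 2, 2), lam=100.0, mus=(60.0, 70.0, 80.0), n_requests=1000)
```

Because every waiting and service interval is generated explicitly, swapping `expovariate` for any other distribution yields the non-Markovian generality claimed for the model.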
We estimate the response time distribution as:

    P(RT ≤ τ) = (1/N) · Σ_{i=1}^{N} Xi

Further, we estimate the mean response time as:

    Mean RT = (1/N) · Σ_{i=1}^{N} RTi

4. SIMULATION RESULTS

In this section, we analyze the queueing network of Figure 2 for different configurations of VMs. We denote a VM configuration as (C1, C2, C3), where C1, C2 and C3 denote the number of VMs in tier-1, tier-2 and tier-3 respectively. We assume that a VM in a tier runs a server replica relevant to that tier. In sub-section 4.1, we take an example VM configuration, (3,2,1); using this configuration, we estimate the response-time distribution and analyze the system's transient behavior over five different time periods. We assume that the requests arrive according to a Poisson process, that the service times of the servers are exponentially distributed, and that the system starts with empty queues at every server. In sub-section 4.2, we demonstrate how our model can be used to evaluate various what-if scenarios in order to decide on a configuration that would meet a given SLO.

Table 1 shows the model parameters and their values for our simulation, adopted from the work of Gullhav et al. [8]. We assume the arrival rate to be 100 requests/sec. We further assume that the tier-3 servers are faster than the tier-2 servers, and the tier-2 servers are faster than the tier-1 servers; the service rates are set accordingly (see Table 1).

Table 1. Model Parameters
    Arrival rate, λ               100 requests/sec
    Service rate at tier-1, μ1     60 requests/sec
    Service rate at tier-2, μ2     70 requests/sec
    Service rate at tier-3, μ3     80 requests/sec

4.1 Finding Response-time Distribution and Percentiles

To illustrate the predictions of the response time distribution and percentiles, we consider the configuration (3,2,1). This configuration reflects the scenario where the database layer becomes the performance bottleneck, owing to requirements of transactional access and atomicity [7]. We analyze this configuration for five different time periods: 60 sec, 120 sec, 180 sec, 240 sec and 300 sec, with the model parameters given in Table 1.

A plot of the five resulting response time distributions is shown in Figure 3 for the configuration (3,2,1). In this configuration, there is only one server at tier-3, which processes 80 requests per second. Since the arrival rate is 100 requests/sec, requests queue up at the tier-3 server as time increases.
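The two estimators above translate directly into code. The following sketch (our own illustration; function names and the sample values are hypothetical) computes the empirical distribution value P(RT ≤ τ), the mean RT, and an empirical percentile from a list of recorded response times:

```python
import math

def rt_distribution(response_times, tau):
    """Estimate P(RT <= tau) as (1/N) * sum of the indicators X_i."""
    n = len(response_times)
    return sum(1 for rt in response_times if rt <= tau) / n

def mean_rt(response_times):
    """Estimate the mean response time as (1/N) * sum of the RT_i."""
    return sum(response_times) / len(response_times)

def rt_percentile(response_times, p):
    """Smallest recorded threshold t with empirical P(RT <= t) >= p."""
    ordered = sorted(response_times)
    return ordered[math.ceil(p * len(ordered)) - 1]

sample = [0.1, 0.2, 0.2, 0.3, 0.9]          # hypothetical recorded RT_i values
print(rt_distribution(sample, 0.3))          # -> 0.8 (4 of 5 requests meet tau)
print(mean_rt(sample))                       # -> 0.34
print(rt_percentile(sample, 0.95))           # -> 0.9
```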
Consequently, as time passes, more and more requests fail to meet a given threshold. If we consider an SLO specifying that "The response time should be below 15 seconds with probability 95%", then this configuration will meet the SLO for only one minute; subsequently, it will not be able to meet the SLO at all.

Figure 4 summarizes some important statistics about the five response time distributions. It shows the mean response time and three different response time percentiles (90th, 95th and 99th) with the passage of time. We find that over a time period of 5 minutes, the configuration (3,2,1) is able to meet a response time threshold of 57 seconds with probability 0.95.

Figure 3. Transient analysis of configuration (3,2,1): response time distributions for five different time periods of 60, 120, 180, 240 and 300 seconds.

Figure 4. Mean Response Time (Mean RT) and the 90th, 95th and 99th percentiles of RT plotted against the passage of time for the VM configuration (3,2,1). The system starts with empty queues at every server.

4.2 What-if Analysis in Decision Support

In this section, we illustrate how our model is used to evaluate different what-if scenarios to decide on a VM configuration that would meet a specified SLO.

The cloud computing paradigm allows for dynamic scaling of computational resources as required on a pay-per-use basis. Let us assume that our cost budget allows us to buy a maximum of 3 VMs for each tier, and that every VM costs the same amount. Therefore, instead of a static configuration, our aim is to change the configuration dynamically depending on the workload, so as to be cost effective; our goal is thus to buy the minimum number of VMs that will meet the SLO for a given workload.

Since we are allowed a maximum of 3 VMs per tier, there are 27 potential VM configurations that could be analyzed. We analyze the response time per request for processing 1000 requests, using the model parameter values provided in Table 1, over the response time threshold range of 0.1 to 0.7 seconds. We consider only those configurations for which the probability of meeting a given threshold (in the range 0.1 to 0.7 sec) is 0.55 or higher. We find that out of the 27 different configurations, only 8 meet this requirement. Figure 5 shows the response time distributions for the eight configurations (2,2,2), (2,2,3), (2,3,2), (2,3,3), (3,2,2), (3,2,3), (3,3,2) and (3,3,3).

Figure 5. Response time distributions of eight different VM configurations. The system starts with empty queues at every server; 1000 requests are processed.

Let us consider the SLO "The response time should be below 0.3 seconds with probability 0.95". The question we want to answer is: which configuration is the best one to meet this SLO? From the figure we see that the configurations (2,3,2), (2,3,3), (3,2,2), (3,2,3), (3,3,2) and (3,3,3) satisfy the SLO. Among these configurations, (2,3,2) and (3,2,2) have the smallest number of VMs (7 VMs each). Since (2,3,2) meets the response time threshold with probability 0.969 whereas (3,2,2) meets it with probability 0.987, the best configuration that meets the SLO is (3,2,2).

Table 2 summarizes some important statistics about the response time distributions of the eight VM configurations. It shows the mean response time and the 95th and 99th response time percentiles. This table demonstrates that percentile measures are of greater importance than mean values.
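The selection step just described (filter the configurations that meet the SLO probability, then prefer the fewest VMs, breaking ties by the highest probability) can be sketched as follows. This is our own hypothetical illustration: the two probabilities are the values reported above for τ = 0.3 s, and entries for the other SLO-satisfying configurations would be filled in the same way from simulation runs.

```python
slo_prob = 0.95  # required probability of meeting the threshold tau = 0.3 s

# P(RT <= tau) per configuration; only the two values reported in the text
# are shown here, the rest would come from further simulation runs.
meets_tau = {
    (2, 3, 2): 0.969,
    (3, 2, 2): 0.987,
}

# Rank SLO-satisfying configurations: fewest total VMs first, then
# highest probability (negated so that min() prefers larger values).
candidates = [(sum(cfg), -p, cfg) for cfg, p in meets_tau.items() if p >= slo_prob]
best = min(candidates)[2]
print(best)  # -> (3, 2, 2): 7 VMs, and the higher probability of the 7-VM options
```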
As an example, let us consider an SLO specified in terms of the mean RT: "The mean RT should be below 0.1 seconds". From Table 2, we find that two configurations, (3,3,2) and (3,3,3), satisfy this SLO. On the contrary, we observe in Figure 5 that a response time of 0.1 seconds is met by only about 60% of the requests for (3,3,2), and only about 69% of the requests for (3,3,3); these percentages are far below the 95% to 99% norm. This suggests that an SLO specified in terms of response time percentiles is more reliable than an SLO given in terms of response time averages.

Table 2. Mean Response Time (RT), 95th and 99th percentile of RT for eight configurations
    VM Configuration | Mean RT (sec) | RT 95th percentile (sec) | RT 99th percentile (sec)
    (2,2,2)          | 0.185         | 0.38                     | 0.46
    (2,2,3)          | 0.157         | 0.33                     | 0.42
    (2,3,2)          | 0.136         | 0.28                     | 0.34
    (2,3,3)          | 0.126         | 0.26                     | 0.31
    (3,2,2)          | 0.116         | 0.23                     | 0.31
    (3,2,3)          | 0.102         | 0.21                     | 0.30
    (3,3,2)          | 0.097         | 0.20                     | 0.28
    (3,3,3)          | 0.087         | 0.19                     | 0.26

Next we demonstrate that different system configurations containing the same number of VMs yield different performance depending on the VM replication level in the different tiers. Figure 6 shows the response time distributions for the three VM configurations (2,2,3), (2,3,2) and (3,2,2), all of which have 7 VMs. With the assumed model parameter values of Table 1, this figure shows that the configuration (3,2,2) performs better than (2,3,2), which in turn performs better than (2,2,3). We likewise compared sets of configurations, each set consisting of configurations with the same number of VMs, for 4, 5, 6 and 8 VMs. We conclude that, among configurations with the same number of VMs, VM replication in tier-1 yields better results than replication in the lower tiers.

Figure 6. An example comparison of configurations with the same number of VMs: comparing three VM configurations, each having 7 VMs.

Next we demonstrate that a configuration with a larger number of VMs does not necessarily perform better than a configuration with a smaller number of VMs. Let us consider two VM configurations: (2,3,3) with 8 VMs, and (3,2,2) with 7 VMs. With the assumed parameter values of Table 1, Figure 7 shows that the configuration (3,2,2) performs better than (2,3,3) for response time thresholds of 0.3 seconds or below. Such tier-based analyses can help a service provider refrain from unnecessarily spending money on excess VMs, since that may not be a worthwhile investment.

Figure 7. A larger number of VMs does not necessarily lead to better performance: comparison of the configuration (2,3,3), containing 8 VMs, with the configuration (3,2,2), containing 7 VMs.

5. CONCLUSIONS

We have developed a simulation model to analyze the transient behavior of a 3-tier cloud-based web system. Our model predicts not only the mean response time but also the response time percentiles, and it is general enough to accommodate non-Markovian inter-arrival and service-time distributions. We have demonstrated how our model can serve as part of a decision support system for the VM planning process. Given all the VM plans satisfying the SLO requirements, we acknowledge that it is not a straightforward task to figure out the optimal plan. We recommend that research be undertaken to investigate whether our model can be used jointly with an optimization engine to select the best VM plan.

6. ACKNOWLEDGMENTS

We would like to thank NSERC Canada for their financial support through the NSERC Discovery Grant of Olivia Das.

7. REFERENCES

[1] Abate, J., Choudhury, G.L. and Whitt, W. 1996. Exponential approximations for tail probabilities in queues II: sojourn time and workload. Operations Research. 44, 5 (1996), 758–763.

[2] Angius, A., Horváth, A. and Wolf, V. 2013. Approximate transient analysis of queuing networks by quasi product forms. Analytical and Stochastic Modeling Techniques and Applications. Springer. 22–36.

[3] Ardagna, D., Casale, G., Ciavotta, M., Pérez, J.F. and Wang, W. 2014. Quality-of-service in cloud computing: modeling techniques and their applications. Journal of Internet Services and Applications. 5, 1 (2014), 1–17.

[4] Au-Yeung, S.W., Dingle, N.J. and Knottenbelt, W.J. 2004. Efficient approximation of response time densities and quantiles in stochastic models. ACM SIGSOFT Software Engineering Notes (2004), 151–155.

[5] Broadwell, P.M. 2004. Response time as a performability metric for online services. Report No. UCB//CSD-04-1324. Computer Science Division (EECS), University of California, Berkeley.

[6] Grottke, M., Apte, V., Trivedi, K.S. and Woolet, S. 2011. Response time distributions in networks of queues. Queueing Networks. Springer. 587–641.

[7] Grozev, N. and Buyya, R. 2014. Multi-cloud provisioning and load distribution for three-tier applications. ACM Transactions on Autonomous and Adaptive Systems (TAAS). 9, 3 (2014), 13.

[8] Gullhav, A.N., Nygreen, B. and Heegaard, P.E. 2013. Approximating the response time distribution of fault-tolerant multi-tier cloud services. Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing (2013), 287–291.

[9] Harchol-Balter, M. 2013. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press.

[10] Van Houdt, B. and Blondia, C. 2005. Approximated transient queue length and waiting time distributions via steady state analysis. Stochastic Models. 21, 2-3 (2005), 725–744.

[11] Van Velthoven, J., Van Houdt, B. and Blondia, C. 2005. Response time distribution in a D-MAP/PH/1 queue with general customer impatience. Stochastic Models. 21, 2-3 (2005), 745–765.