=Paper=
{{Paper
|id=Vol-1614/paper_116
|storemode=property
|title=Semi-Markov Availability Models for an Infrastructure as a Service Cloud with Multiple Pools
|pdfUrl=https://ceur-ws.org/Vol-1614/paper_116.pdf
|volume=Vol-1614
|authors=Oleg Ivanchenko,Vyacheslav Kharchenko
|dblpUrl=https://dblp.org/rec/conf/icteri/IvanchenkoK16
}}
==Semi-Markov Availability Models for an Infrastructure as a Service Cloud with Multiple Pools==
<pdf width="1500px">https://ceur-ws.org/Vol-1614/paper_116.pdf</pdf>
<pre>
    Semi-Markov Availability Models for an Infrastructure
           as a Service Cloud with Multiple Pools

                        Oleg Ivanchenko¹, Vyacheslav Kharchenko²

                          ¹ University of Customs and Finance,
                      2/4 Dzerzhinskogo, 217 km, Dnepropetrovsk, Ukraine
                                     vmsu12@gmail.com

                ² National Aerospace University “Kharkiv Aviation Institute”,
                            17 Chkalova St., 61070, Kharkiv, Ukraine
                                  v_s_kharchenko@ukr.net


       Abstract. Cloud Computing tasks cannot be solved without maintaining a high
       availability level of the Infrastructure as a Service (IaaS) Cloud. Several
       large IaaS Cloud providers try to solve this problem by increasing the number
       of physical machines (PMs) in multiple pools. However, migrations of available
       PMs from one pool to another, as well as repairs and diagnostics of failed
       physical machines, make availability modeling of an IaaS Cloud a rather complex
       task. In this paper, we show how to build Semi-Markov availability models with
       discrete states and how to use them to determine the availability level of an
       IaaS Cloud with three pools.


       Keywords. Infrastructure as a Service Cloud, three pools of physical machines,
       Semi-Markov availability models


       Key Terms. Infrastructure, Mathematical Model, Development, Characteristic


1      Introduction
Nowadays Cloud Computing is one of the most widely used services in an enterprise
environment. Therefore the availability of cloud infrastructures is of paramount
importance for improving quality of service (QoS) and extending the capabilities of
cloud users. Despite this fact, research into cloud infrastructure behavior, including
availability and reliability analysis of the respective components, is still regarded
as a rather complex scientific direction for different kinds of modeling.
   Large Infrastructure as a Service (IaaS) Cloud providers use multiple pools of
physical machines (PMs) in order to maintain normal operation of the cloud's
components over a long period of time. However, as the number of PMs grows, the
number of states of a stochastic model of an IaaS Cloud also increases; the model has
to include a large number of parameters while still being tractable [1]. To perform
availability analysis of large-sized clouds, some well-known researchers built
interacting sub-models before building a monolithic model of an IaaS Cloud [2]. These
interacting sub-models were based on Markov models and stochastic reward nets, and
they were built with the authors' own software package SHARPE [3]. Earlier, the
authors of [4] built a continuous-time Markov chain (CTMC) availability model for an
example with two PMs in each pool. Another researcher proposed an availability model
that covers failures of PMs, the repair process and the employment of cold PMs when
running machines fail [5]. In this model the PMs in each pool are also modeled by a
three-dimensional CTMC. Although various authors have used stochastic approaches
based on Markov models to describe the behavior of physical machine pools, they could
not obtain rigorous analytical expressions.
   The focus of this paper is to build Semi-Markov availability models for an IaaS
Cloud with three pools and a different number of PMs in each pool.

ICTERI 2016, Kyiv, Ukraine, June 21-24, 2016
Copyright © 2016 by the paper authors


2      Statement of the Research Results

2.1    Metamodel for Availability Analysis of IaaS Cloud
Note that the architecture of an IaaS Cloud considered here is not tied to a real
cloud implementation [6]. Suppose that a simple cloud infrastructure with a certain
number of PMs is used. To reduce power consumption, cooling and infrastructure costs,
PMs are grouped into three pools: hot, warm and cold. Assume that the hot pool
consists of PMs that are turned on and running; the warm pool contains PMs that are
turned on but not ready; the cold pool consists of turned-off PMs. Moreover, this
architecture has a certain number of virtual machines (VMs), which are deployed on
PMs. Deploying VMs on PMs makes it possible to reduce power consumption and to
maintain sufficiently high performance of the cloud implementation.
    Unlike other approaches, the proposed concept for maintaining the availability of
a cloud infrastructure is based on two additional systems, namely the Technical State
Control System (TSCS) and the Resource Provisioning Decision Engine (RPDE) [7]. Our
IaaS Cloud uses the TSCS, which works in monitoring and diagnostic modes. These modes
are regarded as an organizational form of constant control of the significant
parameters that determine not only the performability of the PMs but also the
readiness of the cloud infrastructure for effective intended use [8]. Obviously, the
monitoring and diagnostic sub-systems provide the information needed for the repair
of PMs and for their migration from one pool to another. As described in [6], the
RPDE tries to find a PM that can accept a job for provisioning.
    Figure 1 shows portions of the taxonomy metamodel for availability analysis of an
IaaS Cloud. To deal with the complexities of metamodeling, researchers should work in
the paradigm of four models, such as scalability, performance, flexibility
(elasticity) and power consumption. Each model supplies the overall metamodel with
input parameters, namely the initial number of PMs for each pool (scalability model),
the power consumption of each PM (power consumption model), management metrics values
and search rates (flexibility model), and failure rates, repair rates, migration
rates and the number of repair facilities (reliability model). In other words, the
output parameters of these models are the input parameters of the metamodel. At the
same time, the values of the design and temporal parameters of such models can be
measured experimentally. The stages of metamodeling are shown in this figure. In
accordance with Fig. 1, we will create analytical models that take into account the
states and the stochastic nature of all times to failure, repair and migration of
PMs.


            Fig. 1. Taxonomy metamodel for availability analysis of IaaS Cloud

   On this basis, we construct a Semi-Markov model for availability analysis of an
IaaS Cloud with three pools. To this end, we describe the various options for
interactions of PMs at the availability-model level.

2.2    Analytical Availability Models for an IaaS Cloud with Three Pools
Let us consider two analytical models of an IaaS Cloud. Fig. 2 shows a Semi-Markov
(SM) model for availability analysis of an IaaS Cloud with three pools (hot, warm and
cold) and three PMs in each pool.
   In our modeling we use the following assumptions and limitations.

 • The hot, warm and cold pools contain identical PMs [9]. If a hot PM fails, the
   failed PM is replaced by an available (non-failed) PM from the warm or cold pool,
   respectively.
 • We assume that the periodic control of technical state (CTS) of hot PMs is
   performed during a time interval of duration $\tau_c$.
 • To analyze the availability of the IaaS Cloud we also assume that all times to
   failure of all PMs are exponentially distributed. Typically, the mean time to
   failure (MTTF) of warm PMs ($1/\lambda_w$) is higher than the MTTF of hot PMs
   ($1/\lambda_h$) by a factor of two to four [7]. At the same time, the failure rate
   of cold PMs is much lower than $\lambda_w$. However, for SM modeling we use only
   the MTTF of hot PMs, given the rather high reliability level of warm and cold PMs.
 • Moreover, in real situations providers do not have enough time to repair failed
   hot PMs, nor do they have enough repair facilities. Therefore we also assume that
   times to repair are not exponentially distributed. Here we use the Erlang-k
   distribution with $k = 2, 3$ [10]. Parameter $1/\mu$ is the mean time to repair
   (MTTR) of a PM.
 • Available PMs can migrate from the warm and cold pools to the hot pool. We also
   assume that all times to migration (migration delays) of PMs are exponentially
   distributed. For modeling we use the mean time to migration (MTTM) of PMs from the
   warm ($1/\gamma_{wh}$) and cold ($1/\gamma_{ch}$) pools to the hot pool.
 • Migrations of PMs to the hot pool are carried out once providers have found
   non-failed warm or cold PMs, with mean times to search (MTTSs) $1/\delta_w$ and
   $1/\delta_c$.
 • We consider that the IaaS Cloud becomes unavailable when the SM model enters
   state $S_{15}$.

   Suppose that this infrastructure is operated during a time interval $t \in [0, T]$
and at the initial moment $t = 0$ the IaaS Cloud is ready for use (state $S_0$). The
transition from state $S_0$ to state $S_1$ occurs at a fixed nonrandom time $T$
($\tau_c \le T$), where $T$ is the operation time of the IaaS Cloud between two
periodic controls of technical state. State $S_1$ is the CTS state. Note that the
periodic CTS includes monitoring and diagnostic operations on hot PMs and is
conducted by means of the Technical State Control System. If the third hot PM is
available, the SM model returns from state $S_1$ to state $S_0$.
   Otherwise, when the TSCS detects a failure, the model goes to state $S_2$ with
rate $\lambda_h$. In the failure state of the third hot PM, the model searches for a
non-failed warm PM (transition from state $S_2$ to state $S_3$) with rate $\delta_w$
or for a cold PM (transition from state $S_2$ to state $S_4$) with rate $\delta_c$.
When a warm or cold PM is available, the model moves from state $S_3$ to state $S_0$
or from state $S_4$ to state $S_0$, respectively.
   If the warm and cold pools are empty, the repair facility tries to recover the
failed hot PM, that is, the model goes from state $S_2$ to state $S_0$ with repair
rate $\mu$. When recovery of the third failed PM is impossible, the model moves from
state $S_2$ to state $S_5$ with overall failure rate $3\lambda_h$. The same modeling
steps are then repeated in states $S_5$–$S_9$ for the second hot PM and in states
$S_{10}$–$S_{14}$ for the first hot PM. Note that the transitions from state $S_7$ to
state $S_{10}$ and from state $S_{12}$ to state $S_{15}$ occur with overall failure
rates $2\lambda_h$ and $\lambda_h$, respectively. The model moves from state $S_{12}$
to state $S_{15}$ when the last hot PM fails.

 Fig. 2. SM model for the availability analysis of the IaaS Cloud with three PMs in each pool

   To solve this task we use the method of transforming SM models into embedded
Markov chains [10]. For this type of model the transitions of the process from state
$i$ to state $j$ occur in unit time steps. The transitions of this SM process are
therefore interpreted as follows. Since the CTS is performed with a fixed
deterministic period $T$, the transition from state $S_0$ to state $S_1$ is given by:

\[ Q_{0,1}(t) = \begin{cases} 0, & t < T, \\ 1, & t \ge T. \end{cases} \]

  The transition from state $S_1$ to state $S_0$ is then given by:

\[ Q_{1,0}(t) = \begin{cases} 0, & t < \tau_c, \\ 1, & t \ge \tau_c. \end{cases} \]

  The other similar transitions are obtained as follows:

\[ Q_{5,6}(t) = Q_{10,11}(t) = \begin{cases} 0, & t < T, \\ 1, & t \ge T, \end{cases} \]
\[ Q_{6,5}(t) = Q_{11,10}(t) = \begin{cases} 0, & t < \tau_c, \\ 1, & t \ge \tau_c. \end{cases} \]
   At the same time, the probabilities of sudden failures of hot PMs at random times,
for the transitions from state $S_1$ to state $S_2$, from state $S_6$ to state $S_7$
and from state $S_{11}$ to state $S_{12}$, are given by:

\[ Q_{1,2}(t) = Q_{6,7}(t) = Q_{11,12}(t) = 1 - e^{-\lambda_h t}. \]
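
   As an illustration, a control state of fixed length in which a failure may arrive
with an exponential rate can be evaluated as follows. This is a minimal sketch, not
the authors' code: the function name and the numeric values are hypothetical. It
evaluates the failure distribution at the control duration, which agrees with the
expressions for p12 and t1 used later for the steady-state probabilities.

```python
import math

def control_interval_outcome(lam_h, tau_c):
    """Return (p_fail, mean_sojourn) for a control state of fixed length tau_c
    in which a failure arrives with exponential rate lam_h."""
    # Probability that the failure occurs before the control interval ends.
    p_fail = 1.0 - math.exp(-lam_h * tau_c)
    # Mean time spent in the state: integral of exp(-lam_h * t) over [0, tau_c].
    mean_sojourn = p_fail / lam_h
    return p_fail, mean_sojourn

# Illustrative (assumed) values: failure rate in 1/h, control duration in h.
p_fail, t_ctl = control_interval_outcome(lam_h=0.001, tau_c=0.5)
print(p_fail, t_ctl)
```

   The mean sojourn time is always below the control duration, because the state is
left early whenever a failure is detected.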

   The transitions from state $S_2$ to state $S_0$, from state $S_7$ to state $S_5$
and from state $S_{12}$ to state $S_{10}$, as well as the transitions from state
$S_7$ to state $S_0$, from state $S_{12}$ to state $S_5$ and from state $S_{15}$ to
state $S_{10}$, depend on the time to repair of the hot PMs. In these cases, the
distribution functions of the repair time are given by:

\[ Q_{2,0}(t) = Q_{7,5}(t) = Q_{12,10}(t) = 1 - (1 + \mu t)\, e^{-\mu t}, \]
\[ Q_{7,0}(t) = Q_{12,5}(t) = Q_{15,10}(t)
   = 1 - \left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-\mu t}. \]
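
   The two repair-time distributions above are the Erlang-2 and Erlang-3 cases of one
general formula. The sketch below is an illustration under that reading; the function
names are hypothetical. Note that an Erlang-k distribution with stage rate mu has
mean k/mu, which agrees with the value t15 = 3/mu used later for state S15.

```python
import math

def erlang_cdf(k, mu, t):
    """CDF of an Erlang-k distribution with stage rate mu:
    1 - exp(-mu*t) * sum_{n=0}^{k-1} (mu*t)**n / n!"""
    if t <= 0:
        return 0.0
    tail = sum((mu * t) ** n / math.factorial(n) for n in range(k))
    return 1.0 - math.exp(-mu * t) * tail

def erlang_mean(k, mu):
    """Mean of the Erlang-k distribution (k exponential stages of mean 1/mu)."""
    return k / mu

# Illustrative (assumed) values: mu = 0.5 1/h, evaluated at t = 3 h.
print(erlang_cdf(2, 0.5, 3.0), erlang_cdf(3, 0.5, 3.0), erlang_mean(3, 0.5))
```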
   For our SM availability model, we assume that the distribution functions of the
search time for non-failed warm and cold PMs, respectively, are given by:

\[ Q_{2,3}(t) = Q_{7,8}(t) = Q_{12,13}(t) = 1 - e^{-\delta_w t}, \]
\[ Q_{2,4}(t) = Q_{7,9}(t) = Q_{12,14}(t) = 1 - e^{-\delta_c t}. \]

   Similarly, the distribution functions of the migration time for warm and cold PMs,
respectively, are given by:

\[ Q_{3,0}(t) = Q_{8,5}(t) = Q_{13,10}(t) = 1 - e^{-\gamma_{wh} t}, \]
\[ Q_{4,0}(t) = Q_{9,5}(t) = Q_{14,10}(t) = 1 - e^{-\gamma_{ch} t}. \]

   Then the steady-state availability [10] of the cloud can be computed as

\[ A = \pi_0 + \pi_5 + \pi_{10}, \tag{1} \]

where $\pi_0$, $\pi_5$, $\pi_{10}$ are the steady-state probabilities of states
$S_0$, $S_5$, $S_{10}$.
  On the other hand, the steady-state availability $A$ in (1) is given by [11]:

\[ A = \lim_{t \to \infty} A(t), \]

where $A(t)$ is the instantaneous availability of the cloud infrastructure.
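  In the general case, the steady-state probabilities of the SM availability model
are given by:

\[ \pi_0 = \frac{t_0}{U}, \qquad \pi_5 = \alpha\,\frac{t_5}{U}, \qquad
   \pi_{10} = \beta\,\frac{t_{10}}{U}, \]

\[
\begin{aligned}
U ={}& t_0 + t_1 + p_{1,2}\,(t_2 + p_{2,3} t_3 + p_{2,4} t_4)
       + \alpha\,\bigl[t_5 + t_6 + p_{6,7}\,(t_7 + p_{7,8} t_8 + p_{7,9} t_9)\bigr] \\
     & + \beta\,\bigl[t_{10} + t_{11} + p_{11,12}\,(t_{12} + p_{12,13} t_{13}
       + p_{12,14} t_{14} + p_{12,15} t_{15})\bigr],
\end{aligned}
\]

where

\[ \kappa = \frac{p_{6,7}\, p_{7,10}}
                 {1 - p_{11,10} - p_{11,12}\,(p_{12,10} + p_{12,13} + p_{12,14} + p_{12,15})},
   \qquad \beta = \alpha\,\kappa, \]

\[ \alpha = \frac{p_{1,2}\, p_{2,5}}
                 {1 - p_{6,5} - p_{11,12}\, p_{12,5}\,\kappa
                  - p_{6,7}\,(p_{7,5} + p_{7,8} + p_{7,9})}, \]

\[ p_{1,2} = p_{6,7} = p_{11,12} = 1 - e^{-\lambda_h \tau_c}, \]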
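\[
\begin{aligned}
p_{2,3} &= \delta_w \int_0^\infty (1 + \mu t)\, e^{-(3\lambda_h + \delta_w + \delta_c + \mu)t}\,dt, \qquad
p_{2,4} = \delta_c \int_0^\infty (1 + \mu t)\, e^{-(3\lambda_h + \delta_w + \delta_c + \mu)t}\,dt, \\
p_{2,5} &= \lambda_h \int_0^\infty (1 + \mu t)\, e^{-(3\lambda_h + \delta_w + \delta_c + \mu)t}\,dt, \\
p_{7,5} &= \frac{1}{2} \int_0^\infty \mu^2 t\,(\mu^2 t^2 + 2\mu t + 2)\,
           e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt, \\
p_{7,8} &= \delta_w \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right)
           e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt, \\
p_{7,9} &= \delta_c \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right)
           e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt, \\
p_{7,10} &= \lambda_h \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right)
           e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt, \qquad
p_{6,5} = p_{11,10} = e^{-\lambda_h \tau_c}, \\
p_{12,10} &= \frac{1}{2} \int_0^\infty \mu^2 t\,(\mu^2 t^2 + 2\mu t + 2)\,
           e^{-(\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt, \\
p_{12,14} &= \delta_c \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right)
           e^{-(\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt, \\
p_{12,15} &= \lambda_h \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right)
           e^{-(\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt, \\
p_{12,5} &= 1 - p_{12,10} - p_{12,13} - p_{12,14} - p_{12,15},
\end{aligned}
\]

\[
\begin{aligned}
t_0 &= t_5 = t_{10} = T, \qquad
t_2 = \frac{3\lambda_h + \delta_w + \delta_c + 2\mu}{(3\lambda_h + \delta_w + \delta_c + \mu)^2}, \qquad
t_3 = t_8 = t_{13} = \frac{1}{\gamma_{wh}}, \qquad
t_4 = t_9 = t_{14} = \frac{1}{\gamma_{ch}}, \\
t_{15} &= \frac{3}{\mu}, \qquad
t_1 = t_6 = t_{11} = \frac{1}{\lambda_h}\left(1 - e^{-\lambda_h \tau_c}\right), \\
t_7 &= \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right)
       e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt, \\
t_{12} &= \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right)
       e^{-(\lambda_h + \delta_w + \delta_c + 2\mu)t}\,dt.
\end{aligned}
\]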
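
   The steady-state availability of the first model can also be evaluated
numerically. The sketch below is an illustration under stated assumptions, not the
authors' implementation. It treats every failure state as a race between
exponential exits and Erlang repairs, and it assumes that the failure exits use the
overall rates 3, 2 and 1 times the hot failure rate stated in the text, so that each
row of the embedded chain sums to one. All function names and numeric values are
hypothetical. Integrals of a polynomial times an exponential are evaluated term by
term with the identity that the integral of t^n e^(-at) over [0, inf) is n!/a^(n+1).

```python
import math

def poly_mul(a, b):
    """Multiply polynomials stored as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def integral(poly, rate):
    """Integral over [0, inf) of poly(t) * exp(-rate * t), term by term."""
    return sum(c * math.factorial(n) / rate ** (n + 1) for n, c in enumerate(poly))

def erlang_tail(k, mu):
    """Polynomial factor of the Erlang-k survival function (exp(-mu t) goes into the rate)."""
    return [mu ** n / math.factorial(n) for n in range(k)]

def erlang_pdf(k, mu):
    """Polynomial factor of the Erlang-k density mu^k t^(k-1) / (k-1)!."""
    return [0.0] * (k - 1) + [mu ** k / math.factorial(k - 1)]

def failed_state(fail_rate, dw, dc, mu, orders):
    """Exit probabilities and mean sojourn time of a failure state (S2, S7 or S12)."""
    rate = fail_rate + dw + dc + mu * len(orders)
    tail = [1.0]
    for k in orders:
        tail = poly_mul(tail, erlang_tail(k, mu))
    probs = {"fail": fail_rate * integral(tail, rate),
             "warm": dw * integral(tail, rate),
             "cold": dc * integral(tail, rate)}
    for k in orders:                       # each Erlang repair wins if it fires first
        poly = erlang_pdf(k, mu)
        for other in orders:
            if other != k:
                poly = poly_mul(poly, erlang_tail(other, mu))
        probs[k] = integral(poly, rate)
    return probs, integral(tail, rate)

def build_model(lam, dw, dc, g_wh, g_ch, mu, T, tau_c):
    """Embedded-chain matrix P and mean sojourn times t for states S0..S15."""
    P = [[0.0] * 16 for _ in range(16)]
    t = [0.0] * 16
    p_fail = 1.0 - math.exp(-lam * tau_c)
    for base, hot, orders in ((0, 3, (2,)), (5, 2, (2, 3)), (10, 1, (2, 3))):
        ctl, fail, warm, cold = base + 1, base + 2, base + 3, base + 4
        P[base][ctl], t[base] = 1.0, T
        P[ctl][fail], P[ctl][base], t[ctl] = p_fail, 1.0 - p_fail, p_fail / lam
        probs, t[fail] = failed_state(hot * lam, dw, dc, mu, orders)
        P[fail][warm], P[fail][cold] = probs["warm"], probs["cold"]
        P[fail][base + 5] = probs["fail"]   # next hot PM lost (S15 when leaving S12)
        P[fail][base] = probs[2]            # Erlang-2 repair back to the same level
        if 3 in orders:
            P[fail][base - 5] = probs[3]    # Erlang-3 repair back to the previous level
        P[warm][base] = P[cold][base] = 1.0
        t[warm], t[cold] = 1.0 / g_wh, 1.0 / g_ch
    P[15][10], t[15] = 1.0, 3.0 / mu
    return P, t

def stationary(P):
    """Solve nu = nu P with sum(nu) = 1 by Gauss-Jordan elimination."""
    n = len(P)
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] + [0.0] for i in range(n)]
    A[-1] = [1.0] * n + [1.0]               # replace one balance equation by normalisation
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def availability(P, t, up=(0, 5, 10)):
    """Fraction of time spent in the up states S0, S5, S10 (equation (1))."""
    nu = stationary(P)
    total = sum(v * s for v, s in zip(nu, t))
    return sum(nu[i] * t[i] for i in up) / total

# Illustrative (assumed) parameter values; rates in 1/h, times in h.
P, t = build_model(lam=0.001, dw=2.0, dc=1.0, g_wh=4.0, g_ch=2.0, mu=0.5, T=100.0, tau_c=0.5)
print(f"A = {availability(P, t):.6f}")
```

   Solving the balance equations as a linear system avoids any dependence on the
periodic structure of the embedded chain, which alternates between operation and
control states.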

   Plots of the steady-state availability $A$ as a function of the failure rate
$\lambda_h$ of hot PMs and the operation time $T$ (for constant repair rates $\mu$)
are shown in Fig. 3 and Fig. 4. The steady-state availability $A$ increases
considerably as the repair rate $\mu$ increases and the failure rate $\lambda_h$ of
hot PMs decreases, as depicted in Fig. 3 and Fig. 4.

     Fig. 3. Dependence of steady-state availability $A(\lambda_h, T)$ for $T \le 100$ h, $\mu = 0.5$ 1/h


     Fig. 4. Dependence of steady-state availability $A(\lambda_h, T)$ for $T \le 100$ h, $\mu = 0.75$ 1/h

   Let us continue our research by creating a more scalable stochastic model of an
IaaS Cloud, because with a larger number of PMs in a data center the overall Cloud
service availability increases, leading to a lower cost of service downtime [7].
Therefore, within a unified methodological approach, we create an improved SM
availability model of an infrastructure with a larger number of PMs.
   Additional research has shown that IaaS Cloud providers wish to increase the
number of PMs in order to minimize downtime cost and damage to business reputation
[4], [6], [7]. Inspired by the use of stochastic approaches for determining the
optimal PM capacity configuration of an IaaS Cloud [6], we propose the next SM
availability model.
   Assume that our infrastructure contains three similar pools with ten PMs in each
pool. This SM model for availability analysis of the IaaS Cloud is shown in Fig. 5.
Also suppose that all times to failure of PMs are exponentially distributed and that
the Erlang-k distribution with $k = 2$ is the general distribution for all times to
repair. Although both models are SM models, some features of their implementation
have to be taken into account.


   Fig. 5. SM model for the availability analysis of the IaaS Cloud with ten PMs in each pool

   Unlike the first SM model, the second SM availability model of the IaaS Cloud
includes a modeling kernel of five states. States $S_0$, $S_1$, $S_5$ of the second
model (Fig. 5) are the same as in the first model (Fig. 2). The difference between
the kernels of the two models is that states $S_4$, $S_9$ of the first model are
states of search for a cold PM, whilst in the second model these states are states of
failure of the warm PMs. For the second model the following group of assumptions
applies.

 • The model contains hot, warm and cold pools. Every pool consists of ten identical
   PMs [9]. If a hot PM fails, the failed PM is again replaced by an available
   (non-failed) PM from the warm or cold pool.
 • Upon failure of a warm PM, the failed PM is replaced by an available (non-failed)
   PM from the cold pool.
 • We also assume that the periodic control of technical state (CTS) of hot PMs is
   performed during a time interval of duration $\tau_c$.
 • To analyze performance and availability of the IaaS Cloud we also assume that all
   times to failure of all hot and warm PMs are exponentially distributed.
 • We also consider that times to repair are not exponentially distributed. In this
   case we use the Erlang-k distribution with $k = 2, 3$ [10]. Parameter $1/\mu$ is
   the mean time to repair (MTTR) of a PM.
 • The cloud infrastructure becomes unavailable when the SM model enters state
   $S_{50}$.

   Therefore the transitions of the modeling kernel for the second SM model can be
written as follows:

\[
\begin{aligned}
Q_{0,1}(t) &= Q_{5,6}(t) = Q_{10,11}(t) = Q_{15,16}(t) = Q_{20,21}(t) = Q_{25,26}(t)
            = Q_{30,31}(t) = Q_{35,36}(t) \\
           &= Q_{40,41}(t) = Q_{45,46}(t)
            = \begin{cases} 0, & t < T, \\ 1, & t \ge T, \end{cases} \\
Q_{1,0}(t) &= Q_{6,5}(t) = Q_{11,10}(t) = Q_{16,15}(t) = Q_{21,20}(t) = Q_{26,25}(t)
            = Q_{31,30}(t) = Q_{36,35}(t) \\
           &= Q_{41,40}(t) = Q_{46,45}(t)
            = \begin{cases} 0, & t < \tau_c, \\ 1, & t \ge \tau_c. \end{cases}
\end{aligned}
\]

For the other functions we can write the following:

\[
\begin{aligned}
Q_{1,2}(t) &= Q_{6,7}(t) = Q_{11,12}(t) = Q_{16,17}(t) = Q_{21,22}(t) = Q_{26,27}(t)
            = Q_{31,32}(t) = Q_{36,37}(t) \\
           &= Q_{41,42}(t) = Q_{46,47}(t) = 1 - e^{-\delta_w t}, \\
Q_{1,3}(t) &= Q_{6,8}(t) = Q_{11,13}(t) = Q_{16,18}(t) = Q_{21,23}(t) = Q_{26,28}(t)
            = Q_{31,33}(t) = Q_{36,38}(t) \\
           &= Q_{41,43}(t) = Q_{46,48}(t) = 1 - e^{-\delta_c t}, \\
Q_{1,4}(t) &= Q_{6,9}(t) = Q_{11,14}(t) = Q_{16,19}(t) = Q_{21,24}(t) = Q_{26,29}(t)
            = Q_{31,34}(t) = Q_{36,39}(t) \\
           &= Q_{41,44}(t) = Q_{46,49}(t) = 1 - e^{-\lambda_w t}, \\
Q_{2,0}(t) &= Q_{7,5}(t) = Q_{12,10}(t) = Q_{17,15}(t) = Q_{22,20}(t) = Q_{27,25}(t)
            = Q_{32,30}(t) = Q_{37,35}(t) \\
           &= Q_{42,40}(t) = Q_{47,45}(t) = 1 - e^{-\gamma_{wh} t}, \\
Q_{3,0}(t) &= Q_{8,5}(t) = Q_{13,10}(t) = Q_{18,15}(t) = Q_{23,20}(t) = Q_{28,25}(t)
            = Q_{33,30}(t) = Q_{38,35}(t) \\
           &= Q_{43,40}(t) = Q_{48,45}(t) = 1 - e^{-\gamma_{ch} t}, \\
Q_{4,3}(t) &= Q_{9,8}(t) = Q_{14,13}(t) = Q_{19,18}(t) = Q_{24,23}(t) = Q_{29,28}(t)
            = Q_{34,33}(t) = Q_{39,38}(t) \\
           &= Q_{44,43}(t) = Q_{49,48}(t) = 1 - e^{-\delta_c t}, \\
Q_{4,0}(t) &= Q_{9,5}(t) = Q_{14,10}(t) = Q_{19,15}(t) = Q_{24,20}(t) = Q_{29,25}(t)
            = Q_{34,30}(t) = Q_{39,35}(t) \\
           &= Q_{44,40}(t) = Q_{49,45}(t) = 1 - (1 + \mu t)\, e^{-\mu t}, \\
Q_{5,0}(t) &= Q_{10,5}(t) = Q_{15,10}(t) = Q_{20,15}(t) = Q_{25,20}(t) = Q_{30,25}(t)
            = Q_{35,30}(t) = Q_{40,35}(t) \\
           &= Q_{45,40}(t) = Q_{50,45}(t)
            = 1 - \left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-\mu t},
\end{aligned}
\]

\[ Q_{1,5}(t) = 1 - e^{-\lambda_1 t}, \tag{2} \]
\[ Q_{6,10}(t) = 1 - e^{-\lambda_2 t}, \tag{3} \]
\[ Q_{11,15}(t) = 1 - e^{-\lambda_3 t}, \tag{4} \]
\[ Q_{16,20}(t) = 1 - e^{-\lambda_4 t}, \tag{5} \]
\[ Q_{21,25}(t) = 1 - e^{-\lambda_5 t}, \tag{6} \]
\[ Q_{26,30}(t) = 1 - e^{-\lambda_6 t}, \tag{7} \]
\[ Q_{31,35}(t) = 1 - e^{-\lambda_7 t}, \tag{8} \]
\[ Q_{36,40}(t) = 1 - e^{-\lambda_8 t}, \tag{9} \]
\[ Q_{41,45}(t) = 1 - e^{-\lambda_9 t}, \tag{10} \]
\[ Q_{46,50}(t) = 1 - e^{-\lambda_{10} t}. \tag{11} \]

    We define the failure rates for $j = 1, 2, \ldots, n_h$ ($n_h = 10$) PM nodes
[9], [10]:

\[ \lambda_j = (n_h - i)\,\lambda_0, \quad i = 0, 1, \ldots, k \quad (k = n_h - 1), \tag{12} \]

where $\lambda_0$ is the basic failure rate value for all PMs.
  Substituting expression (12) for $\lambda_j$ into the $\lambda$ values in
equations (2), (3), …, (11) completes the description of the second model.
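
   The failure rates of expression (12) can be tabulated with a few lines of code.
This is a minimal sketch: the function name and the base rate value are
hypothetical, and the indexing j = i + 1 is an assumption, since the text does not
state how j and i are related.

```python
def pool_failure_rates(n_h, lam0):
    """Failure rates (n_h - i) * lam0 for i = 0..n_h-1, keyed by j = i + 1 (assumed)."""
    return {i + 1: (n_h - i) * lam0 for i in range(n_h)}

# Illustrative (assumed) base failure rate in 1/h for a pool of ten hot PMs.
rates = pool_failure_rates(n_h=10, lam0=0.001)
for j, lam_j in rates.items():
    print(f"lambda_{j} = {lam_j:.3f}")
```

   Under this indexing the overall failure rate is largest while all ten hot PMs are
running and falls to the base rate when a single hot PM is left.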
   As can be seen in Fig. 3 and Fig. 4, in the case of three PMs in each pool the
IaaS Cloud has a rather high availability level. Modeling results for the second SM
availability model will be obtained in the near future. A common feature of both SM
models is that each is built from identical modeling kernels.


3       Conclusions
Clearly, the proposed stochastic approach based on SM models makes it possible to
perform availability analysis of Cloud Infrastructures using different modeling
kernels. The contributions of this paper are the following.
   The proposed SM availability models can be used for a deep availability and
reliability analysis of an IaaS Cloud, for example when this infrastructure is one of
the most important components of a management system of critical infrastructure, in
particular during accidents, disasters or other negative events such as sudden or
hidden failures. An additional advantage of these SM models is that researchers can
use the rigorous analytical expressions from this paper to determine availability and
reliability values for the IaaS Cloud. Moreover, this stochastic approach can be used
to choose optimal architectures among the many Cloud Infrastructures. Several
optimization problems, including capacity planning and resource management of Cloud
Infrastructures, can be solved using the stochastic approach and the SM models
described in this paper.


References
 1. Khazaei, H., Misic, J., Misic, V.B., Mohammadi, N.B.: Availability analysis of cloud
    computing centers. In: Globecom 2012 – Communications Software, Services and Multimedia
    Symposium (GC12 CSSM) (2012)
 2. Longo, F., Ghosh, R., Naik, V.K., Trivedi, K.S.: A scalable availability model for
    infrastructure-as-a-service cloud. In: Proc. Int Conf on Dependable Systems and Networks,
    pp. 335--346 (2011)
 3. Trivedi, K.S., Sahner, R.: SHARPE at the age of twenty two. ACM Sigmetrics Performance
    Evaluation Review, 36(4), 52--57 (2009)
 4. Ghosh, R., Trivedi, K.S., Naik, V.K., Kim, D.S.: End-to-end performability analysis for
    infrastructure-as-a-service cloud: An Interacting Stochastic Models Approach. In: IEEE
    PRDC, Tokyo (2010)
 5. Khazaei, H.: Performance modeling of cloud computing centers. Diss. The University of
    Manitoba (2013)
 6. Ghosh, R., Longo, F., Xia, R., Naik, V., Trivedi, K.: Stochastic model driven capacity
    planning for an infrastructure-as-a-service cloud. IEEE Transactions on Services
    Computing, 7(4), 667--680 (2014)
 7. Ghosh, R.: Scalable stochastic models for cloud services. Diss. Duke University (2012)
 8. Ivanchenko, O., Kharchenko, V., Skatkov, A.: Management of critical infrastructures
    Based on Technical Megastate. Int. J. “Information and Security”, 28(1), 37--51 (2012)
 9. Ghosh, R., Longo, F., Frattini, F., Russo, S., Trivedi, K.S.: Scalable analytics for IaaS
    cloud availability. IEEE Transactions on Cloud Computing, 2(1), 57--70 (2014)
10. Ivanchenko, O., Lovyagin, V., Maschenko, E., Skatkov, A., Shevchenko, V.: Distributed
    critical systems and infrastructures. National Aerospace University named after N.
    Zhukovsky “KhAI”, Kharkiv, Ukraine (2013)
11. Abbadi, I.M.: Toward Trustworthy Clouds’ Internet Scale Critical Infrastructure. In: Bao,
    F., Weng J. (eds.) Information Security Practice and Experience. LNCS-6672, pp. 71--82.
    Springer Verlag, Heidelberg (2011)

</pre>