=Paper=
{{Paper
|id=Vol-1844/10000634
|storemode=property
|title=Semi-Markov Availability Model for Infrastructure as a Service Cloud
Considering Hidden Failures of Physical Machines
|pdfUrl=https://ceur-ws.org/Vol-1844/10000634.pdf
|volume=Vol-1844
|authors=Oleg Ivanchenko,Vyacheslav Kharchenko,Yurij Ponochovny,Ivan Blindyuk,Oksana Smoktii
|dblpUrl=https://dblp.org/rec/conf/icteri/IvanchenkoKPBS17
}}
==Semi-Markov Availability Model for Infrastructure as a Service Cloud Considering Hidden Failures of Physical Machines==
<pdf width="1500px">https://ceur-ws.org/Vol-1844/10000634.pdf</pdf>
<pre>
    Semi-Markov Availability Model for Infrastructure as a
    Service Cloud Considering Hidden Failures of Physical
                         Machines


    Oleg Ivanchenko¹, Vyacheslav Kharchenko², Yurij Ponochovny³, Ivan Blindyuk¹,
                                     Oksana Smoktii⁴

                     ¹ University of Customs and Finance, Dnipro, Ukraine
                     vmsu12@gmail.com, ivanblindyuk@mail.ru
        ² National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine
                                 v_s_kharchenko@ukr.net
    ³ Poltava National Technical University named after Yurij Kondratyuk, Poltava, Ukraine
                                     pnch1@rambler.ru
                  ⁴ Vasyl' Stus Donetsk National University, Vinniza, Ukraine
                                oksana.smokty@gmail.com


        Abstract. Research in different areas of science and engineering confirms that
        the ability to migrate data resources within the cloud is an effective way to
        solve important tasks. Therefore, Cloud Computing services, including
        Infrastructure as a Service Cloud (IaaS Cloud), should have a high availability
        level. This is a serious problem, because even large cloud providers face sudden
        failures of their IaaS Clouds. It is no surprise that researchers build
        taxonomies of such systems from their underlying components, such as physical
        machines (PMs) and virtual machines (VMs). However, cloud providers should always
        remember that hidden failures of PMs are one of the main causes of damage to
        their cloud assets. In this paper we propose an approach based on a Semi-Markov
        model to determine the availability level of an IaaS Cloud with a Technical
        State Control System.


        Keywords: Infrastructure as a Service Cloud, sudden and hidden failures of
        physical machines, Semi-Markov availability model.


        Key terms: Infrastructure, Mathematical Model, Development, Characteristic.


1       Introduction

Cloud Infrastructure is one of the most widely used models of Cloud Computing.
Therefore, large cloud providers such as Amazon, Microsoft, Google and Rackspace need
approaches and models to quantify their reliability level. In particular, the data
centers of Infrastructure as a Service (IaaS) Cloud providers try to ensure quality of
service (QoS) by using different approaches to determine their availability level. The
importance of ensuring IaaS Cloud availability can hardly be overstated.
    An additional incentive for cloud providers is the migration of the cyber assets of
several important Critical Infrastructures into the cloud. For example, researchers
from Cornell University and Washington State University have worked to develop a
software platform for the smart grid energy infrastructure [1]. They used Amazon EC2
to meet the cloud computing needs of this Critical Energy Infrastructure.
    At the same time, the number of physical machines in IaaS Cloud data centers is
growing rapidly, and researchers are trying to help providers control their
availability level [2], [3]. Most researchers prefer Markov models for solving such
tasks for concrete computer systems, including cloud infrastructures. In fact,
Continuous Time Markov Chains (CTMCs) are the main toolkit and predominate among the
mathematical models of availability and reliability for IaaS Clouds [4]. Stochastic
Petri Nets [5] also feature heavily, along with Reliability Block Diagrams [6], Fault
Trees and Reliability Graphs, among the techniques that researchers of cloud
computing systems prefer to use.
    However, such models must describe various events in an IaaS Cloud, including
sudden and hidden failures, repairs and monitoring services. Issues related to
monitoring the technical and information states of cloud infrastructure components
cannot be ignored. Our research shows that, because the working cycle of an
Infrastructure as a Service Cloud combines deterministic and stochastic durations, its
availability model can be represented as a Semi-Markov model. We also consider
solutions for IaaS Clouds that exploit a description of the Semi-Markov process with
special states. This enables a fairly deep analysis of IaaS Cloud behavior in various
adverse situations, including data center accidents, failures of physical and virtual
machines, or even DoS and DDoS attacks. At the same time, we do not develop a general
theory of approaches and techniques for cloud infrastructure availability; we consider
only a concrete IaaS Cloud with a definite number of PMs.
    In this paper, we consider how to build Semi-Markov availability models for an
IaaS Cloud with three pools of physical machines (PMs) and a Technical State Control
System. Note that the IaaS Cloud provider can substitute one machine for another when
a particular PM fails. PMs are grouped into three pools: hot, warm and cold
pools [7], [8].
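The substitution of a failed PM can be sketched as a simple selection rule. This is a minimal illustration only: the preference for the warm pool over the cold pool (a warm PM is already powered on) and the helper name `pick_replacement` are our assumptions, not a procedure taken from [7], [8].

```python
from typing import Optional


def pick_replacement(warm_free: int, cold_free: int) -> Optional[str]:
    """Choose the pool that supplies a spare PM when a hot PM fails.

    Hypothetical rule: prefer the warm pool, fall back to the cold pool,
    and return None when no spare PM is left in either pool.
    """
    if warm_free > 0:
        return "warm"
    if cold_free > 0:
        return "cold"
    return None  # no spare PM: the failed hot PM must wait for repair


print(pick_replacement(1, 1))  # warm
print(pick_replacement(0, 1))  # cold
print(pick_replacement(0, 0))  # None
```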
    The rest of this paper is organized as follows. Section 2 describes an approach to
the availability analysis of an IaaS Cloud with three pools, considering sudden and
hidden failures of PMs. Section 3 presents the conclusions of our research.

2      Statement of the Research Results

2.1 Approach for Availability Analysis of IaaS Cloud

We do not describe a concrete IaaS Cloud architecture; we only assume that users have
access to the IaaS Cloud and that it consists of three pools of PMs. According to [9],
three pools of PMs make it possible to reduce the infrastructure cost, cooling cost and
power consumption of an IaaS Cloud.
   The approach relies on failure detection by two main components. The first is the
Resource Provisioning Decision Engine (RPDE), which captures the resource provisioning
decision process across the pools [10]. The second is the Technical State Control
System (TSCS), which works in monitoring and diagnostic modes [8]. Driven by the need
to maintain a high availability level, cloud providers can thus gain the ability to
repair and migrate physical resources (PMs) effectively. Figure 1 shows the taxonomy
model for availability analysis of an IaaS Cloud.


            Fig. 1. Taxonomy model for availability analysis of IaaS Cloud

  As the taxonomy (Fig. 1) shows, a provider has to consider the different regimes and
operational time of the TSCS in order to ensure a high availability level of the IaaS
Cloud. Furthermore, researchers must build mathematical availability models of IaaS
Clouds that consider different types of PM failures. In [11], the authors presented
data on the downtime of several typical cloud services. In particular, they calculated
that in 2011 the downtime of Amazon cloud services was about 8.5 days, corresponding
to an availability of about 97.67%. Such data help us analyze what causes the downtime
of cloud services. The analysis has shown that hidden failures of PMs are one of the
most important causes of IaaS Cloud downtime.
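The two figures quoted from [11] are mutually consistent. A quick arithmetic check, assuming a 365-day year (the helper name is our own):

```python
def availability_from_downtime(downtime_days: float, period_days: float = 365.0) -> float:
    """Steady-state availability implied by a total downtime over a period."""
    return 1.0 - downtime_days / period_days


a = availability_from_downtime(8.5)  # about 8.5 days of downtime in 2011
print(f"{a:.2%}")  # 97.67%
```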
    Turning to model building, we focus on a specific type of model: Semi-Markov
models with special states. Consider now the general methods for assessing the
availability level of an IaaS Cloud. To solve such tasks, we propose to extend the
State Space Modeling Taxonomy [12], [13] with a new type of Semi-Markov model. Models
of this type describe a Semi-Markov regenerative process; in fact, they can be
characterized as Semi-Markov birth-death processes.
   So far, we have applied state-space models to the cloud computing system as a
monolith, not to separate IaaS Cloud components. This was a serious problem, because
such a model could not give accurate assessments for the whole system. Using the
Semi-Markov modeling approach, we can now obtain the availability allocation both for
the whole IaaS Cloud and for a concrete PM, which lets a cloud provider determine the
capabilities of its own resources in the working environment. As an illustrative
example, we consider sudden and hidden failures in an IaaS Cloud with three pools of
PMs and a TSCS.

2.2 Analytical and Stochastic Availability Model for an IaaS Cloud Considering
Hidden Failures of PMs

The case study is a model of an IaaS Cloud. Figure 2 shows the finite state graph of
the Semi-Markov model of an IaaS Cloud with three PMs.


      Fig. 2. Semi-Markov availability model of the IaaS Cloud with three PMs
    In Fig. 2, if all three PMs fail, the cloud computing system becomes unavailable:
state S7 is the unavailable state of the IaaS Cloud. The IaaS Cloud is available when
the model is in states S0, S2, S3 or S5. In state S0 all three PMs are operational,
states S2 and S3 have two operational PMs, and state S5 has one operational PM. In
Fig. 2, available states are yellow, unavailable states are red and TSCS states are
green.
    Suppose our IaaS Cloud operates over a time interval $t \in [0, T]$. We use the
following assumptions and limitations in our modeling process.
- Hot, warm and cold PMs are identical [4]. The provider can replace a failed hot PM
  with an available warm or cold PM.
- The model can be solved when the PM replacement process is instantaneous, i.e., we
  consider immediate transitions.
- The IaaS Cloud provider can perform technical state control (CTS) of a hot PM. The
  duration of this interval is $\tau_c$.
- We consider the overall effect of several types of possible PM failures through an
  aggregated mean time to failure (MTTF) [14]. We also assume that all times to failure
  of all PMs are exponentially distributed. Although hot, warm and cold PMs have
  different operating times, to simplify the modeling we suppose that their MTTFs
  $1/\lambda_s$ are equal. It is reasonable to consider the case of three PMs, one
  each of hot, warm and cold ($n_h = n_w = n_c = 1$), with
  $\mathrm{MTTF} = 1/\lambda_s = 1/\lambda_{sh} = 1/\lambda_{sw}$, where $\lambda_{sh}$
  and $\lambda_{sw}$ are the sudden failure rates of the hot and warm PM, respectively.
- The IaaS Cloud provider has limited time to repair failed PMs, so repair times are
  not assumed to be exponentially distributed: instead of the exponential distribution
  we use the Erlang-k distribution with k = 2 [15]. We also assume that the mean time
  to repair (MTTR) of a warm PM, $1/\mu_w$, is twice the MTTR of a hot PM, $1/\mu_h$.
- A specific feature of the IaaS Cloud architecture is the migration of PMs. We treat
  migration operations as operations that restore the working capacity of warm and hot
  PMs, with repair rates $\mu_w$ and $\mu_h$, respectively.
- Hot and warm PMs can also fail due to hidden failures with rates
  $\lambda_h = \lambda_{hh} = \lambda_{hw}$; the corresponding MTTF is
  $1/\lambda_h = 1/\lambda_{hh} = 1/\lambda_{hw}$ for hot and warm PMs. A hot or warm
  PM that is unavailable due to a hidden failure is repaired after the next CTS with
  rate $\mu_h$.
- The IaaS Cloud becomes unavailable when the Semi-Markov model enters state S7.
  According to Fig. 2, the IaaS Cloud processes workload in states S0 (at the initial
moment t = 0), S2, S3 and S5, which form the available sub-set
$S_A^1 = \{S_0, S_2, S_3, S_5\}$. The other states of the IaaS Cloud form: a) the CTS
sub-set $S_{CTS}^1 = \{S_1, S_4, S_6, S_9, S_{11}, S_{13}\}$; b) the unavailable sub-set
$S_{UA}^1 = \{S_7, S_8, S_{10}, S_{12}\}$. To solve this task we employ an analytical and
stochastic method based on embedded Markov chains [8], [15]. The solution is the
steady-state probability vector $\pi = (\pi_0, \pi_2, \pi_3, \pi_5)$.
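The partition of the fourteen states of Fig. 2 into the three sub-sets can be checked mechanically. A small sketch, in which the integer i stands for state $S_i$ and the variable names are our own:

```python
# State sub-sets of the first model (Fig. 2); integer i stands for state S_i.
S_A1 = {0, 2, 3, 5}               # available states
S_CTS1 = {1, 4, 6, 9, 11, 13}     # technical state control (CTS) states
S_UA1 = {7, 8, 10, 12}            # unavailable states

# The sub-sets must be pairwise disjoint ...
assert not (S_A1 & S_CTS1) and not (S_A1 & S_UA1) and not (S_CTS1 & S_UA1)
# ... and together cover S0..S13, i.e. all fourteen states.
all_states = S_A1 | S_CTS1 | S_UA1
assert all_states == set(range(14))
print(len(all_states))  # 14
```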
   Turning to the solution, we interpret the Semi-Markov process as follows. As our
research shows, the CTS is performed after a deterministic period of time T;
therefore the distribution functions of transitions from states $S_i$ to CTS states
$S_j$ are given by:
$$Q_{01}(t) = Q_{24}(t) = Q_{34}(t) = Q_{56}(t) = Q_{89}(t) = Q_{10,11}(t) = Q_{12,13}(t) =
\begin{cases} 0, & t < T, \\ 1, & t \ge T. \end{cases}$$
   At the same time, the transitions for the TSCS from states $S_j$ back to states
$S_i$ can be written as:
$$Q_{10}(t) = Q_{42}(t) = Q_{43}(t) = Q_{65}(t) = Q_{98}(t) = Q_{11,10}(t) = Q_{13,12}(t) =
\begin{cases} 0, & t < \tau_c, \\ 1, & t \ge \tau_c. \end{cases}$$
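Both families of distribution functions above are unit step functions, i.e. the corresponding holding times are deterministic. A minimal sketch; T = 250 h is the value used for Figs. 3 and 4, while the CTS duration `tau_c` and the helper name are hypothetical placeholders:

```python
def deterministic_cdf(t: float, duration: float) -> float:
    """CDF of a holding time that always lasts exactly `duration` hours."""
    return 1.0 if t >= duration else 0.0


T = 250.0    # CTS period, h (value used for Figs. 3 and 4)
tau_c = 0.5  # hypothetical CTS duration, h (illustration only)

# Q01(t): the holding time of the S0 -> S1 transition equals T.
print(deterministic_cdf(249.9, T), deterministic_cdf(250.0, T))  # 0.0 1.0
# Q10(t): the holding time of the S1 -> S0 transition equals tau_c.
print(deterministic_cdf(0.4, tau_c), deterministic_cdf(0.5, tau_c))  # 0.0 1.0
```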
   Turning now to sudden failures of hot, warm and cold PMs, the distribution functions
for transitions from S1 to S2, from S1 to S3, from S4 to S5 and from S6 to S7 are
given by:
$$Q_{12}(t) = Q_{13}(t) = Q_{45}(t) = Q_{67}(t) = 1 - e^{-\lambda_s t}.$$
   Now we move on to hidden failures of hot and warm PMs. The distribution functions
for transitions from S0 to S8, from S2 to S10 and from S3 to S12 can be written as:
$$Q_{08}(t) = Q_{2,10}(t) = Q_{3,12}(t) =
\begin{cases} 1 - e^{-\lambda_h t}, & t < T, \\ 0, & t \ge T. \end{cases}$$
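In state S0 a hidden failure (exponential with rate $\lambda_h$) competes with the start of the CTS after the deterministic time T. Assuming, as the distribution functions above suggest, that S0 is left only through these two transitions, the embedded-chain branching probabilities and the mean sojourn time in S0 follow directly. A sketch; the helper names and the numeric rate are hypothetical:

```python
import math


def s0_branching(lambda_h: float, T: float) -> tuple[float, float]:
    """Embedded-chain probabilities (p01, p08) for leaving S0.

    p08: the hidden failure occurs before the CTS starts, P(X < T) with
    X ~ Exp(lambda_h); p01: the CTS starts first, P(X >= T).
    """
    p08 = 1.0 - math.exp(-lambda_h * T)
    return 1.0 - p08, p08


def s0_mean_sojourn(lambda_h: float, T: float) -> float:
    """Mean time spent in S0: E[min(X, T)] = (1 - exp(-lambda_h*T)) / lambda_h."""
    return (1.0 - math.exp(-lambda_h * T)) / lambda_h


p01, p08 = s0_branching(lambda_h=1e-4, T=250.0)  # hypothetical hidden failure rate, 1/h
print(round(p01, 4), round(p08, 4))  # 0.9753 0.0247
print(round(s0_mean_sojourn(1e-4, 250.0), 2))  # 246.9
```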
   Next, the distribution functions for a hot or warm PM with a hidden failure after
the next CTS are given by:
$$Q_{92}(t) = Q_{93}(t) = Q_{11,5}(t) = Q_{13,5}(t) = 1 - e^{-\mu_h t}.$$
   The distribution functions of transitions from S2 to S0, from S3 to S0 and from S7
to S5 can be written as:
$$Q_{20}(t) = Q_{30}(t) = Q_{75}(t) = 1 - (1 + \mu_h t)\, e^{-\mu_h t}.$$
   Similarly, the distribution functions of transitions from S5 to S2 and from S5 to
S3 are given by:
$$Q_{52}(t) = Q_{53}(t) = 1 - (1 + \mu_w t)\, e^{-\mu_w t}.$$
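The repair-time distributions above are Erlang-2 laws. A sketch of the CDF and its mean; note that with this parameterization (phase rate $\mu$) the mean is $2/\mu$, so halving the phase rate ($\mu_w = \mu_h/2$) doubles the mean repair time, in line with the factor-of-two assumption on MTTRs. The numeric rate and helper names are hypothetical:

```python
import math


def erlang2_cdf(t: float, mu: float) -> float:
    """Erlang-2 CDF with phase rate mu: 1 - (1 + mu*t) * exp(-mu*t)."""
    return 1.0 - (1.0 + mu * t) * math.exp(-mu * t)


def erlang2_mean(mu: float) -> float:
    """Mean of the law above: two exponential phases of mean 1/mu each."""
    return 2.0 / mu


mu_h = 0.5         # hypothetical hot-PM phase rate, 1/h
mu_w = mu_h / 2.0  # warm PMs: mean repair time twice that of hot PMs
print(erlang2_mean(mu_w) / erlang2_mean(mu_h))  # 2.0
print(round(erlang2_cdf(4.0, mu_h), 4))          # 0.594
```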
   Using the normalization condition $\sum_{i=0}^{13} \pi_i = 1$ together with the
steady-state probability vector, we compute the required result as
$$A = \pi_0 + \pi_2 + \pi_3 + \pi_5,$$
where $\pi_0, \pi_2, \pi_3, \pi_5$ are the steady-state probabilities of states S0, S2, S3, S5.
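The embedded-Markov-chain solution rests on the standard semi-Markov relation $\pi_i = v_i m_i / \sum_j v_j m_j$, where $v$ is the stationary vector of the embedded chain and $m_i$ is the mean sojourn time in state $S_i$. A minimal pure-Python sketch of that relation; the function names are our own, and the three-state chain (up, CTS, hidden-failure down) with all its numbers is a hypothetical toy, far smaller than the model of Fig. 2:

```python
def embedded_stationary(P: list[list[float]]) -> list[float]:
    """Solve v = v P with sum(v) = 1 for an irreducible embedded chain."""
    n = len(P)
    # Row i of A encodes sum_j v_j * (P[j][i] - [i == j]) = 0 ...
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # ... and the last (redundant) equation is replaced by normalization.
    A[-1], b[-1] = [1.0] * n, 1.0
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def smp_steady_state(P, mean_sojourn):
    """Semi-Markov steady state: pi_i = v_i * m_i / sum_j v_j * m_j."""
    v = embedded_stationary(P)
    w = [vi * mi for vi, mi in zip(v, mean_sojourn)]
    total = sum(w)
    return [wi / total for wi in w]


# Hypothetical toy chain: 0 = all PMs up, 1 = CTS in progress, 2 = hidden failure.
P = [[0.0, 0.9, 0.1],   # from 0: CTS starts (0.9) or a hidden failure occurs (0.1)
     [1.0, 0.0, 0.0],   # from 1: back to 0 after CTS
     [1.0, 0.0, 0.0]]   # from 2: back to 0 after repair
m = [200.0, 1.0, 10.0]  # mean sojourn times, h
pi = smp_steady_state(P, m)
A = pi[0]               # only state 0 is counted as available in this toy
print(round(A, 4))      # 0.9906
```

In the model of Fig. 2 the same relation would be applied to the fourteen-state embedded chain, with A summed over the available sub-set.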
   In other words, S0, S2, S3 and S5 are the states in which the IaaS Cloud can perform
its required functions. We also consider the other states, in which the IaaS Cloud
cannot perform a certain amount of useful work. The overall results of modeling the
IaaS Cloud with embedded Markov chains are shown in Fig. 3 and Fig. 4.


        Fig. 3. Dependence of steady-state availability A(h, T) for T = 250 h, 8 1/h


        Fig. 4. Dependence of steady-state availability A(h, T) for T = 250 h, 10 1/h
   The modeling results confirm that the steady-state availability A increases as the
repair rate $\mu_h$ of hot PMs increases and as the hidden failure rate $\lambda_h$ of
hot PMs decreases, as a comparison of Fig. 3 with Fig. 4 shows.
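The direction of this effect can be reproduced with the simplest availability model, a two-state up/down component with $A = \mathrm{MTTF}/(\mathrm{MTTF} + \mathrm{MTTR})$. This toy model is not the model of Fig. 2 and does not reproduce the values of Figs. 3 and 4; the rates and the helper name are hypothetical:

```python
def two_state_availability(failure_rate: float, repair_rate: float) -> float:
    """A = MTTF / (MTTF + MTTR) for a single up/down component."""
    mttf, mttr = 1.0 / failure_rate, 1.0 / repair_rate
    return mttf / (mttf + mttr)


repair_rates = [0.1, 0.5, 1.0]   # 1/h, increasing
failure_rates = [1e-3, 1e-4]     # 1/h, decreasing
for lam in failure_rates:
    row = [two_state_availability(lam, mu) for mu in repair_rates]
    # For a fixed failure rate, availability grows with the repair rate.
    assert row == sorted(row)
    print(lam, [round(a, 6) for a in row])
```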
   Next we illustrate how our stochastic approach describes the behavior of an IaaS
Cloud with three pools of PMs. Figure 5 shows the finite state graph of a second type
of Semi-Markov model of an IaaS Cloud with nine PMs. Note that the first model was
solved for only one type of hidden failure. This is a serious limitation, because
different types of hidden failures can be observed in real situations; for example,
hidden failures of hot and warm physical machines can differ.
   In the second model, we consider two different types of hidden failures of PMs. A
previous study has shown that the two branches of hidden failures of PMs can be
interpreted using the following transition probabilities:
1) $p_{0,3}$, $p_{7,11}$, $p_{13,25}$, $p_{8,19}$, $p_{14,29}$, $p_{27,50}$, $p_{22,39}$, $p_{35,64}$,
   $p_{53,71}$, $p_{42,61}$, $p_{57,75}$, $p_{73,85}$, $p_{24,47}$, $p_{46,68}$, $p_{67,82}$,
   $p_{81,89}$, $p_{88,93}$, $p_{92,97}$ (first branch, for PMs of the hot pool);
2) $p_{0,4}$, $p_{8,16}$, $p_{22,43}$, $p_{7,17}$, $p_{14,37}$, $p_{35,58}$, $p_{13,32}$, $p_{27,54}$,
   $p_{53,78}$ (second branch, for PMs of the warm pool).
   According to Fig. 5, the available sub-set of the second IaaS Cloud model is
$S_A^2 = \{S_0, S_7, S_8, S_{13}, S_{14}, S_{22}, S_{24}, S_{27}, S_{35}, S_{42}, S_{46}, S_{53},
S_{57}, S_{67}, S_{73}, S_{81}, S_{88}, S_{92}\}$. All other states of the second Semi-Markov
availability model are unavailable. The solution is the steady-state probability vector
$\pi = (\pi_0, \pi_7, \pi_8, \pi_{13}, \pi_{14}, \pi_{22}, \pi_{24}, \pi_{27}, \pi_{35}, \pi_{42},
\pi_{46}, \pi_{53}, \pi_{57}, \pi_{67}, \pi_{73}, \pi_{81}, \pi_{88}, \pi_{92})$.
   Turning again to hidden failures of hot and warm PMs, the distribution functions for
the first branch of transitions can be written as:
$$Q_{ij}(t) = \begin{cases} 1 - e^{-\lambda_h t}, & t < T, \\ 0, & t \ge T. \end{cases}$$
   The distribution functions for the second branch of transitions are given by:
$$Q_{ij}(t) = \begin{cases} 1 - e^{-\lambda_w t}, & t < T, \\ 0, & t \ge T, \end{cases}$$
where the hidden failure rate of warm PMs, $\lambda_w$, is lower than the hidden failure
rate of hot PMs, $\lambda_h$, by a factor of two to four [7].
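Reading each $p_{ij}$ in the two branches as a transition from $S_i$ to $S_j$, both branches can be checked against the available sub-set $S_A^2$ of the second model: every hidden failure should start in an available state and lead to a state outside $S_A^2$. A sketch (the variable names are our own):

```python
# Available states of the second model (Fig. 5); integer i stands for S_i.
S_A2 = {0, 7, 8, 13, 14, 22, 24, 27, 35, 42, 46, 53, 57, 67, 73, 81, 88, 92}

# Hidden-failure transitions (source, target) of the two branches.
hot_branch = [(0, 3), (7, 11), (13, 25), (8, 19), (14, 29), (27, 50), (22, 39),
              (35, 64), (53, 71), (42, 61), (57, 75), (73, 85), (24, 47),
              (46, 68), (67, 82), (81, 89), (88, 93), (92, 97)]
warm_branch = [(0, 4), (8, 16), (22, 43), (7, 17), (14, 37), (35, 58),
               (13, 32), (27, 54), (53, 78)]

# Every hidden failure starts in an available state and leaves S_A2.
for src, dst in hot_branch + warm_branch:
    assert src in S_A2 and dst not in S_A2, (src, dst)

# Each available state has exactly one hot-pool hidden-failure exit.
assert {src for src, _ in hot_branch} == S_A2
print(len(S_A2), len(hot_branch), len(warm_branch))  # 18 18 9
```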
   The distribution functions of transitions for the repair time of hot and warm PMs
are given by:
$$Q_{ji}(t) = 1 - (1 + \mu_h t)\, e^{-\mu_h t},$$
$$Q_{ji}(t) = 1 - (1 + \mu_w t)\, e^{-\mu_w t},$$
where the repair rate of hot PMs, $\mu_h$, is higher than the repair rate of warm PMs,
$\mu_w$.
  The overall steady-state availability of the IaaS Cloud can be written as:
$$A = \pi_0 + \pi_7 + \pi_8 + \pi_{13} + \pi_{14} + \pi_{22} + \pi_{24} + \pi_{27} + \pi_{35}
+ \pi_{42} + \pi_{46} + \pi_{53} + \pi_{57} + \pi_{67} + \pi_{73} + \pi_{81} + \pi_{88} + \pi_{92}.$$
Fig. 5. Semi-Markov availability model of the IaaS Cloud with ten PMs
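The availability expression of the second model is a sum over its available sub-set. A sketch, in which the helper name and the probability values are hypothetical placeholders rather than results of the model:

```python
# Available states of the second model (Fig. 5), as listed in the text.
S_A2 = (0, 7, 8, 13, 14, 22, 24, 27, 35, 42, 46, 53, 57, 67, 73, 81, 88, 92)


def availability(pi: dict) -> float:
    """Sum the steady-state probabilities of the available states.

    `pi` maps a state index to its steady-state probability; states that
    are absent are treated as having probability zero.
    """
    return sum(pi.get(s, 0.0) for s in S_A2)


# Hypothetical probabilities, for illustration only (state 3 is unavailable).
pi = {0: 0.9, 7: 0.05, 3: 0.05}
print(availability(pi))
```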
   Finally, although the second Semi-Markov model (Fig. 5) is more complex than the
first one (Fig. 2), a similar approach based on embedded Markov chains can be used to
obtain its solution. We believe this is one of the most notable advantages of
Markovian modeling among the well-known mathematical methods and stochastic approaches
on which research in the Cloud Computing area is based.


3      Conclusions

In this paper we performed Semi-Markov modeling of an IaaS Cloud with a TSCS based on
embedded Markov chains. The contributions of this paper are the following.
   The model, through the tool that implements it, is used to study IaaS Cloud
availability measures, which are significant assessments for a provider. Sudden and
hidden failures of PMs are indeed a serious problem for an IaaS Cloud provider. We
illustrated these results with a Semi-Markov availability model with fourteen states.
   Our model can be used for a thorough analysis of different IaaS Cloud architectures,
in particular during accidents, disasters and other adverse events, such as DoS and
DDoS attacks. Several optimization problems for IaaS Clouds regarding resource
availability can be formulated using the analytical and stochastic model described in
this paper.
   In conclusion, we consider increasing the scalability and flexibility of models of
this type to be key to the future development of IaaS Clouds.


References

1.  Gamage, T., et al.: Mission-Critical Cloud Computing for Critical Infrastructures.
    In: Smart Grids: Clouds, Communications, Open Source, and Automation, pp. 1–16.
    CRC Press (2014).
2.  Birke, R., Chen, L.Y., Smirni, E.: Data centers in the cloud: A large scale
    performance study. In: IEEE 5th International Conference on Cloud Computing
    (CLOUD), pp. 336–343 (2012).
3.  Patel, C., Shah, A.: Cost model for planning, development and operation of a data
    center. Tech. Rep., HP Laboratories Palo Alto (2005).
4.  Ghosh, R., Longo, F., Frattini, F., Russo, S., Trivedi, K.: Scalable analytics for
    IaaS cloud availability. IEEE Transactions on Cloud Computing, vol. 2, no. 1,
    pp. 57–70 (2014).
5.  Silva, B.: A framework for availability, performance and survivability evaluation
    of disaster tolerant cloud computing systems. Dissertation, Federal University of
    Pernambuco (2016).
6.  Matos, R., Araujo, J., Oliveira, D., Maciel, P., Trivedi, K.S.: Sensitivity analysis
    of a hierarchical model of mobile cloud computing. Simulation Modelling Practice
    and Theory, vol. 50, pp. 151–164 (2015).
7.  Ghosh, R., Longo, F., Xia, R., Naik, V., Trivedi, K.S.: Stochastic model driven
    capacity planning for an infrastructure-as-a-service cloud. IEEE Transactions on
    Services Computing, vol. 7, no. 4, pp. 667–680 (2014).
8.  Ivanchenko, O., Kharchenko, V.: Semi-Markov availability models for an
    Infrastructure as a Service Cloud with multiple pools. In: Proc. International
    Conference on ICT in Education, Research, and Industrial Applications,
    pp. 349–360 (2016).
9.  Longo, F., Ghosh, R., Naik, V.K., Trivedi, K.S.: A scalable availability model for
    Infrastructure-as-a-Service Cloud. In: Proc. 41st IEEE/IFIP International
    Conference on Dependable Systems and Networks, pp. 335–346 (2011).
10. Ghosh, R.: Scalable stochastic models for cloud services. Dissertation, Duke
    University (2012).
11. Li, Z., Liang, M., O'Brien, L., Zhang, H.: The cloud's cloudy moment: A systematic
    survey of public cloud service outage. arXiv preprint arXiv:1312.6485 (2013).
12. Trivedi, K.S., Sahner, R.: SHARPE at the age of twenty two. ACM SIGMETRICS
    Performance Evaluation Review, vol. 36, no. 4, pp. 52–57 (2009).
13. Cai, B.L., Zhang, R.Q., Zhou, X.B., Zhao, L.P., Li, K.Q.: Experience Availability:
    Tail-Latency Oriented Availability in Software-Defined Cloud Computing. Journal of
    Computer Science and Technology, vol. 32, no. 2, pp. 250–257 (2017).
14. Lanus, M., Yin, L., Trivedi, K.S.: Hierarchical Composition and Aggregation of
    State-Based Availability and Performability Models. IEEE Transactions on
    Reliability, vol. 52, no. 1, pp. 44–52 (2003).
15. Ivanchenko, O., Lovyagin, V., Maschenko, E., Skatkov, A., Shevchenko, V.:
    Distributed critical systems and infrastructures. National Aerospace University
    named after N. Zhukovsky “KhAI”, Kharkiv (2013).

</pre>