=Paper=
{{Paper
|id=Vol-1614/paper_116
|storemode=property
|title=Semi-Markov Availability Models for an Infrastructure as a Service Cloud with Multiple Pools
|pdfUrl=https://ceur-ws.org/Vol-1614/paper_116.pdf
|volume=Vol-1614
|authors=Oleg Ivanchenko,Vyacheslav Kharchenko
|dblpUrl=https://dblp.org/rec/conf/icteri/IvanchenkoK16
}}
==Semi-Markov Availability Models for an Infrastructure as a Service Cloud with Multiple Pools==
Oleg Ivanchenko (1), Vyacheslav Kharchenko (2)

(1) University of Customs and Finance, 2/4 Dzerzhinskogo, 217 km, Dnepropetrovsk, Ukraine, vmsu12@gmail.com

(2) National Aerospace University “Kharkiv Aviation Institute”, 17 Chkalova St., 61070, Kharkiv, Ukraine, v_s_kharchenko@ukr.net

Abstract. Cloud Computing tasks cannot be solved without maintaining a high availability level of the Infrastructure as a Service (IaaS) Cloud. Several large IaaS Cloud providers try to solve this problem by increasing the number of physical machines (PMs) in multiple pools. However, migrations of available PMs from one pool to another, as well as repairs and diagnostics of failed physical machines, make availability modeling of an IaaS Cloud quite complex. In this paper, we show how to build Semi-Markov availability models with discrete states and how to use them to determine the availability level of an IaaS Cloud with three pools.

Keywords. Infrastructure as a Service Cloud, three pools of physical machines, Semi-Markov availability models

Key Terms. Infrastructure, Mathematical Model, Development, Characteristic

===1 Introduction===
Nowadays Cloud Computing is one of the most widely used services in enterprise environments. The availability of cloud infrastructures is therefore of paramount importance for improving quality of service (QoS) and extending the capabilities of cloud users. Despite this, research into cloud infrastructure behavior, including availability and reliability analysis of the respective components, is still regarded as a rather complex direction for different kinds of modeling.

Large Infrastructure as a Service (IaaS) Cloud providers use multiple pools of physical machines (PMs) in order to maintain normal operation of the cloud's components over a long period of time.
However, with a larger number of PMs the number of states of a stochastic model of an IaaS Cloud also increases; the model ought to include a large number of parameters while still being tractable [1]. To perform availability analysis of large-scale clouds, some researchers built interacting sub-models before they started to build a monolithic model of an IaaS Cloud [2]. They built the interacting sub-models on the basis of Markov models and used stochastic reward nets. These sub-models were built using their own software package SHARPE [3]. Earlier, the authors of [4] built a continuous-time Markov chain (CTMC) availability model for an example with two PMs in each pool. Another researcher proposed an availability model that covers the failure of PMs, the repair process, and the employment of cold PMs when running machines fail [5]. In this model the PMs in each pool are also modeled by a three-dimensional CTMC. Although various authors have used stochastic approaches based on Markov models to describe the behavior of the physical machine pools, they could not obtain rigorous analytical expressions.

The focus of this paper is to build Semi-Markov availability models for an IaaS Cloud with three pools and a different number of PMs in each pool.

ICTERI 2016, Kyiv, Ukraine, June 21-24, 2016. Copyright © 2016 by the paper authors.

===2 Statement of the Research Results===
====2.1 Metamodel for Availability Analysis of IaaS Cloud====
Note that the architecture of an IaaS Cloud is not tied to a real cloud implementation [6]. Suppose that a simple cloud infrastructure with a certain number of PMs is used. To reduce power consumption, cooling and infrastructure costs, PMs are grouped into three pools: hot, warm and cold. Assume that the hot pool consists of PMs that are turned on and running; the warm pool contains PMs that are turned on but not ready; the cold pool consists of turned-off PMs.
Moreover, this architecture has a certain number of virtual machines (VMs), which are deployed on PMs. Deploying VMs on PMs reduces power consumption while maintaining sufficiently high performance of the cloud implementation.

Unlike other approaches, the proposed concept for maintaining the availability of a cloud infrastructure is based on two additional systems, namely the Technical State Control System (TSCS) and the Resource Provisioning Decision Engine (RPDE) [7]. Our IaaS Cloud uses a TSCS that works in monitoring and diagnostic modes. These modes are regarded as a form of continuous control of the significant parameters that determine not only PM performability but also the readiness of the cloud infrastructure for effective intended use [8]. The monitoring and diagnostic sub-systems provide the information needed for repair and for migration of PMs from one pool to another. As described in [6], the RPDE tries to find a PM that can accept the job for provisioning.

Figure 1 shows portions of the taxonomy metamodel for availability analysis of an IaaS Cloud. To deal with the complexity of metamodeling, researchers should work within a paradigm of four models: scalability, performance, flexibility (elasticity) and power consumption. Each model supplies the overall metamodel with input parameters: the initial number of PMs in each pool (scalability model), the power consumption of each PM (power consumption model), management metric values and search rates (flexibility model), and failure rates, repair rates, migration rates and the number of repair facilities (reliability model). In other words, the output parameters of these models are the input parameters of the metamodel. The values of the design and temporal parameters of such models can be measured experimentally. The stages of metamodeling are shown in this figure.
Following Fig. 1, we will create analytical models that take into account the states and the stochastic variation of all failure, repair and migration times of PMs.

Fig. 1. Taxonomy metamodel for availability analysis of IaaS Cloud

On this basis, we construct a Semi-Markov model for availability analysis of an IaaS Cloud with three pools. It is therefore proposed to describe the various options for interactions of PMs at the availability-model level.

====2.2 Analytical Availability Models for an IaaS Cloud with Three Pools====
Let us consider two analytical models of an IaaS Cloud. Fig. 2 shows a Semi-Markov (SM) model for availability analysis of the IaaS Cloud with three pools (hot, warm and cold) and three PMs in each pool.

In our modeling we use the following assumptions and limitations.

Hot, warm, and cold pools contain identical PMs [9]. If a hot PM fails, the failed PM is replaced by an available (non-failed) PM from the warm or cold pool, respectively.

We assume that periodic control of technical state (CTS) of hot PMs is performed during a time interval of duration $\tau_c$.

To analyze the availability of the IaaS Cloud we also assume that all times to failure of all PMs are exponentially distributed. Typically, the mean time to failure (MTTF) of warm PMs ($1/\lambda_w$) is higher than the MTTF of hot PMs ($1/\lambda_h$) by a factor of two to four [7]. At the same time, the failure rate of cold PMs is much lower than $\lambda_w$. However, for SM modeling we use only the MTTF of hot PMs, considering the quite high reliability level of warm and cold PMs.

Moreover, in real situations providers do not have enough time to repair failed hot PMs, nor do they have enough repair facilities. We therefore assume that times to repair are not exponentially distributed. Here we prefer the Erlang-k distribution with $k = 2, 3$ [10]. The parameter $1/\mu$ is the mean time to repair (MTTR) of a PM.

Available PMs can migrate from the warm and cold pools to the hot pool.
We also assume that all times to migration (migration delays) of PMs are exponentially distributed. For modeling we use the mean time to migration (MTTM) of PMs from the warm ($1/\gamma_{wh}$) and cold ($1/\gamma_{ch}$) pools to the hot pool. Migration of PMs to the hot pool takes place once the provider has found a non-failed warm or cold PM; the mean times to search (MTTSs) are $1/\delta_w$ and $1/\delta_c$.

We consider that the IaaS Cloud becomes unavailable when the SM model enters state $S_{15}$.

Suppose that this infrastructure is operated during a time interval $t \in [0, T]$ and that at the initial moment $t = 0$ the IaaS Cloud is ready for use (state $S_0$). The transition from state $S_0$ to state $S_1$ occurs after a fixed, nonrandom time $T$, where $T$ is the operation time of the IaaS Cloud between two periodic controls of technical state. State $S_1$ is the CTS state. The periodic CTS includes monitoring and diagnostic operations on hot PMs and is conducted by means of the Technical State Control System. If the third hot PM is available, the SM model returns from state $S_1$ to state $S_0$. Otherwise, when the TSCS detects a failure, the model goes to state $S_2$ with rate $\lambda_h$. In the failure state of the third hot PM, the model searches for a non-failed warm PM (transition from state $S_2$ to state $S_3$) with rate $\delta_w$ or a cold PM (transition from state $S_2$ to state $S_4$) with rate $\delta_c$. When a warm or cold PM is available, the model moves from state $S_3$ to state $S_0$ or from state $S_4$ to state $S_0$, respectively. If the warm and cold pools are empty, the repair facility tries to recover the failed hot PM, that is, the model goes from state $S_2$ to state $S_0$ with repair rate $\mu$. When recovery of the third failed PM is impossible, the model moves from state $S_2$ to state $S_5$ with overall failure rate $3\lambda_h$. The same modeling steps are then repeated for states $S_5$ – $S_9$ (second hot PM) and states $S_{10}$ – $S_{12}$ (first hot PM).
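The repeated structure described above can be sketched as a small directed graph. This is only an illustration under stated assumptions: the helper names (`level_edges`, `build_graph`, `reachable`) are hypothetical, the block for each hot-PM level is assumed to repeat with an index offset of 5, and only the transitions named in the preceding paragraph are encoded (repair transitions that cross levels are not included).

```python
from collections import deque


def level_edges(base):
    """Transitions of one failure level; `base` is its operational state index."""
    def s(i):
        return f"S{base + i}"
    return {
        (s(0), s(1)): "periodic control starts after T",
        (s(1), s(0)): "control ends, no failure found",
        (s(1), s(2)): "TSCS detects a hot PM failure",
        (s(2), s(3)): "non-failed warm PM found",
        (s(2), s(4)): "non-failed cold PM found",
        (s(3), s(0)): "warm PM migrated to hot pool",
        (s(4), s(0)): "cold PM migrated to hot pool",
        (s(2), s(0)): "failed hot PM repaired",
        (s(2), s(5)): "repair not possible, next level",
    }


def build_graph(bases=(0, 5, 10)):
    """Assumption: the same block repeats for the levels starting at S0, S5, S10."""
    edges = {}
    for base in bases:
        edges.update(level_edges(base))
    return edges


def reachable(edges, start):
    """Breadth-first search over the directed transition graph."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in successors.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = build_graph()
states = {state for pair in edges for state in pair}
```

Under these assumptions the sketch yields the sixteen states $S_0$ to $S_{15}$, and `reachable(edges, "S0")` confirms that the unavailable state $S_{15}$ can be reached from the initial state.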
Note that the transitions from state $S_7$ to state $S_{10}$ and from state $S_{12}$ to state $S_{15}$ occur with overall failure rates $2\lambda_h$ and $\lambda_h$, respectively. The model moves from state $S_{12}$ to state $S_{15}$ when the last hot PM fails.

Fig. 2. SM model for the availability analysis of the IaaS Cloud with three PMs in each pool

To solve this task we use the method of transforming SM models into embedded Markov chains [10]. For this type of model the transitions of the process from state $i$ to state $j$ occur in unit time. The transitions of this SM process are therefore interpreted as follows.

Since the CTS is performed after a fixed deterministic period of time $T$, the transition from state $S_0$ to state $S_1$ is given by:

$$Q_{0,1}(t) = \begin{cases} 0, & t < T, \\ 1, & t \ge T. \end{cases}$$

The transition from state $S_1$ to state $S_0$ is then given by:

$$Q_{1,0}(t) = \begin{cases} 0, & t < \tau_c, \\ 1, & t \ge \tau_c. \end{cases}$$

The other similar transitions are obtained as follows:

$$Q_{5,6}(t) = Q_{10,11}(t) = \begin{cases} 0, & t < T, \\ 1, & t \ge T, \end{cases} \qquad Q_{6,5}(t) = Q_{11,10}(t) = \begin{cases} 0, & t < \tau_c, \\ 1, & t \ge \tau_c. \end{cases}$$

At the same time, the probabilities of sudden failures of hot PMs at random times for the transitions from state $S_1$ to state $S_2$, from state $S_6$ to state $S_7$ and from state $S_{11}$ to state $S_{12}$ are given by:

$$Q_{1,2}(t) = Q_{6,7}(t) = Q_{11,12}(t) = 1 - e^{-\lambda_h t}.$$

The transitions from state $S_2$ to state $S_0$, from state $S_7$ to state $S_5$ and from state $S_{12}$ to state $S_{10}$, as well as from state $S_7$ to state $S_0$, from state $S_{12}$ to state $S_5$ and from state $S_{15}$ to state $S_{10}$, depend on the time to repair of the hot PMs. In these cases the distribution functions of the repair time are given by:

$$Q_{2,0}(t) = Q_{7,5}(t) = Q_{12,10}(t) = 1 - (1 + \mu t)\, e^{-\mu t},$$

$$Q_{7,0}(t) = Q_{12,5}(t) = Q_{15,10}(t) = 1 - \left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-\mu t}.$$

For our SM availability model, we assume that the distribution functions of the search time for non-failed warm and cold PMs, respectively, are given by:

$$Q_{2,3}(t) = Q_{7,8}(t) = Q_{12,13}(t) = 1 - e^{-\delta_w t}, \qquad Q_{2,4}(t) = Q_{7,9}(t) = Q_{12,14}(t) = 1 - e^{-\delta_c t}.$$
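The kernels above are ordinary cumulative distribution functions, so they can be evaluated numerically. The following Python sketch is illustrative only: the function names and the numeric rates are assumptions, with `lam_h` standing for the hot-PM failure rate, `delta_w` and `delta_c` for the warm and cold search rates, and `mu` for the repair rate. Assuming that the transitions leaving state $S_2$ compete independently, it also computes the probability that the warm-PM search fires first by integrating its density against the survival functions of the competing kernels (Erlang-2 repair, cold search, and the next failure with rate $3\lambda_h$), and compares the result with the closed form $\delta_w (a + \mu)/a^2$, where $a = 3\lambda_h + \delta_w + \delta_c + \mu$.

```python
import math


def step_cdf(delay, t):
    """Deterministic delay: the transition fires exactly at t = delay."""
    return 1.0 if t >= delay else 0.0


def exp_cdf(rate, t):
    """Exponential kernel 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


def erlang_cdf(k, rate, t):
    """Erlang-k kernel 1 - exp(-rate*t) * sum_{n<k} (rate*t)^n / n!."""
    if t <= 0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * sum(x ** n / math.factorial(n) for n in range(k))


def p_warm_search_first(lam_h, delta_w, delta_c, mu, steps=100_000):
    """Trapezoidal estimate of P(S2 -> S3) under independent competing delays."""
    a = 3 * lam_h + delta_w + delta_c + mu
    upper = 60.0 / a          # the integrand is negligible beyond this point
    h = upper / steps

    def integrand(t):
        density = delta_w * math.exp(-delta_w * t)      # warm search completes at t
        survive = ((1 - erlang_cdf(2, mu, t))           # repair not yet finished
                   * (1 - exp_cdf(delta_c, t))          # cold search not yet finished
                   * (1 - exp_cdf(3 * lam_h, t)))       # no further failure yet
        return density * survive

    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return total * h


def p_warm_search_first_closed(lam_h, delta_w, delta_c, mu):
    """Closed form of the same integral."""
    a = 3 * lam_h + delta_w + delta_c + mu
    return delta_w * (a + mu) / a ** 2


# Hypothetical rates in 1/h, chosen only for illustration.
rates = dict(lam_h=0.001, delta_w=2.0, delta_c=1.0, mu=0.5)
numeric = p_warm_search_first(**rates)
closed = p_warm_search_first_closed(**rates)
```

The numerical and closed-form values should agree to within the integration error, which gives a quick check on any hand-derived transition probability of this kind.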
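The distribution functions of the migration time for warm and cold PMs, respectively, are likewise exponential:

$$Q_{3,0}(t) = Q_{8,5}(t) = Q_{13,10}(t) = 1 - e^{-\gamma_{wh} t}, \qquad Q_{4,0}(t) = Q_{9,5}(t) = Q_{14,10}(t) = 1 - e^{-\gamma_{ch} t}.$$

The steady-state availability [10] of the cloud can then be computed as

$$A = \pi_0 + \pi_5 + \pi_{10}, \qquad (1)$$

where $\pi_0$, $\pi_5$, $\pi_{10}$ are the steady-state probabilities of states $S_0$, $S_5$, $S_{10}$. On the other hand, the steady-state availability $A$ in (1) is given by [11]:

$$A = \lim_{t \to \infty} A(t),$$

where $A(t)$ is the instantaneous availability of the cloud infrastructure.

In the general case the steady-state probabilities of the SM availability model are given by:

$$\pi_0 = \frac{t_0}{U}, \qquad \pi_5 = \frac{\alpha\, t_5}{U}, \qquad \pi_{10} = \frac{\alpha \beta\, t_{10}}{U},$$

$$U = t_0 + t_1 + p_{1,2}\left(t_2 + p_{2,3} t_3 + p_{2,4} t_4\right) + \alpha \left[t_5 + t_6 + p_{6,7}\left(t_7 + p_{7,8} t_8 + p_{7,9} t_9\right)\right] + \alpha \beta \left[t_{10} + t_{11} + p_{11,12}\left(t_{12} + p_{12,13} t_{13} + p_{12,14} t_{14} + p_{12,15} t_{15}\right)\right],$$

where

$$\beta = \frac{p_{6,7}\, p_{7,10}}{1 - p_{11,10} - p_{11,12}\, \varphi}, \qquad \alpha = \frac{p_{1,2}\, p_{2,5}}{1 - p_{6,5} - \beta\, p_{11,12}\, p_{12,5} - p_{6,7}\, \theta},$$

$$\varphi = p_{12,10} + p_{12,13} + p_{12,14} + p_{12,15}, \qquad \theta = p_{7,5} + p_{7,8} + p_{7,9},$$

$$p_{1,2} = p_{6,7} = p_{11,12} = 1 - e^{-\lambda_h \tau_c}, \qquad p_{6,5} = p_{11,10} = e^{-\lambda_h \tau_c},$$

$$p_{2,3} = \delta_w \int_0^\infty (1 + \mu t)\, e^{-(3\lambda_h + \delta_w + \delta_c + \mu) t}\, dt, \qquad p_{2,4} = \delta_c \int_0^\infty (1 + \mu t)\, e^{-(3\lambda_h + \delta_w + \delta_c + \mu) t}\, dt,$$

$$p_{2,5} = \lambda_h \int_0^\infty (1 + \mu t)\, e^{-(3\lambda_h + \delta_w + \delta_c + \mu) t}\, dt,$$

$$p_{7,5} = \int_0^\infty \mu^2 t \left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt,$$

$$p_{7,8} = \delta_w \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt,$$

$$p_{7,9} = \delta_c \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt,$$

$$p_{7,10} = \lambda_h \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt,$$

$$p_{12,10} = \int_0^\infty \mu^2 t \left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt,$$

$$p_{12,13} = \delta_w \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt,$$

$$p_{12,14} = \delta_c \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt,$$

$$p_{12,15} = \lambda_h \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt,$$

$$p_{12,5} = 1 - p_{12,10} - p_{12,13} - p_{12,14} - p_{12,15},$$

and the mean sojourn times are

$$t_0 = t_5 = t_{10} = T, \qquad t_1 = t_6 = t_{11} = \frac{1}{\lambda_h}\left(1 - e^{-\lambda_h \tau_c}\right), \qquad t_2 = \frac{3\lambda_h + \delta_w + \delta_c + 2\mu}{(3\lambda_h + \delta_w + \delta_c + \mu)^2},$$

$$t_3 = t_8 = t_{13} = \frac{1}{\gamma_{wh}}, \qquad t_4 = t_9 = t_{14} = \frac{1}{\gamma_{ch}}, \qquad t_{15} = \frac{3}{\mu},$$

$$t_7 = \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(2\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt, \qquad t_{12} = \int_0^\infty (1 + \mu t)\left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-(\lambda_h + \delta_w + \delta_c + 2\mu) t}\, dt.$$

Plots of the steady-state availability $A$ as a function of the failure rate $\lambda_h$ of hot PMs and the operation time $T$ (repair rates are constant) are shown in Fig. 3 and Fig. 4. The steady-state availability $A$ increases considerably as the repair rate $\mu$ increases and the failure rate $\lambda_h$ of hot PMs decreases, as depicted in Fig. 3 and Fig. 4.

Fig. 3. Dependence of steady-state availability $A(\lambda_h, T)$ for $T \le 100$ h, $\mu = 0.5$ 1/h

Fig. 4.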
Dependence of steady-state availability $A(\lambda_h, T)$ for $T \le 100$ h, $\mu = 0.75$ 1/h

Let us continue by creating a more scalable stochastic model of an IaaS Cloud. With a larger number of PMs in a data center, the overall Cloud service availability increases, leading to a lower cost of service downtime [7]. Therefore, within a unified methodological approach, we will create an improved SM availability model of an infrastructure with a larger number of PMs.

Additional research has shown that IaaS Cloud providers wish to increase the number of PMs in order to minimize downtime cost and damage to business reputation [4], [6], [7]. Inspired by the use of stochastic approaches for determining the optimal PM capacity configuration of an IaaS Cloud [6], we propose the following SM availability model.

Assume that our infrastructure contains three similar pools with ten PMs in each pool. The SM model for availability analysis of this IaaS Cloud is shown in Fig. 5. Also suppose that all times to failure of PMs are exponentially distributed and that the Erlang-k distribution with $k \ge 2$ is the general distribution for all times to repair. Although both models are SM models, some features of their implementation have to be taken into consideration.

Fig. 5. SM model for the availability analysis of the IaaS Cloud with ten PMs in each pool

Unlike the first SM model, the second SM availability model of the IaaS Cloud has a modeling kernel of five states. The states $S_0$, $S_1$, $S_5$ of the second model (Fig. 5) are the same as in the first model (Fig. 2). The difference between the kernels is that states $S_4$, $S_9$ of the first model are states of search for a cold PM, whilst in the second model they are states of failure of warm PMs.

For the second model the following assumptions apply.

The model contains hot, warm, and cold pools. Every pool consists of ten identical PMs [9].
If a hot PM fails, the failed PM is likewise replaced by an available (non-failed) PM from the warm or cold pool. Upon failure of a warm PM, the failed PM is replaced by an available (non-failed) PM from the cold pool.

We also assume that periodic control of technical state (CTS) of hot PMs is performed during a time interval of duration $\tau_c$.

To analyze the performance and availability of the IaaS Cloud we also assume that all times to failure of all hot and warm PMs are exponentially distributed.

We also consider that times to repair are not exponentially distributed. In this case we use the Erlang-k distribution with $k = 2, 3$ [10]. The parameter $1/\mu$ is the mean time to repair (MTTR) of a PM.

The cloud infrastructure becomes unavailable when the SM model enters state $S_{50}$.

The transitions of the modeling kernel for the second SM model can therefore be written as follows:

$$Q_{0,1}(t) = Q_{5,6}(t) = Q_{10,11}(t) = Q_{15,16}(t) = Q_{20,21}(t) = Q_{25,26}(t) = Q_{30,31}(t) = Q_{35,36}(t) = Q_{40,41}(t) = Q_{45,46}(t) = \begin{cases} 0, & t < T, \\ 1, & t \ge T, \end{cases}$$

$$Q_{1,0}(t) = Q_{6,5}(t) = Q_{11,10}(t) = Q_{16,15}(t) = Q_{21,20}(t) = Q_{26,25}(t) = Q_{31,30}(t) = Q_{36,35}(t) = Q_{41,40}(t) = Q_{46,45}(t) = \begin{cases} 0, & t < \tau_c, \\ 1, & t \ge \tau_c. \end{cases}$$

For the other functions we can write the following:

$$Q_{1,2}(t) = Q_{6,7}(t) = Q_{11,12}(t) = Q_{16,17}(t) = Q_{21,22}(t) = Q_{26,27}(t) = Q_{31,32}(t) = Q_{36,37}(t) = Q_{41,42}(t) = Q_{46,47}(t) = 1 - e^{-\delta_w t},$$

$$Q_{1,3}(t) = Q_{6,8}(t) = Q_{11,13}(t) = Q_{16,18}(t) = Q_{21,23}(t) = Q_{26,28}(t) = Q_{31,33}(t) = Q_{36,38}(t) = Q_{41,43}(t) = Q_{46,48}(t) = 1 - e^{-\delta_c t},$$

$$Q_{1,4}(t) = Q_{6,9}(t) = Q_{11,14}(t) = Q_{16,19}(t) = Q_{21,24}(t) = Q_{26,29}(t) = Q_{31,34}(t) = Q_{36,39}(t) = Q_{41,44}(t) = Q_{46,49}(t) = 1 - e^{-\lambda_w t},$$

$$Q_{2,0}(t) = Q_{7,5}(t) = Q_{12,10}(t) = Q_{17,15}(t) = Q_{22,20}(t) = Q_{27,25}(t) = Q_{32,30}(t) = Q_{37,35}(t) = Q_{42,40}(t) = Q_{47,45}(t) = 1 - e^{-\gamma_{wh} t},$$

$$Q_{3,0}(t) = Q_{8,5}(t) = Q_{13,10}(t) = Q_{18,15}(t) = Q_{23,20}(t) = Q_{28,25}(t) = Q_{33,30}(t) = Q_{38,35}(t) = Q_{43,40}(t) = Q_{48,45}(t) = 1 - e^{-\gamma_{ch} t},$$

$$Q_{4,3}(t) = Q_{9,8}(t) = Q_{14,13}(t) = Q_{19,18}(t) = Q_{24,23}(t) = Q_{29,28}(t) = Q_{34,33}(t) = Q_{39,38}(t) = Q_{44,43}(t) = Q_{49,48}(t) = 1 - e^{-\delta_c t},$$

$$Q_{4,0}(t) = Q_{9,5}(t) = Q_{14,10}(t) = Q_{19,15}(t) = Q_{24,20}(t) = Q_{29,25}(t) = Q_{34,30}(t) = Q_{39,35}(t) = Q_{44,40}(t) = Q_{49,45}(t) = 1 - (1 + \mu t)\, e^{-\mu t},$$

$$Q_{5,0}(t) = Q_{10,5}(t) = Q_{15,10}(t) = Q_{20,15}(t) = Q_{25,20}(t) = Q_{30,25}(t) = Q_{35,30}(t) = Q_{40,35}(t) = Q_{45,40}(t) = Q_{50,45}(t) = 1 - \left(1 + \mu t + \frac{\mu^2 t^2}{2}\right) e^{-\mu t},$$

$$Q_{1,5}(t) = 1 - e^{-\lambda_1 t}, \qquad (2)$$

$$Q_{6,10}(t) = 1 - e^{-\lambda_2 t}, \qquad (3)$$

$$Q_{11,15}(t) = 1 - e^{-\lambda_3 t}, \qquad (4)$$

$$Q_{16,20}(t) = 1 - e^{-\lambda_4 t}, \qquad (5)$$

$$Q_{21,25}(t) = 1 - e^{-\lambda_5 t}, \qquad (6)$$

$$Q_{26,30}(t) = 1 - e^{-\lambda_6 t}, \qquad (7)$$

$$Q_{31,35}(t) = 1 - e^{-\lambda_7 t}, \qquad (8)$$

$$Q_{36,40}(t) = 1 - e^{-\lambda_8 t}, \qquad (9)$$

$$Q_{41,45}(t) = 1 - e^{-\lambda_9 t}, \qquad (10)$$

$$Q_{46,50}(t) = 1 - e^{-\lambda_{10} t}. \qquad (11)$$

We define the failure rates for $j = 1, 2, \dots, n_h$ ($n_h = 10$) PM nodes [9], [10]:

$$\lambda_j = (n_h - i)\, \lambda_0, \qquad j = i + 1, \quad i = 0, 1, \dots, k \ \ (\text{for } k = n_h - 1), \qquad (12)$$

where $\lambda_0$ is the basic failure rate value for all PMs. Substituting expression (12) for the $\lambda_j$ values in equations (2), (3), …, (11) completes the description of the second model.

As can be seen in Fig. 3 and Fig. 4, with three PMs in each pool the IaaS Cloud has a quite high availability level. Modeling results for the second SM availability model will be obtained in future work. A common feature of both SM models is the use of identical modeling kernels.

===3 Conclusions===
The proposed stochastic approach based on SM models makes it possible to perform availability analysis of Cloud Infrastructures using different modeling kernels. The contributions of this paper are the following. The proposed SM availability models can be used for a deep availability and reliability analysis of an IaaS Cloud, for example when this infrastructure is one of the most important components of a Critical Infrastructure Management System, in particular during accidents, disasters or other negative events such as sudden or hidden failures. An additional advantage of these SM models is that researchers can use the rigorous analytical expressions from this paper to determine availability and reliability values for the IaaS Cloud. Moreover, this stochastic approach can be used to choose optimal architectures among the many possible Cloud Infrastructures. Several optimization problems, including capacity planning and management of the resources of Cloud Infrastructures, can be solved using the stochastic approach and SM models described in this paper.

===References===
1. Khazaei, H., Misic, J., Misic, V.B., Mohammadi, N.B.: Availability analysis of cloud computing centers. In: Globecom 2012 – Communications Software, Services and Multimedia Symposium (GC12 CSSM) (2012)

2.
Longo, F., Ghosh, R., Naik, V.K., Trivedi, K.S.: A scalable availability model for infrastructure-as-a-service cloud. In: Proc. Int. Conf. on Dependable Systems and Networks, pp. 335–346 (2011)

3. Trivedi, K.S., Sahner, R.: SHARPE at the age of twenty two. ACM Sigmetrics Performance Evaluation Review, 36(4), 52–57 (2009)

4. Ghosh, R., Trivedi, K.S., Naik, V.K., Kim, D.S.: End-to-end performability analysis for infrastructure-as-a-service cloud: An Interacting Stochastic Models Approach. In: IEEE PRDC, Tokyo (2010)

5. Khazaei, H.: Performance modeling of cloud computing centers. Diss. The University of Manitoba (2013)

6. Ghosh, R., Longo, F., Xia, R., Naik, V., Trivedi, K.: Stochastic model driven capacity planning for an infrastructure-as-a-service cloud. IEEE Transactions on Services Computing, 7(4), 667–680 (2014)

7. Ghosh, R.: Scalable stochastic models for cloud services. Diss. Duke University (2012)

8. Ivanchenko, O., Kharchenko, V., Skatkov, A.: Management of critical infrastructures Based on Technical Megastate. Int. J. “Information and Security”, 28(1), 37–51 (2012)

9. Ghosh, R., Longo, F., Frattini, F., Russo, S., Trivedi, K.S.: Scalable analytics for IaaS cloud availability. IEEE Transactions on Cloud Computing, 2(1), 57–70 (2014)

10. Ivanchenko, O., Lovyagin, V., Maschenko, E., Skatkov, A., Shevchenko, V.: Distributed critical systems and infrastructures. National Aerospace University named after N. Zhukovsky “KhAI”, Kharkiv, Ukraine (2013)

11. Abbadi, I.M.: Toward Trustworthy Clouds’ Internet Scale Critical Infrastructure. In: Bao, F., Weng, J. (eds.) Information Security Practice and Experience. LNCS 6672, pp. 71–82. Springer Verlag, Heidelberg (2011)