=Paper=
{{Paper
|id=Vol-1940/paper08
|storemode=property
|title=Stochastic Modelling of Large-Scale Distributed Computer Systems Functioning with Group Restorations
|pdfUrl=https://ceur-ws.org/Vol-1940/paper08.pdf
|volume=Vol-1940
|authors=Valery Pavsky,Kirill Pavsky
}}
==Stochastic Modelling of Large-Scale Distributed Computer Systems Functioning with Group Restorations==
Stochastic Modelling of Large-Scale Distributed Computer Systems Functioning with Group Restorations Valery Pavsky 1, and Kirill Pavsky 2 1 Kemerovo Institute of Food Science and Technology (University), Kemerovo, Russia pavva46@mail.ru 2 Rzhanov Institute of Semiconductor Physics Siberian Branch of Russian Academy of Sciences, pr. Lavrentieva, 13, Novosibirsk, 630090, Russia pkv@isp.nsc.ru Abstract. The model of functioning of distributed computer systems with group restora- tions of failed machines is considered. The model is formalized in the form of a system of differential equations, in which are unknown probabilities of the states. The paper proposes solutions for calculating the mathematical expecta- tion of the number of efficient machines and variances, which are the basis for creating indices of potential robustness. The investigation of the functioning of the CS under the assumption of the va- lidity of the exponential law of failure of computers makes it possible, due to a well-developed theory, to obtain profound results, in contrast to the use of other distribution laws. And the obtained analytical solutions can be used for rapid analysis of the functioning of the CS. Keywords: distributed computer systems, mathematical model, group restora- tion, number of working machines, analytical solutions. 1 Introduction The problem of the reliability of high-performance scalable computing devices, in- cluding supercomputers and computer systems [1], increases with the number of ele- mentary machines (EM) [2], the number of which in such systems ranges from sever- al tens to hundreds of thousands [3]. Practice shows that in scalable computing sys- tems (CS) the time between different types of failures can be measured by hours [4,5]. The preservation of the efficiency of the CS in the conditions of failures [6-8], the analysis of functioning with regard to robustness, is an urgent task. This work is devoted to the development of means for analyzing the efficiency of the operation of larger-scale distributed CS [9, 10]. The queuing theory apparatus is used as an analysis tool. The paper proposes formulas for calculating the mathematical expectation and va- riance of the number of working machines, which are the basis for creating indicators of potential robustness in group recovery [2]. The indices of potential robustness of 61 the CS take into account the fact that in the solution of problems all the working EMs are used, the number of which is actually instant. This assumes that parallel programs of complex tasks, when implemented on survivable CS, are capable of using the total performance of all working EM systems. Assuming mathematical idealization, CS can be regarded as a stochastic object. 2 Model of the CS functioning A stochastic model of the distributed CS operation is proposed, in which the EMs are not absolutely reliable [2]. Each of them fails with intensity. Out of order EM gets into the restoration system and is waiting for resoration. At random moments of time, the recovery is exercised by groups in r EM with intensity (Fig.1). We believe that at the initial time the system contains n EMs. As performance indices (evaluation of potential viability), we use numerical characteristics – the mathematical expectation of the number of effective EMs and its variance [2]. Computer System Restoration System r r μ μ μ k . Fig. 1. Model of the functioning of a scalable CS in the group recovery of failed EM. In constructing the model of the functioning of the CS, we use the methods of queuing theory, in which models of this class are formulated according to the tried- and-tested method – a system of differential equations is compiled, and the probabili- ty distribution is considered as unknown functions. The analytical solution of such systems is far from always available, even stationary [2]. Assuming further mathemat- ical idealization of the model, we believe that the number of EMs in the CS is poten- tially infinite, which is permissible because of its scalability. From the formalized system of differential equations of the model, we find an analytical solution, directly for the moments, bypassing the probability distribution. 62 3 Mathematical Model The queuing system (QS) with an infinite number of channels receives a Poisson stream of packets with intensity [11, 12]. Each packet consists of r requirements. At any fixed time t [ 0 , ) , the QS is in one of a number of incompatible states C k - where k is the number of requirements in the QS, including undeserved requirements. The service time of each requirement is subject to an exponential distribution with a parameter . If the system is in the C k state, then one of the k requirements leaves the system with the k intensity. It is required to find the mathematical expectation of the M ( t ) state number, in which the system is located when servicing the re- quirements and the corresponding variance D( t ) , provided that M ( 0 ) n , D( 0 ) 0 . Figure 2 shows a graph-scheme of QS states that allows us to better understand the relationship between the formulation of the model and its formalization by a system of differential equations. μ μ μ μ μ c0 c1 c2 cr cr1 cj c j r c j r 1 λ 2λ rλ (r 1)λ jλ ( j r)λ ( j r 1)λ Fig. 2. Graph-scheme of QS states, for the scalable CS operation model. Pk ( t ) denotes the probability that at the instant t the QS is in the state C k , k = 0,1,… The system of differential equations has the form: P' ( t ) P ( t ) P ( t ), 0 0 1 ' Pk ( t ) ( k )Pk ( t ) ( k 1 )Pk 1( t ), 0 k r ,(*) (1) ' Pk ( t ) ( k )Pk ( t ) Pk r ( t ) ( k 1 )Pk 1( t ), k r . To find the mathematical expectation and variance, we apply the method of generat- ing functions. We transform the system of equations (1) so that in it the middle equa- tion (*), with sliding parameter k, would be obtained from the last equation, at 0