=Paper= {{Paper |id=Vol-2522/paper3 |storemode=property |title=Timeliness of the Reserved Maintenance by Duplicated Computers of Heterogeneous Delay-Critical Stream |pdfUrl=https://ceur-ws.org/Vol-2522/paper3.pdf |volume=Vol-2522 |authors=Vladimir A. Bogatyrev,Stanislav V. Bogatyrev,Aleksey N. Derkach }} ==Timeliness of the Reserved Maintenance by Duplicated Computers of Heterogeneous Delay-Critical Stream== https://ceur-ws.org/Vol-2522/paper3.pdf
                                                  26


     Timeliness of the Reserved Maintenance by Duplicated
      Computers of Heterogeneous Delay-Critical Stream*

    Vladimir A. Bogatyrev1[0000-0003-0213-0223], Stanislav V. Bogatyrev2[0000-0003-0836-8515] and
                          Aleksey N. Derkach1[0000-0002-0108-319X]
    1 Saint-Petersburg National Research University of Information Technologies, Mechanics and

          Optics, Kronverksky prospect, 49, Saint Petersburg, 197101, Russian Federation
       2 NEO St. Petersburg Competency Center, Saint-Petersburg 194214, Russian Federation

                             vladimir.bogatyrev@gmail.com



          Abstract. The research of ways to improve the functional reliability and timeli-
          ness of service requests in duplicate computer systems was conducted. Timeli-
          ness increases as a result of the reserved maintenance of delay-critical requests
          in queues. The timeliness of the calculations is determined by the probability that
          the delay in waiting for service requests do not exceed the established maximum
          permissible time. The effectiveness of the reserved execution of delay-critical
          requests during load balancing as a result of the distribution of non-reserved re-
          quests among workable computers is shown.

          Keywords: timeliness, cluster, reserved service, heterogeneous stream.


1         Introduction

High reliability and fault tolerance [1-3] are supported during clustering and virtualiza-
tion in real-time computer systems that do not allow breaks in the computational pro-
cess. In such systems, the migration of virtual machines (VM) between physical nodes
of the cluster is required in order to reconfigure to adapt the system to the accumulation
of failures [4-6]. The migration of virtual resources and computational processes in
real-time systems can be used while maintaining the continuity of the computational
process after failures of physical nodes [7-8]. For fault tolerance, the system supports
at least two copies of the VM in the memory of different physical computers. VM vir-
tual disk images are stored on dedicated or distributed data storage with synchronous
data replication. The backup computer must support an up-to-date copy of the RAM
[7-9] of the active VM. Recovery time after failures depends on the structure and
amount of data storage, which can be shared or local to each computer. When organiz-
ing a VM migration through a multi-tier network [10- 12], it is necessary to take into
account network delays [13], including multi-path data transmission and reserved trans-
fers through aggregated channels [14-15].


*     Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License
      Attribution 4.0 International (CC BY 4.0).
                                             27


   Models and organization of embedded redundant computer systems are considered
in [16]. Markov models of reliability of cluster systems with VM migration that was
proposed in [17] make it possible to determine stationary and non-stationary availabil-
ity factors of the system characterizing cluster structural reliability. Markov duplicate
computer systems, taking into account the influence of the control system on the read-
iness and safety of systems, are proposed in [18].
   For real-time systems, the functional reliability of a cluster is determined not only
by the probability of its readiness to perform the required functions but also by the
probability of timely execution of delay-critical queries [18].
   In [18], it was shown that redundant servicing in cluster systems potentially allows
increasing the probability of timely servicing of delay-critical requests while reducing
the average waiting time. The backup service of delay-critical requests is considered
successful if at least one of the created copies of the request is executed in a timely
manner. The redundant service of requests leads to a technical contradiction. On the
one hand, it leads to an increase in load, which means additional expectations of re-
quests` copies in queues. On the other hand, to the potential possibility that a certain
copy can be served much faster than the others.
   The purpose of the research is to study the possibilities of increasing the functional
reliability and timeliness of servicing duplicated computer systems as a result of redun-
dant maintenance of delay-critical requests in queues.
   The object of the study - Fault tolerance duplicate complex with a cluster architec-
ture. The cluster contains two computers with local storage devices. Servers are con-
nected through a switch. The data storage system is represented as local storage for
each physical node as a hard disk. Synchronous data replication is maintained between
local repositories to ensure fault tolerance [16].


2      Evaluation of the Requests` Timeliness.

The timeliness of the calculations is determined by the probability that the delay in
waiting for service requests do not exceed the established maximum permissible time
t. Timeliness characterizes the functional reliability of the system.
    We assume that the request stream is heterogeneous and contains requests of differ-
ent criticality to the execution time.
    For the two-machine cluster in question, inoperable and workable states are possible,
in which the input request stream can be executed on two or one of the computers. Each
computer node is represented as a single-channel queueing system with an infinite
queue [19, 20].
    Let us single out the requests of two grades of criticality to the waiting time t1 and t2
(t1 < t2), arriving with intensities Λ, Λβ and executing for the same average time v.
    In the case of readiness (operability) of one computer out of two and the lack of
priorities in service, the probability of fulfilling requests with waiting for criticality
(maximum allowable waiting time) t1 and t2 (t1 < t2) is calculated respectively as:
                                                28


                                                                    1
                                                          1    t1
                        p1 (t1 )  1  (1   )ve                  v
                                                                            ,
                                                                    1
                                                          1    t2
                        p2 (t2 )  1  (1   )ve                  v
                                                                            .
With the operability (readiness) of two computers, there are possible options for organ-
izing the maintenance of a heterogeneous stream of requests, among which we consider
options without and with redundant maintenance of critical requests.
   For the case of non-redundant service, consider the options:
   А1: Stream maintenance is divided by computers — the first computer performs re-
quests for t1 criticality, and the second computer performs t 2 criticality.
   А2: Stream maintenance is not divided among computers - any request can with a
certain probability be sent both to the queue of the first and second computers (the
probabilities of sending requests to different computers for different threads may dif-
fer).
   А3: Stream maintenance of criticality to t2 delays is performed exclusively by the
second computer, and criticality to delays t1 is divided between two computers with
probability g.
   For the maintenance organization option B 1, the probability of fulfilling requests
with a waiting criticality (maximum allowable waiting time) t1 and t2 (t1 < t2) is calcu-
lated respectively as:
                                                         1
                                                        t1
                               p1 (t1 )  1  ve        v
                                                                 ,
                                                           1
                                                         t2
                             p2 (t2 )  1   ve          v
                                                                     .
The probability that requests for criticality to the delays of both t1 and t2 will be fulfilled
in a timely manner will determine how:

                              p12 (t1 , t2 )  p1 (t1 ) p2 (t2 ).                          (1)

For the organization of the computational process with duplication of requests of the
first stream and with the execution of copies on different computers, select the follow-
ing options:
    B1: the second stream is completely directed to the second computer for mainte-
nance.
    В2: the second stream with probabilities g and (1-g) is sent to the second or first
computers, respectively.
    For option B1, the probability of timely execution of a copy of requests of the first
most critical waiting stream during time t 1 in the first and second computers is defined
respectively as:
                                                   29


                                                             1
                                                            t1
                             p11 (t1 )  1  ve             v
                                                                     ,
                                                                           1
                                                                 1    t1
                      p12 (t1 )  1   1    ve                        v
                                                                                   ,

and the probability of timely execution of at least one of the two copies of the request
during t1 is as:
                                               1                                   1
                                              t1                       1    t1
                 p1 (t1 )  1   v  e                1    e
                                    2          v                                   v
                                                                                             .

For the second request stream, the probability of timely execution of requests in a time
not exceeding t2 is calculated as:
                                                                           1
                                                                 1    t2
                      p12 (t1 )  1   1    ve                        v
                                                                                   .

The probability of timely execution of requests for both streams is determined by the
formula (1).
   For option В2, the probability of timely execution of a request copies for the first
stream during time t1 in the first and second computers is defined respectively as:
                                                                             1
                                                                 1  g   t1
                     p11 (t1 )  1   1   g  ve                         v
                                                                                       ,

                                                                                   1
                                                                   1   g    t1
                 p12 (t1 )  1   1    g   ve                               v
                                                                                            ,

the probability of the timely execution of at least one copy of the request for the first
stream during time t1:
                                                      1                                                    1
                                          1  g   t1                                  1   g    t1
   p1 (t1 )  1   v  1   g  e                         1    g   e
                       2                              v                                                    v
                                                                                                                     ,

The probability of timely execution of requests of the second stream in a timeless than
t2 is calculated as:
                                                                                   1
                                                                   1   g    t2
                 p12 (t1 )  1   1    g   ve                               v
                                                                                            .

The probability of timely execution of requests for the first and second threads is deter-
mined by the formula (1).
                                                 30


3      Evaluation of Structural Reliability

A Markov model of a duplicated cluster is constructed in accordance with [16] to eval-
uate the structural reliability determined by the operational readiness ratio.
   The disciplines of maintenance for a duplicated computer system were investigated
in [16]:

 D1 – operational recovery, which begins immediately after the failure;
 D2 – a recovery that begins after the system has passed into an inoperative state or
  a condition of a certain level of degradation.

The diagram of states and transitions for recovery discipline D 2 is shown in fig. 1. In
the diagram, the failure intensity is denoted by λ0 and restoration by μ0 of the server; of
the disk by λ1, μ1; of the commutator by λ2, μ2.
   In the diagram, two lines cross out inefficient nodes, and workable ones that are not
involved in the computation process by one line, the inoperative states of the system
are darkened.
   The system of differential equations in accordance with the diagram of states and
transitions in Fig. 1 has the form:

 P0` (t )  (20  2  21 ) P0 (t )  40 P4 (t )  50 P5 (t )  60 P6 (t )  70 P7 (t )  80 P8 (t ),
 `
 P1 (t )  (1  0 ) P1 (t )  20 P0 (t ),
 P ` (t )  (   ) P (t )   P (t ),
 2               1      0   2          2 0

 P3 (t )  (1  0 ) P3 (t )  21P0 (t ),
      `

 `
  P4 (t )   40 P4 (t )  1 P1 (t )  0 P3 (t ),
  `
  P5 (t )   50 P5 (t )  0 P1 (t ),
  P ` (t )    P (t )   P (t ),
  6             60 6          1 2

  P7 (t )   70 P7 (t )  0 P2 (t ),
      `

  `
  P8 (t )   80 P8 (t )  1 P3 (t ),
Wherein
                                                                    1
                             40   1   1   1  
                                        5     0     1

                                                               1
                               50   1   2  
                                            5         0   
                                            31

                                                                    1
                         60   1   1   1  
                                    5     1     2

                                                                        1
                         70   1   1   1  
                                     5          0              2   

                               80   1   2  
                                           5          1   
                                                           1
                              5   1   1  
                                        3     4


Where μ5 data recovery intensity, while μ3 - load intensity of the actual data replica, μ 4
- the intensity of work associated with the launch of the VM and user applications on
the backup server.
   If we determine the probabilities of states by solving the presented system of equa-
tions, then we can determine the availability factor as the sum of working states:
                                                 3
                                          k   Pi .
                                                i 0
   The probability of timely execution of requests R, taking into account the probability
of requests when one or two computers are ready:
                                                           3
                                    R  p2 P0  p1  Pi ,
                                                       i 1
   where p1 and p2 are the probabilities of timely maintenance of requests when one
and two computers are ready, taking into account the above organization of maintaining
the request stream.
                                                    32


                                                                        C0

                       μ40                                    ВМ
                                              2λ0
                                                          В
                                                          М
                                                                   λ2
                                                                                                               μ90
                                                                                          2λ1
                             C1

                                  ВМ
                                        μ50
                                                                                                          C3
                                                          C2
                                                                                    μ80                          ВМ
                                                    ВМ



                  λ1         λ0         μ60
                                                                             μ70                     λ1
                                                     λ1             λ0                                                λ0


           C4                C5          C6
                                                                               C7               C8                         C9




                 Fig. 1. Markov diagram of states and transitions of a cluster


4      The example of Calculating the Probability of Timely
       Execution of Requests.

Let us determine the probability of the timeliness of the redundant service with a het-
erogeneity of the input stream, which includes requests for two grades of criticality for
the waiting time t1 = 0.2 s, t1 = 0.4 s for the same query execution time v = 0.1 s.
   The probability`s dependence of requests timely execution of different criticality on
the intensity of the requests stream Λ is shown in Fig. 3. In Fig. 3 at β = 0.5, 0.8, 1.5,
the curves 1-3 correspond to the variant of maintenance organization streams without
redundant maintenances of the first requests stream А1, and the curves 4-6 with redun-
dant maintenances according to option В2.
   Calculations show the expediency of requests' redundant maintenance that is most
critical to the expectation, however, this effect is lost as the intensity of input streams
Λ increases.
                                               33




 Fig. 2. Dependence of the probability of requests’ timely execution on the intensity of the in-
             put stream with redundant and non-redundant maintenance requests

For the disciplines of redundant maintenance of the first stream B1 and В2 there are in
Fig. 4 the dependence of the probability of timely servicing the requests of two streams
on the intensity of requests Λ and the requests` shares g and (1-g) of the second stream
which are sent in discipline В2 to the queue of the first or second computers. In fig. 2
the curves 1-3 for discipline B1 correspond to β = 0.5, 0.8, 1.5, and the curves 4-6 for
discipline В2 with g = 0.8, correspond to values β = 0.5, 0.8, 1.5. The presented de-
pendencies show the importance of the stream control`s effect (change g) on the time-
liness of maintaining requests for the total requests` stream of different criticality to
delays. The presented dependencies show the importance of the influence of the control
of requests` streams, which have different criticality to delays (change g), on the time-
liness of service of requests of the total stream.
                                             34




Fig. 3. Probability`s dependence of timely maintenance requests of two threads with redundant
                                     service for option B2

In Fig. 5 there is the probability`s dependence of timely servicing of requests for two
streams with redundant services for option В2 on the fraction g, requests of the second
stream which sent to the second computer. In the figure at β = 1.5, the curves 1–3, and
at β = 1, the curves 4–6 correspond to the intensity of the input stream Λ = 3, 3.5, 4 1/s.
                                               35




 Fig. 4. The probability`s dependence of timely servicing of requests for two streams with re-
dundant services for option В2 on the fraction g, requests of the second stream which sent to the
                                        second computer


5      Conclusions

For duplicate cluster real-time systems, the options for organizing the maintenance of
a heterogeneous requests` stream that has different criticalities to their delays in queues
are proposed.
   The influence`s materiality of the organization of maintaining requests with varying
criticality to delays in the computer nodes` queue, that have remained operable, on
functional reliability and the probability of timely execution of requests for a heteroge-
neous stream is shown.
   The effectiveness of the redundant execution of delay-critical requests during load
balancing as a result of the distribution of other requests among workable computers is
shown.


References
 1. Kopetz, H.: Real-Time Systems: Design Principles for Distributed Embedded Applications.
    2nd edn. Springer, Germany (2011). DOI: 10.1007/978-1-4419-8237-7
 2. Utkin, L.V., Coolen, F.P.A., Robust weighted SVR-based software reliability growth model.
                                             36


    Reliability Engineering & System Safety 176, 93-101 (2018), DOI:
    10.1016/j.ress.2018.04.007.
 3. Sorin, D.: Fault Tolerant Computer Architecture. Morgan & Claypool, USA (2009), 104 p.
    DOI: 10.2200/S00192ED1V01Y200904CAC005.
 4. Sahni, S., Varma, V.: A hybrid approach to live migration of virtual machines. Proc. IEEE
    Int. Conf. on Cloud Computing for Emerging Markets (CCEM 2012), 12–16, Bengalore,
    India (2012), DOI: 10.1109/CCEM.2012.6354587.
 5. Knowledge sharing portal UNIX/Linux-systems, open-source systems, networks, and other
    related things. Available at: < http://xgu.ru/wiki/Kemari > Last accessed 2019/03/25.
 6. Dittner, R., Rule, D.: The Best Damn Server Virtualization Book Period. 2nd edn. Syngress,
    USA (2011).
 7. Chandak, A., Jaju, K., Kanfade, A.: Dynamic Load Balancing of Virtual Machines using
    QEMU-KVM. International Journal of Computer Applications (0975 – 8887) 6(46), 10-14
    (2012).
 8. Adamova, K.: Anomaly Detection with Virtual Service Migration in Cloud Infrastructures.
    Master Thesis 263-0800-00L (2012).
 9. Hu, L., Zhao, J., Xu G., Ding Y. HMDC: Live Virtual Machine Migration Based on Hybrid
    Memory Copy and Delta Compression, Appl. Math. Inf. Sci. 7, No. 2L, 639-646, China.
    (2013), DOI: 10.12785/amis/072L38.
10. Poymanova, E.D., Tatarnikova, T.M. Models and Methods for Studying Network Traf-
    fic. 2018 Wave Electronics and its Application in Information and Telecommunication Sys-
    tems (WECONF) (26-30 Nov. 2018), DOI: 10.1109/WECONF.2018.8604470.
11. Kutuzov, O.I., Tatarnikova, T.M. On the Acceleration of Simulation Modeling. XXI Inter-
    national Conference on Soft Computing and Measurements (SCM'2018) (23-25 May 2018)
12. Korobeynikov, A.G., Fedosovsky, M.E., Zharinov, I.O., Shukalov, A.V., Gurjanov, A.V.
    Development of conceptual modeling method to solve the tasks of computer-aided design
    of difficult technical complexes on the basis of category theory. International Journal of
    Applied Engineering Research 6(12), 1114-1122 (2017).
13. Zhmylev, S., Martynchuk, I.G., Kireev, V.I., Aliev, T. Analytical methods of nonstationary
    processes modeling. CEUR Workshop Proceedings, IET - 2019, Vol. 2344 (2019).
14. Bogatyrev, A.V., Bogatyrev, S.V., Bogatyrev, V.A. Analysis of the Timeliness of Redun-
    dant Service in the System of the Parallel-Series Connection of Nodes with Unlimited
    Queues. 2018 Wave Electronics and its Application in Information and Telecommunication
    Systems (WECONF), pp. 1-4 (2018), DOI: 10.1109/WECONF.2018.8604379.
15. Bogatyrev, V.A On interconnection control in redundancy of local network buses with lim-
    ited availability. Engineering Simulation, 16 (4), pp. 463-469 (1999).
16. Bogatyrev, V.A., Aleksankov, S.M., Derkach, A.N. Model of Cluster Reliability with Mi-
    gration of Virtual Machines and Restoration on Certain Level of System Degradation. 2018
    Wave Electronics and its Application in Information and Telecommunication Systems
    (WECONF), pp. 1-4 (2018), DOI: 10.1109/WECONF.2018.8604317.
17. Ageev, A.M.: Configuring of Excessive Onboard Equipment Sets. Journal of Computer and
    Systems Sciences International 4 (57), 640-654 (2018). DOI: 10.1134/S1064230718040020
18. Bogatyrev, V.A., Bogatyrev, A.V. Functional Reliability of a Real-Time Redundant Com-
    putational Process in Cluster Architecture Systems. Automatic Control and Computer Sci-
    ences, Vol. 49 No. 1, pp. 46-56 (2015), DOI: 10.3103/S0146411615010022.
19. Kleinrock, L. Queueing Systems, Volume I: Theory. Wiley Interscience, USA. p. 417
    (1975), DOI: 10.1002/net.3230060210.
20. Kleinrock, L. Queueing Systems, Volume II: Computer Applications. Wiley Interscience,
    USA. p. 576 (1976), doi: 10.1002/net.3230070308.