=Paper= {{Paper |id=Vol-2122/paper-7 |storemode=property |title=Markov Model of FPGA Resources as a Service Considering Hardware Failures |pdfUrl=https://ceur-ws.org/Vol-2122/paper_143.pdf |volume=Vol-2122 |authors=Inna Kolesnyk,Vitaliy Kulanov,Artem Perepelitsyn |dblpUrl=https://dblp.org/rec/conf/icteri/KolesnykKP18 }} ==Markov Model of FPGA Resources as a Service Considering Hardware Failures== https://ceur-ws.org/Vol-2122/paper_143.pdf
       Markov Model of FPGA Resources as a Service
             Considering Hardware Failures

                Inna Kolesnyk1, Vitaliy Kulanov2, Artem Perepelitsyn3

      National Aerospace University “KhAI”, Chkalov str. 17, 61070 Kharkov, Ukraine.
                          1 i.zarizenko@csn.khai.edu
                          2 v.kulanov@csn.khai.edu
                          3 a.perepelitsyn@csn.khai.edu



       Abstract. The FPGA as a Service (FaaS) architecture is analyzed. Based on the
       structure and operating principles of the FaaS architecture, a structural
       reliability diagram is developed. A Markov model for the structural reliability
       diagram that considers possible hardware failures is presented. An expert
       evaluation of the failure and maintenance intensities is proposed. The
       reliability of FaaS is evaluated based on the obtained results.


       Keywords: Markov model, Programmable Logic, FPGA, FaaS, Hardware Failures,
       Computer System


1      Introduction

   The popularity of cloud services and the current level of technology development
make it possible to provide the resources of programmable logic integrated circuits
to an end user via the Internet. In [1], a solution based on FPGA as a Service (FaaS)
was proposed. The proposed architecture can be recommended for tasks focused on
intensive information processing where the requirements for the input and output data
streams are not strict. To test the proposed approach, a resource-intensive
computational task was performed: a brute-force search for polynomials for shift
registers with nonlinear feedback of the second degree [2]. This approach saved time
and resources in obtaining the final results.
   One of the main requirements for cloud services is high availability, achieved by
reducing and managing failures as well as minimizing planned downtime. Classical
reliability models of data storage built on continuous-time Markov chains are
considered in a number of works [3, 4]. Markov models retain a significant advantage
in calculation speed (up to 150 times, see [5]) over full-scale simulation modeling.
   The disadvantages of classical models with one-dimensional memoryless Markov
chains are described in detail in a widely known paper [6]. In [7, 8], a Markov model
of the availability of a smart business center (SBC) based on the analysis of hardware
and software failures was developed and investigated.



   Thus, the goal of the study is to improve the reliability of systems that implement
FaaS and provide FPGA resources for user tasks. To achieve this goal, it is necessary
to formalize the sequence of actions for evaluating the reliability of these systems
based on a Markov model and to apply the proposed method in practice.


2      Assessment of Reliability of FaaS based on Markov Model,
       Considering Possible Hardware Failures

   The FaaS architecture is a complex multi-level and multi-component hardware and
software system. The FaaS infrastructure components are conditionally divided into
two parts: client and server (Figure 1). This paper considers the model of the server
part with FPGA boards.


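   [Figure 1: block diagram. The Internet connects the client to the server side, which
   contains the FaaS Tasks WEB Server, the Data Distributor and Collector, the JTAG
   Server and the FPGA Board. Labelled data flows: Binary/HEX Data Files; Task
   Status/Results Data; Commands and Data Sets; SOFs; Status and Produced Data.]

                          Fig. 1. FPGA architecture as a service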

   The FaaS architecture assumes the use of multiple components in the server part.
Redundancy is advisable for such systems because the operability of the entire system
depends on the operability of its individual nodes. If one system component fails and
a backup for it is provided, the system remains operational. Such degradation of the
system affects the probability of failure-free operation at any given time.
   The main functional elements of the system are the server part components, which
are subject to failures that may be caused by hardware defects. Failures of system
elements are random and independent events.
   The Markov model is a convenient tool for describing the failure and recovery
processes of system components with these properties. Since the basic components of
FaaS are known a priori, the process of evaluating the reliability indicators of these
systems using Markov models can be generalized.




  To evaluate the reliability of the FaaS architecture hardware, the following
sequence of actions is suggested:

 • identify the set of hardware components that make up FaaS;
 • determine the availability and type of redundancy in the architecture in question;
 • build a structural reliability scheme for the set of components;
 • determine the failure and recovery rates for each system component;
 • considering the CLS, construct the system state graph;
 • using the mathematical apparatus of Markov models, numerically determine the
   system reliability indicators.
   To build the model of the FPGA as a service infrastructure, we model the components
of the server part using Markov chains.
   We analyze the redundant computing system of the FaaS infrastructure, which is
designed to perform service functions at a client's request, giving the client access
to certain resources and managing programmable logic integrated circuits (FPGAs) that
can be programmed at the user's request via the Internet.
   FaaS includes a combination of programmable logic and the classical computer
components that are required to organize the service itself.
   The operability of the system in this case is represented by the structural
reliability scheme (Figure 2).




        Fig. 2. Structural reliability diagram of the FPGA as a service system

   The block diagram used to describe the system reliability is a combination of
serial and parallel connections. All components of the system operate simultaneously.
According to the definition of system failure, the failure of a main system element
does not affect the system's operability: the system remains in working order, since
the backup elements provide the functionality. The main elements that must ensure the
functioning of the system are the processor (CPU), random access memory (RAM) and
permanent storage (ROM). Each block in the chain is duplicated.
   In this FaaS architecture, each FPGA chip implements unique functions provided by
the service user. For this computer system to function properly, all FPGAs involved in
the structural diagram are connected in series.
   If the user wants to implement on-chip redundancy, the FPGA can be programmed using
a fault-tolerant approach, but this case will be considered separately.
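
   The series-parallel structure described above can be evaluated numerically. The
following is a minimal sketch, not the authors' implementation. It assumes independent
elements, that each of the CPU, RAM and ROM blocks is a duplicated (parallel) pair,
and that a chain of four FPGAs in series is itself duplicated. The function names and
the element reliabilities are illustrative assumptions, not data from the paper.

```python
from math import prod

def series(reliabilities):
    """All elements must work: multiply the reliabilities."""
    return prod(reliabilities)

def parallel(reliabilities):
    """At least one element must work: 1 minus the product of failure probabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

def faas_server_reliability(r_cpu, r_ram, r_rom, r_fpga, n_fpga=4):
    """Reliability of the assumed structure: duplicated CPU, RAM and ROM
    blocks in series with a duplicated chain of n_fpga FPGAs."""
    fpga_chain = series([r_fpga] * n_fpga)  # FPGAs connected in series
    return series([
        parallel([r_cpu, r_cpu]),
        parallel([r_ram, r_ram]),
        parallel([r_rom, r_rom]),
        parallel([fpga_chain, fpga_chain]),
    ])

# Illustrative element reliabilities (assumed values, not from the paper).
r_system = faas_server_reliability(r_cpu=0.99, r_ram=0.98, r_rom=0.97, r_fpga=0.95)
print(f"system reliability: {r_system:.6f}")
```

   Because the FPGAs are in series, the chain is the weakest block in this sketch, so
duplicating it has the largest effect on the result.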




3      Development of the Markov Model of FPGA as a Service
   When the Markov analysis apparatus is used, the following computational
difficulties can arise: growth of the dimension of the Markov model (MM), sparseness
of its matrix of transition intensities between states, and its stiffness. Since
assessing the reliability of the system requires high accuracy of the results, each
feature of the Markov analysis apparatus must be considered at all stages of the
system availability assessment. Our task is to minimize the probability that such
difficulties occur and, if they do occur, to identify them in a timely manner and take
measures to eliminate their consequences.
   Consider the FPGA as a service computer system as a redundant system with parallel
connection of the backup equipment. In this scheme, the elements of the backup
equipment have different failure rates. For this type of redundancy, the rule for
determining the reliability of parallel independent elements is applicable.
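
   For reference, the rule for parallel independent elements can be written out as
follows. This is the standard textbook form for non-recoverable elements with
exponentially distributed times to failure. The symbols R_i(t), λ_i and n are notation
introduced here for illustration and do not appear in the paper.

```latex
R_{\mathrm{par}}(t) = 1 - \prod_{i=1}^{n} \bigl(1 - R_i(t)\bigr),
\qquad R_i(t) = e^{-\lambda_i t}

% Special case of two elements with different failure rates:
R_{\mathrm{par}}(t) = e^{-\lambda_1 t} + e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_2) t}
```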
   To evaluate the reliability of recoverable objects, the method of differential
equations is applied. It is based on the assumption that the time between failures
(operating time) and the recovery time are exponentially distributed.
   To apply this method, we need a mathematical model of the set of possible states
of the system S = {S1, S2, ..., Sn} in which it can be found as a result of failures
and faults.
   From time to time, the system S jumps from one state to another under the
influence of failures and restorations of its individual elements. When analyzing the
behavior of the system over time as it wears, it is convenient to use a state graph,
from which a system of equations is obtained. We illustrate the dynamics of the system
with such a state graph.
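
   The system of equations obtained from a state graph with constant transition
intensities is the set of Kolmogorov forward equations. A generic form is sketched
below. The symbols P_i(t) for the probability of state S_i and q_ij for the intensity
of the transition from S_i to S_j are notation assumed here, and the initial condition
assumes that the system starts in the fully operational state S_0.

```latex
\frac{dP_i(t)}{dt} = \sum_{j \neq i} q_{ji} P_j(t) - P_i(t) \sum_{j \neq i} q_{ij}
\quad \text{for every state } S_i,
\qquad \sum_{i} P_i(t) = 1, \qquad P_0(0) = 1
```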
   The dynamics of the system can be reflected by changes in the states of the
elements. Each element can be in one of three states:
   1 – operating mode;
   2 – failure of the main element;
   3 – failure of the backup element.
   Then the set of states of the system has the form:
   S0 – the system functions;
   S1 – CPU1 has failed, the system operates in standby mode;
   S2 – RAM1 has failed, the system operates in standby mode;
   S3 – ROM1 has failed, the system operates in standby mode;
   S4 – FPGA1..4 has failed, the system operates in standby mode;
   S5 – CPU1 and RAM1 have failed, the system operates in standby mode;
   S6 – CPU1 and ROM1 have failed, the system operates in standby mode;
   S7 – CPU1 and FPGA1..4 have failed, the system operates in standby mode;
   S8 – RAM1 and ROM1 have failed, the system operates in standby mode;
   S9 – RAM1 and FPGA1..4 have failed, the system operates in standby mode;
   S10 – ROM1 and FPGA1..4 have failed, the system operates in standby mode;
   S11 – CPU1, RAM1 and ROM1 have failed, the system operates in standby mode;
   S12 – CPU1, RAM1 and FPGA1..4 have failed, the system operates in standby mode;
   S13 – CPU1, ROM1 and FPGA1..4 have failed, the system operates in standby mode;
   S14 – RAM1, ROM1 and FPGA1..4 have failed, the system operates in standby mode;
   S15 – CPU1, RAM1, ROM1 and FPGA1..4 have failed, the system operates in standby
        mode;
   S16 – the system is inoperable.
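
   The numbering of the states S0 to S15 follows a regular pattern: every subset of
the four primary elements, ordered by the number of failed elements. The short sketch
below reproduces that numbering. The function name and the "INOPERABLE" marker are
hypothetical, and the comment on S16 reflects an assumed reading of the list, namely
that S16 is reached when an element and its backup have both failed.

```python
from itertools import combinations

# Primary elements whose failure switches the system to the backup
# (names taken from the state list).
PRIMARY = ("CPU1", "RAM1", "ROM1", "FPGA1..4")

def enumerate_states():
    """Return the failed-primary sets for S0..S15, plus a marker for S16."""
    states = []
    for size in range(len(PRIMARY) + 1):
        # combinations() keeps the element order, which reproduces the S1..S15 numbering
        states.extend(combinations(PRIMARY, size))
    states.append("INOPERABLE")  # S16 (assumed: an element and its backup both failed)
    return states

for index, failed in enumerate(enumerate_states()):
    print(f"S{index}: {failed}")
```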
   The state graph reflecting the dynamics of the system is shown in Figure 3.
   Based on the solution of the system of differential equations in the Mathcad
computer mathematics system, the curve of the failure-free state probability of the
proposed FaaS architecture was obtained. The assessment results are shown in Figure 4.
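
   A calculation of this kind can also be sketched outside Mathcad. The example below
is an illustration under stated assumptions, not the authors' model. The transition
structure is assumed: a working primary element fails with intensity λ, a failed
primary is repaired with intensity μ, and the failure of the backup of an already
failed primary leads to the absorbing state S16. All numerical intensities are
invented for illustration, and a fixed-step fourth-order Runge-Kutta scheme stands in
for the Mathcad solver. All function names are hypothetical.

```python
from itertools import combinations

ELEMENTS = ("CPU1", "RAM1", "ROM1", "FPGA1..4")
# Illustrative failure (LAM) and repair (MU) intensities in 1/hour; assumed values.
LAM = {"CPU1": 1e-4, "RAM1": 2e-4, "ROM1": 1e-4, "FPGA1..4": 4e-4}
MU = {"CPU1": 0.1, "RAM1": 0.1, "ROM1": 0.1, "FPGA1..4": 0.05}

def build_generator():
    """Build the transition-intensity matrix Q for S0..S15 plus absorbing S16."""
    states = [frozenset(c) for k in range(len(ELEMENTS) + 1)
              for c in combinations(ELEMENTS, k)]
    index = {s: i for i, s in enumerate(states)}
    n = len(states) + 1              # last index is S16 (system inoperable)
    q = [[0.0] * n for _ in range(n)]
    for failed, i in index.items():
        for e in ELEMENTS:
            if e in failed:
                q[i][index[failed - {e}]] += MU[e]    # primary repaired
                q[i][n - 1] += LAM[e]                 # its backup fails too
            else:
                q[i][index[failed | {e}]] += LAM[e]   # primary fails
    for i in range(n):
        q[i][i] = -sum(q[i])         # diagonal makes each row sum to zero
    return q

def derivative(p, q):
    """Kolmogorov forward equations: dP/dt = P * Q."""
    n = len(p)
    return [sum(p[i] * q[i][j] for i in range(n)) for j in range(n)]

def rk4_step(p, q, h):
    """Advance the state probabilities by one Runge-Kutta step of size h."""
    k1 = derivative(p, q)
    k2 = derivative([x + 0.5 * h * k for x, k in zip(p, k1)], q)
    k3 = derivative([x + 0.5 * h * k for x, k in zip(p, k2)], q)
    k4 = derivative([x + h * k for x, k in zip(p, k3)], q)
    return [x + h / 6.0 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]

def failure_free_probability(t_end, h=1.0):
    """Integrate from P(S0) = 1 and return 1 - P(S16) at time t_end (hours)."""
    q = build_generator()
    p = [1.0] + [0.0] * (len(q) - 1)
    for _ in range(int(round(t_end / h))):
        p = rk4_step(p, q, h)
    return 1.0 - p[-1]

for hours in (0, 250, 500, 1000):
    print(hours, round(failure_free_probability(hours), 6))
```

   The step size h = 1 hour is small relative to the largest assumed intensity, which
keeps the explicit scheme stable. For stiffer sets of intensities an implicit solver
would be the safer choice.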


   [Figure 3: state graph of the Markov model. Transitions between states are labelled
   with failure intensities λ and recovery intensities μ.]

              Fig. 3. The Markov graph, which is part of the model for states




        Fig. 4. Diagram of failure-free state probability for proposed FaaS architecture


4      Conclusion

   In this paper, the following results were achieved: a method for assessing FaaS
reliability based on a Markov model that takes possible hardware failures into account
and defines the order of finding the failure and recovery rates of parts of the
system, and a practical application of the proposed method to a specific
implementation of FaaS.
   As part of the practical application, a computer system was modeled on the basis
of continuous-time Markov chains. When assessing the reliability of complex redundant
and recoverable systems, the Markov chain method leads to complex solutions because of
the large number of states.
   Based on the labeled state graph of the system, i.e. the graph of transitions in
which the intensities of all transitions are known, it is possible to determine the
probabilities of the states as functions of time.
   Based on the graph, the state probabilities of the system were determined for
various values of the parameters λ and μ. Based on the obtained results, the
reliability of FaaS was evaluated.


References
1.  Perepelitsyn A.E., Kulanov V.O., Kolesnyk I.N.: Providing of FPGA Resources as a Ser-
   vice: Technologies, Deployment and Case-Study. In: Proceedings of the PhD Symposium at
   13th International Conference on ICT in Education, Research, and Industrial Applications
   co-located with 13th International Conference on ICT in Education, Research, and Industrial
   Applications, Kiev, pp. 63-68 (2017)
2. Poluyanenko, N.: Development of the search method for non-linear shift registers using
   hardware, implemented on field programmable gate arrays. In: EUREKA: Physics and En-
   gineering, pp. 53-60 (2017). doi: 10.21303/2461-4262.2017.00271




 3. Reibman A., Trivedi K.S.: Transient Analysis of Cumulative Measures of Markov
    Model Behavior. In: Communications in Statistics-Stochastic Models. pp. 683-710 (1989)
 4. Malhotra M., Trivedi K.S.: Reliability Analysis of Redundant Arrays of Inexpensive
    Disks. In: Journal of Parallel and Distributed Computing – Special issue on parallel
    I/O System. pp. 146-151 (1993)
 5. Karmakar P., Gopinath K.: Are Markov Models Effective for Storage Reliability Model-
    ing? In: Arxiv: 1503.07931v1, (2015)
 6. Greenan K. M., Plank J. S., Wylie J. J.: Mean Time To Meaningless: MTTDL, Markov
    models, and Storage System Reliability. In: Proceedings of the 2nd USENIX conference on
    Hot topics in storage and file systems. pp. 1-5 (2010)
 7. Kharchenko V., Kolisnyk M., Piskachova I., Bardis N.: Markov Model of the Smart Busi-
    ness Center Wired Network Considering Attacks on Software and Hardware Components.
    In: International journal of computers and communications ISSN: 2074-1294, Volume 10.
    pp. 113-119 (2016)
 8. Kharchenko V., Kolisnyk M., Piskachova I., Bardis N.: Reliability and Security Issues
    for IoT-Based Smart Business Center: Architecture and Markov Model. In: IEEE;
    Computer of science, MCSI 2016, Paper ID: 4564699 (2016)
 9. Kharchenko V.S., Cherepakhin D.A.: Risk Analysis of Control Systems by Use of
    QD-diagrams and FMECA-approach. In: Proceedings of 12th European Conference on Safety
    and Reliability, Turin, Italy, pp. 16-20 (2001).
10. Gorbenko A.V., Kharchenko V.S.: Application of FMEA - technology with reliability and
    safety of computer networks for critical applications. In: EUREKA: Physics and
    Engineering, pp. 53-60 (2017). doi: 10.21303/2461-4262.2017.00271
11. Kilts S.: Advanced FPGA Design. Architecture, Implementation and Optimization. In: The
    Institute of Electrical and Electronics Engineers, Inc., New York. (2007)
12. Fedukhin A.B., Mukha A.A., Mukha A.A.: FPGA systems as a means of increasing fault
    tolerance. In: Mathematical machines and systems. pp.198-204 (2010)
13. Hahanov V.I.: Infrastructure Diagnostic Service SoC. In: tutorial of Omsk State Universi-
    ty. pp. 74-101 (2008)
14. Yacoub, S.M., Cukic, B., Ammar, H.H.: A scenario-based reliability analysis approach for
    component-based software. IEEE Transactions on Reliability 53(4), pp. 465–480 (2004)
15. Andryukhin A.I.: Switching modeling and diagnosis of the main fault models of CMOS
    structures. In: DonNTU. pp. 54-65. (2011)
16. E.J., Kon, Kulagina M.M.: Reliability and diagnostics of components of
    infocommunication and information management systems. In: tutorial of Perm State Tech-
    nical University. pp. 167- 179 (2011)
17. Vilkomir, S.A., Parnas, D.L., Mendiratta, V.B., Murphy, E.: Availability evaluation of
    hardware/software systems with several recovery procedures. In: Proc. 29th Int. Computer
    Software and Applications Conference (COMPSAC 2005), pp. 473–478. IEEE Computer
    Society Press, Los Alamitos (2005)
18. Kappler, T., Koziolek, H., Krogmann, K., Reussner, R.: Towards Automatic Construction
    of Reusable Prediction Models for Component-Based Performance Engineering. In: Proc.
    Software Engineering 2008 (SE 2008). LNI, vol. 121, February 2008. pp. 140–154. GI
    (2008)
19. Martin L. Shooman: Reliability of Computer Systems and Networks: Fault Tolerance,
    Analysis, and Design. pp.446-449 (2002)



