Designing Risk Resilient Networked High-Load Computing Web-Systems for Information Flow Processing

Nadiia Pasieka1, Zora Říhová2, Marta Vohnoutová2, Nelly Lysenko1, Oleksandra Lysenko1, Vasyl Sheketa3, Mykola Pasyeka3 and Nataliia Kulchytska1

1 Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, 76000, Ukraine
2 University of South Bohemia, Branišovská 1645/31a, České Budějovice, 370 05, Czechia
3 National Tech. University of Oil & Gas, Ivano-Frankivsk, 76068, Ukraine

Abstract
This paper presents improved models and algorithms for the architecture of a high-load, risk-resilient Web-system, whose main distinction is the ability to aggregate and share large sets of heterogeneous computing resources for processing information data distributed across geographically separated territories. In contrast to traditional approaches, where such resources are unavailable within a single computing node on an independent computing platform, the proposed model and algorithms allow efficient and secure use of additional network resources connected to the functional network. Innovative approaches are then developed for building high-load, risk-resilient distributed cluster software systems, which provide a significant increase in the total volume of information flows effectively processed by the node communication system as a whole. This approach is therefore appropriate for distributed risk-tolerant software systems in which rapid loss of information flows is highly undesirable. Finally, algorithms are developed for automated load management of independent computing platforms processing information data flows, enabling efficient scaling (clustering) of risk-resistant software systems.

Keywords
Risky software systems, task distribution, algorithms, architecture, Web-systems.

1. Introduction

Balancing the load on computing nodes provides an even load of hardware and software systems across independent computing platforms.
CITRisk’2021: 2nd International Workshop on Computational & Information Technologies for Risk-Informed Systems, September 16–17, 2021, Kherson, Ukraine
EMAIL: pasyekanm@gmail.com (N.Pasieka); zora.rihova@trilogic.cz (Z.Říhová); marta.vohnoutova@gmail.com (M.Vohnoutová); nelli.lysenko@gmail.com (N.Lysenko); lysenkowa@gmail.com (O.Lysenko); vasylsheketa@gmail.com (V.Sheketa); pms.mykola@gmail.com (M.Pasieka); nataliia.kulchytska@pnu.edu.ua (N.Kulchytska)
ORCID: 000-0002-4824-2370 (N.Pasieka); 0000-0003-3896-4297 (Z.Říhová); 0000-0002-8915-8626 (M.Vohnoutová); 0000-0002-1029-7843 (N.Lysenko); 0000-0002-1029-7843 (O.Lysenko); 0000-0002-1318-4895 (V.Sheketa); 0000-0002-3058-6650 (M.Pasieka); 0000-0001-9308-6840 (N.Kulchytska)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

A risk-resistant software system that balances the computational load must automatically decide on which node to perform the calculations associated with each new information task. The main task of balancing is therefore to effectively support the transfer (migration) of part or all of a calculation from the most loaded computing nodes to the less loaded ones. Below, vi denotes the weighting of input factor i, i = 1, ..., r, and uj the weighting of output parameter j, j = 1, ..., s. When solving the problem of maximizing a risk-aware efficiency criterion, the problem of forming a ratio of two linear aggregate quantities arises [46]. The problem of maximizing the efficiency of risk-sustainable software Web-systems is therefore a linear fractional programming problem, and there are several known ways of transforming a linear fractional program into an ordinary linear programming problem.
f0 = (Σj=1..s uj yj) / (Σi=1..r vi xi) → max,  (1)

subject to (Σj=1..s uj yjm) / (Σi=1..r vi xim) ≤ 1 for all modules m = 1, 2, ..., n; uj ≥ 0, j = 1, 2, ..., s; vi ≥ 0, i = 1, 2, ..., r. Here yj is the value of output parameter j of the investigated computational module (node); xim is the value of input factor i of computing node m, with i = 1, ..., r and m = 1, ..., n; yjm is the value of output parameter j of computing node m, with j = 1, ..., s and m = 1, ..., n; vi is the weight of input factor i, i = 1, ..., r; uj is the weight of output parameter j, j = 1, ..., s.

For further investigation we transform this nonlinear problem of risk-aware criterion optimization, using an algorithm from fractional programming theory, into a more traditional and less resource-intensive linear programming problem. Thus the complex problem of optimizing the risk-sustainable efficiency criterion can be reduced to a linear problem using linear optimization techniques [28]. To obtain the efficiency criterion value for all computational modules (nodes), the maximization problem must be solved individually for each module involved in the study. In each case, the vectors xim and yjm are replaced by the profiles of input and output parameters of the computational module under study, respectively [24]; in the constraints of the maximization problem they remain the same for each computational node [18, 30, 35, 45]. Constraints are imposed to ensure that the efficiency values lie between zero and one (e0 ∈ [0,1]). The main property of this model follows from the constraint on the risk-sustainable efficiency criterion: its target function proportionally increases the output parameters of the observed computational module up to the limit of the efficiency criterion.
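As an illustration of criterion (1), the following sketch scores each node by brute-force search over a small weight grid. The node profiles and the grid are hypothetical, and a real implementation would solve the linear program obtained after the Charnes-Cooper transformation rather than enumerate weights:

```python
from itertools import product

# Hypothetical profiles: two input factors x and one output parameter y
# for three computing modules (nodes).
X = [[4.0, 3.0], [7.0, 3.0], [8.0, 1.0]]
Y = [[1.0], [1.0], [1.0]]

def ratio(u, v, y, x):
    # weighted outputs over weighted inputs for one module
    return sum(uj * yj for uj, yj in zip(u, y)) / sum(vi * xi for vi, xi in zip(v, x))

def efficiency(node, X, Y, grid):
    # criterion (1): maximize the ratio for `node` subject to the
    # constraint that no module scores above 1 with the same weights
    best = 0.0
    for u in product(grid, repeat=len(Y[0])):
        for v in product(grid, repeat=len(X[0])):
            if sum(u) == 0 or sum(v) == 0:
                continue
            if all(ratio(u, v, Y[m], X[m]) <= 1 for m in range(len(X))):
                best = max(best, ratio(u, v, Y[node], X[node]))
    return best

grid = [i / 10 for i in range(11)]
scores = [efficiency(n, X, Y, grid) for n in range(len(X))]
# nodes 0 and 2 come out efficient (score near 1); node 1 is dominated
```

With equal outputs, node 1 consumes at least as much of every input as node 0, so no feasible weights can make it look efficient; this is exactly the reference-group comparison described in the text.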
When such a function is mathematically formulated, applying the duality theorem to it yields an equivalent, so-called dual form. The principle of duality in linear programming states that for each direct linear model optimizing the risk-sustainable efficiency criterion there exists a corresponding dual linear optimization model, and the solution of one model contains all the expressions needed to solve the other. The original model is therefore treated as the direct linear programming problem. Using the dual method, we minimize the weighted sum of the input parameters given one normalized output parameter. The mathematical formulation of the linear programming model under the dual method is as follows:

min g0 = Σi=1..r ti xi,  (2)

with g0 – the value of the efficiency criterion of the studied computing node; xi – the value of input factor i of that computing node, i = 1, ..., r; ti – variable weighting factors. Linear combinations determine potential reference groups for measuring the efficiency criterion of the computational nodes involved in the study. According to optimization task (2), whose target function is minimized, the proposed method selects a reference group against which the efficiency criterion of the involved computational node appears insufficiently effective [10, 37, 40]. This mathematical problem can therefore be interpreted as follows: for the investigated computational node, determine the minimum efficiency value g0 such that, compared to the weighted combinations of the reference units, the weighted combination of output parameters is not less than any output parameter, and the weighted combination of input factors is at most g0 times each input factor.
2. Parametric network model with time metrics to calculate load volume

The basic principles of operation of the network model of cloud computing on independent computing platforms differ significantly from those of traditional serial and parallel models. The main feature of the network model of cloud computing is the ability to aggregate and collectively use large sets of heterogeneous computational data flows distributed geographically. This approach provides significant advantages: for example, when a software system under development requires information resources that are not available within one computing node, it can obtain them from other computing nodes connected to the cloud network [5, 8, 9, 23, 27]. However, the use of such a complex architecture for processing data flows has several caveats. To such a highly heterogeneous, dynamically formed distributed computing environment it is rather difficult to apply traditional performance metrics such as data flow computation speed, bandwidth of exchange channels, etc. Therefore, to estimate the quality of the provided cloud service, specialized computational metrics are needed [11, 13, 31]. Assume that m computing resources are available in the network cloud environment and that there is a system distributing a task flow t, which spreads the information tasks j ∈ t evenly over the available resources. Within such a software system built on cloud technology, each user's information task j can be divided into computing actions k ∈ j. When a task is submitted, a deadline dj is specified by which the corresponding result should be obtained. Each information task j and all its actions k ∈ j enter the cloud grid at a certain release time rj.
The cloud network operates online, so the release times rj are unknown in advance for a significant number of tasks. As soon as a user task arrives in the cloud network for processing, the necessary resources are located and allocated to run it. Suppose that, as a result of the final distribution S of computing tasks, each action k ∈ j requires a certain completion time Ck(S). A user's computational task j in the cloud network can then be processed no faster than the period of time determined by expression (3):

Cj(S) = max k∈j Ck(S),  (3)

with Ck(S) – the completion time of action k; k – an action; j – a task. Let pk be the processing time of action k ∈ j. The processing span of a user's task in the cloud network can then be calculated as (4):

pj = Cj(S) − min k∈j (Ck(S) − pk),  (4)

with pj – the processing span of the user task; Ck(S) – the completion time of action k; Cj(S) – the completion time of the task; pk – the processing time of action k. The resulting integral values make it possible to estimate the properties of the cloud computing environment. To analyze the quality of the cloud service provided by the network computing environment, the maximum delay of user tasks can be used (5):

Lmax = max j∈t (Cj(S) − dj),  (5)

with Lmax – the maximum delay of a user task; Cj(S) – the completion time of task j; dj – the deadline of task j. To optimize the work of a distributed cloud, the value of Lmax should be minimized; one can also use the value Tj, which determines how late a user task is (6):

Tj = Cj(S) − dj for j ∈ t with Cj(S) > dj, and Tj = 0 otherwise,  (6)

with t – the set of submitted tasks; dj – the deadline of task j; Cj(S) – the completion time of task j.
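Expressions (3)–(6) can be checked on a toy schedule; the task set, times and deadlines below are hypothetical:

```python
# Hypothetical schedule S: completion times C[k] and processing times p[k]
# for the actions of two user tasks, plus deadlines d[j].
jobs = {"j1": ["a", "b"], "j2": ["c"]}
C = {"a": 4.0, "b": 9.0, "c": 6.0}
p = {"a": 4.0, "b": 5.0, "c": 6.0}
d = {"j1": 8.0, "j2": 7.0}

def C_j(j):
    # (3): a task completes when its last action completes
    return max(C[k] for k in jobs[j])

def p_j(j):
    # (4): task span = completion time minus earliest action start
    return C_j(j) - min(C[k] - p[k] for k in jobs[j])

L_max = max(C_j(j) - d[j] for j in jobs)      # (5)
late = [j for j in jobs if C_j(j) > d[j]]     # (6): tasks finishing late
```

Here j1 finishes at time 9 against a deadline of 8, so L_max = 1 and j1 is the only late task.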
Consequently, this indicator provides information on the number of outstanding computational requests from users that have entered the cloud network for processing. The consumption of computing resources by a computing subtask k of the cloud network, RCk, is defined as the product of the corresponding processing time and the number of resources used in the network (7):

RCk = pk × mk,  (7)

with RCk – the consumption of network computing resources by action k; pk – the processing time of action k; mk – the number of resources used. Using the integrated consumption of computing resources of the cloud network, the utilization U of the available information resources can be calculated (8):

U = RC(S) / (m × (max j∈t Cj(S) − min j∈t (Cj(S) − pj))),  (8)

with U – the degree of use of the available information resources; m – the number of resources; pj – the processing span of task j; RC(S) – the total resource consumption of the schedule; Cj(S) – the completion time of task j. This value characterizes how optimally the computing resources of the distributed cloud network are used. In the process of handling information requests in a distributed cloud network, situations often arise in which the software system fails while processing a user's task [6, 7, 26, 43, 44, 47-48]. The user's task must then be run several times before it completes successfully. Because users and administrators may have different (and even conflicting) requirements for a cloud-based Web-system, it is difficult to find metrics that are universal and satisfy everyone.
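A small numeric check of (7) and (8), with hypothetical processing times, resource counts and task spans:

```python
# Hypothetical actions with processing times p_k and resource counts m_k.
p = {"a": 4.0, "b": 5.0, "c": 6.0}
m_used = {"a": 1, "b": 2, "c": 1}

RC = {k: p[k] * m_used[k] for k in p}    # (7): per-action consumption
RC_total = sum(RC.values())              # RC(S) = 4 + 10 + 6 = 20

# (8): utilization over the busy interval, with m available resources
m = 4
C = {"j1": 9.0, "j2": 6.0}       # task completion times C_j(S)
span = {"j1": 9.0, "j2": 6.0}    # task spans p_j
busy = max(C.values()) - min(C[j] - span[j] for j in C)
U = RC_total / (m * busy)        # 20 / (4 * 9), about 0.56
```

A utilization well below 1 signals that a large share of the resource-time rectangle m × busy interval was idle.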
From the point of view of the user of a software system built on cloud technology, two metrics can be distinguished: the average response time to a request (Average Response Time) and the average waiting time of a request (Average Wait Time):

ART = (1/|t|) Σ j∈t Cj(S),  (9)

with ART – average response time; Cj(S) – the completion time of task j; t – the set of submitted tasks.

AWT = (1/|t|) Σ j∈t (Cj(S) − pj),  (10)

with AWT – average waiting time; Cj(S) – the completion time of task j; pj – the processing span of task j; t – the set of submitted tasks. The ART value characterizes the average response time to a user's information request, i.e., how quickly user tasks are processed. The AWT value is especially important for developers handling relatively small information tasks and the processing of their requests. A simple and suitable way to measure the fairness of the use of information and hardware resources is to calculate the deviation of the waiting time for request processing:

AWTD = sqrt( (1/|t|) Σ j∈t (Cj(S) − pj)² − ( Σ j∈t (Cj(S) − pj) / |t| )² ),  (11)

with AWTD – the deviation of the average waiting time for processing a user's information request; t – the set of submitted tasks; Cj(S) – the completion time of task j; pj – the processing span of task j. AWTD must be minimized to achieve optimal results in the cloud network. In modern cloud network Web-systems, the ability to complete the processing of a given volume of tasks is more important than the speed-up of the distributed high-performance Web-system obtained with this processing approach. It should be noted that user information tasks carried out in cloud network environments can have a considerably more complex architecture than those carried out in parallel systems of traditional architecture.
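Metrics (9)–(11) for a hypothetical set of three tasks:

```python
import math

# Hypothetical completion times C_j(S) and task spans p_j.
C = {"j1": 9.0, "j2": 6.0, "j3": 7.0}
p = {"j1": 9.0, "j2": 4.0, "j3": 5.0}
t = list(C)

ART = sum(C[j] for j in t) / len(t)       # (9): average response time
wait = [C[j] - p[j] for j in t]           # per-task waiting time
AWT = sum(wait) / len(t)                  # (10): average waiting time
# (11): standard deviation of the waiting time
AWTD = math.sqrt(sum(w * w for w in wait) / len(t) - AWT ** 2)
```

A low AWTD with the same AWT means the waiting time is spread fairly across tasks, which is the fairness property the text refers to.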
For example, the information flows of user tasks have a more complex logical structure than the corresponding task packages [25]. Using a cloud grid also requires revising notions such as program errors: a Web-services software system designed on the grid model generates error messages as soon as a situation arises in which a user's information task cannot be successfully executed and completed. For example, a failure of a software system built on cloud technology may occur if it is impossible to find appropriate resources to perform the information calculations, or if those resources are exhausted. Fault tolerance in software systems using cloud technology can then be defined as the ability to tolerate both software and hardware errors for as long as there remains a chance that the processing of user requests will be completed successfully. The metric of completed user request processing, Workload Completion, is formed as the ratio of successfully completed user tasks to the total volume of all requests submitted to the cloud network scheduler:

WC = |{ j ∈ t : j completed }| / |t|,  (12)

with WC – the indicator of completed user request processing; t – the set of submitted tasks; j – a task. This metric makes it possible to define the main limitations of the cloud network software system, and its maximization can be the main goal. However, it also has some critical limitations from the point of view of using free information and hardware resources, since a user task with a smaller number of computational actions has a disproportionate influence on changes of this value [16].
Task completion relates the number of completed actions to the total number of actions performed within the cloud software system distributing user tasks:

TC = Σ j∈t |{ k ∈ j : k completed }| / Σ j∈t |j|,  (13)

with TC – the index of completed actions in the processing of user tasks; t – the set of submitted tasks; k – an action; j – a task. It is also worth considering the completion of unblocked actions (enabled task completion), i.e., actions that can be performed only once all corresponding dependencies in the given sequence of actions have been executed by the software system using cloud technology:

ETC = Σ j∈t |{ k ∈ j : k completed }| / Σ j∈t |{ k ∈ j : k enabled }|,  (14)

with ETC – the integral indicator of completed unblocked actions relative to user tasks; t – the set of submitted tasks; k – an action; j – a task. We have thus considered the main principles of the cloud network model's functioning in the development of software systems, which differ greatly from those of traditional serial and parallel models. Their main difference is the possibility of aggregating and sharing large sets of heterogeneous information resources distributed between geographically separated independent computational platforms. Such architectural approaches to developing software systems with cloud technology bring considerable advantages and financial benefits: for example, when a software Web-system requires considerable information resources that are inaccessible within one computing node, it can obtain them from another node connected to the cloud network.
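Metrics (12)–(14) on a hypothetical task set with per-action status flags, where "enabled" marks actions whose dependencies were satisfied:

```python
# Hypothetical tasks; each action maps to a (status, enabled) pair.
tasks = {
    "j1": {"a": ("completed", True), "b": ("completed", True)},
    "j2": {"c": ("completed", True), "d": ("failed", True)},
    "j3": {"e": ("failed", False)},   # "e" was never unblocked
}

done_tasks = sum(all(s == "completed" for s, _ in acts.values())
                 for acts in tasks.values())
WC = done_tasks / len(tasks)                              # (12) = 1/3

n_actions = sum(len(acts) for acts in tasks.values())
n_done = sum(s == "completed"
             for acts in tasks.values() for s, _ in acts.values())
TC = n_done / n_actions                                   # (13) = 3/5

n_enabled = sum(e for acts in tasks.values() for _, e in acts.values())
ETC = n_done / n_enabled                                  # (14) = 3/4
```

Note how the single-action task j3 moves WC by a full third while contributing only one action to TC, which is the bias discussed above.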
3. Models and algorithms of distributed high-performance software Web-system architecture optimization

Any Web-system or cloud service that serves many users is a priori highly loaded. However, highly loaded distributed software Web-systems cannot effectively apply the models, methods and algorithms that are used as basic approaches in the development of ordinary Web-sites [4, 42]. A considerable increase of the user audience and the corresponding computations, without proper approaches to optimizing the architecture of software systems built on cloud technology, can lead to considerable complications in their maintenance over time [1, 17, 21, 29]. At present, all models, methods and algorithms for developing the architecture of distributed fault-tolerant Web-systems are developed with generally accepted information technologies [19]. Methods of forwarding information packets should use an incremental algorithm for recalculating the transport-protocol checksum of the packet, which allows the checksum of the packet to be derived from the checksum of its header [2, 12, 20, 22, 36]. The recalculation algorithm uses the checksum of the outgoing packet and the checksum of the incoming packet to compute the new checksum: if the buffer is divided into two parts, the checksum of the entire buffer can be expressed linearly through the checksums of its parts. The header of the transport protocol is typically several times shorter than the body of the entire information packet. Consequently, the computational load on the resources of the whole software Web-system is reduced many times over, which significantly reduces the cost of redirecting client requests from one node to another. At the next technological stage, the user request passes to a computing node for processing.
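The linearity property mentioned above, i.e., the checksum of a whole buffer expressed through the checksums of its parts, holds for the 16-bit one's-complement sum used by the TCP/IP protocol family; RFC 1071 and RFC 1624 describe the full incremental-update technique. A minimal sketch:

```python
def csum16(data: bytes) -> int:
    """16-bit one's-complement sum (Internet checksum before inversion)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length buffers
    s = 0
    for i in range(0, len(data), 2):
        s += (data[i] << 8) | data[i + 1]
        s = (s & 0xFFFF) + (s >> 16)         # fold carries back in
    return s

def combine(part_a: int, part_b: int) -> int:
    """Checksum of a buffer split in two (even-length first part)
    is the folded sum of the parts' checksums."""
    s = part_a + part_b
    return (s & 0xFFFF) + (s >> 16)

buf = b"example payload for a packet"
half = len(buf) // 2                          # even split point
whole = csum16(buf)
split = combine(csum16(buf[:half]), csum16(buf[half:]))
```

Because the sum is linear, a node that changes only the header can update the packet checksum from the old value and the header delta instead of rescanning the whole body, which is the saving the text describes.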
The architecture of a computing node for processing users' information requests should be organized hierarchically. At the bottom of this hierarchy are the computing modules for processing user requests whose failure can temporarily slow down the overall operation of the Web-system, but not paralyze it as a whole. If the lower computing modules do not work, the cloud network Web-system allows the corresponding computing modules of other nodes to be used. With such an organization of the computing process, however, the communication between nodes processing user requests must be of high quality. Organizing the architecture of a computing node for processing a high volume of user requests requires a clear hierarchy of these modules: each computing module of the node must be as isolated from the others as possible and, while remaining informationally dependent, exchange information only at the level of common system messages. The design of the software Web-system should therefore follow the principle that the failure of any part of the Web-services' work should be controlled at the level of system message exchange. The failure hierarchy architecture means that any Web-system consists of computing modules, and each parent module has a defined sequence of actions for when its descendant modules fail. The information flow in this case is ascending, because only the parent computing module holds the data about what actions should be performed if one or more modules on the cloud computing platform stop working. When one of the computing modules of a node fails or is overloaded, a request is therefore made to another module higher in the hierarchy.
It should also be noted that the main computing module may not participate in processing the client's request itself, acting instead as a Web service that is in direct contact with other nodes of the cloud network while simultaneously diagnosing the corresponding modules of the node. The information flows generated by the user are then processed, and if they cannot be executed, a management decision is made to transfer the information flow from the idle or heavily loaded module to one that has spare resources (Figure 1).

Figure 1: Model of failure hierarchy processing in software Web-systems

The choice of a computing module for executing user requests is based not only on its load, but also on information about the number of failures in other modules of the cloud node. This algorithm is needed so as not to create instantaneous bursts of information packages moving from extremely busy node modules to less busy ones. As a result, a complex cloud network for exchanging computed information arrays between the corresponding modules of the computing nodes is formed. However, the main goal in building a highly loaded distributed software Web-system is not to create a system that never fails during industrial use, but one that can keep working as long as possible and cope with various information challenges and its own errors [41]. After the information message flow has been redirected, the computing module does not stop working. We can thus identify the strategies that can be implemented in each of these computing modules [15]. The first strategy assumes that the computing module has come under a significant load.
In such a situation, the module executes the whole list of user tasks that reached it or remained in it for processing, performs a primary reboot, and sends an information message to the module higher in the hierarchy that it is ready to work. Given that the modules of the cloud network of computing nodes and the main load balancer constantly exchange readiness messages, the module will receive a new data stream for processing at the next iteration of load distribution across the nodes. The snapshot/restore method of data duplication is often used in highly loaded distributed Web-systems. A hybrid system model with a central load-balancing node and cloud network communication between computing nodes thus makes it easy to add more nodes by applying a cluster structure. The high level of communication between computing nodes creates a problem for the data cloning method, because in case of failure of one of the nodes the integral performance of the Web-system as a whole must be recalculated. The problems connected with the failure of a computing node can be divided into two conditional groups. The first group comprises software and hardware technical problems. Since the redistribution of computational volume occurs at the level of computing modules, if the software of the developed Web-system generates an error, the system will automatically and quickly restore its performance, provided the computing module at the top of the hierarchy was properly designed and did not suffer a cascading error. It will then alert the other computing nodes about the errors and redirect the flow of computational information data to other modules. In this situation there is no duplication of a particular computing node on an independent platform, but simply the launch of a standard instance.
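One way to avoid losing requests while a failed node's standard instance is being relaunched is for the balancing node to buffer them for retry. A minimal sketch, in which the Balancer class, node names and availability flags are illustrative assumptions:

```python
from collections import deque

class Balancer:
    """Balancing node with a retry buffer: requests addressed to an
    unavailable node are buffered and re-dispatched instead of lost."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)     # node name -> currently available?
        self.retry = deque()         # buffered requests awaiting retry
        self.delivered = []          # (node, request) pairs actually sent

    def dispatch(self, request, node):
        if self.nodes.get(node, False):
            self.delivered.append((node, request))
        else:
            self.retry.append(request)   # node down: keep the request

    def flush(self):
        # re-send buffered requests to any currently available node
        up = [n for n, ok in self.nodes.items() if ok]
        while self.retry and up:
            self.dispatch(self.retry.popleft(), up[0])

b = Balancer({"n1": False, "n2": True})
b.dispatch("req-1", "n1")    # n1 is down: request goes to the buffer
b.nodes["n1"] = True         # n1 relaunched
b.flush()                    # req-1 is delivered rather than lost
```

A production balancer would add retry limits and timeouts, but the key invariant is the same: a request addressed to an unavailable node is never dropped.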
To significantly reduce the risk of losing a large number of user requests, the balancing computing node must contain a buffer that acts as a «retry» mechanism when a certain node is physically unavailable. The duplication method therefore provides for the creation of a typical computing node while simultaneously restoring the performance of the Web-system. However, in the development of software systems using cloud technology there is a whole class of distributed algorithms with data flow structures (arrays, records) whose size depends on the total number of simultaneously interacting processes performing processing on different computing nodes on independent platforms. An example of such an algorithm is the protocol for making a coordinated decision [34]. When the topology of a distributed highly loaded computational Web-system is modified, the structure of the data flows and the algorithms processing them will also change. There is therefore an urgent need to modify the behavior scenario of the simulation model used, namely to change the algorithms and data flow structures describing the behavior of the corresponding objects.

4. Implementation of data flow processing methods for recovery of computing nodes

In the development of highly loaded distributed Web-systems, architects compose them of several computing nodes, each an instance of a single computing system, thus forming a «cluster». However, the modified data flow processing algorithm requires a clear separation not only at the physical level, but also at the software level. The software of Web-systems is traditionally divided into logical modules, that is, by functional purpose [33].
When developing a software Web-system, the architect needs to design it so that the computing modules in the structure contain as little logic as possible, since the main task of load redistribution is to maintain system balance. An equally important requirement for the architecture of the Web-system software is the avoidance of strong connectivity. Strong connectivity between computing modules means that one module is in contact with a large number of other modules on independent platforms: if it breaks down, a significant number of modules will send a message to the service module, which in turn will notify the other nodes of the need for its temporary replacement. To effectively overcome possible computational problems, the following methods can be used to restore the functionality of a software Web-system built on cloud technology. The clone_args method stores the values of the arguments that were sent to the computing module that failed [3, 14, 38, 39]. Since the absence of a timely response from a module is classified as its failure, the parent method of the module must in this case obtain the corresponding information about the arguments that were sent to the child method. The clone_args method also tracks several indirect properties:
• the current integral processing time for the computing module or the failed module's method;
• the time at which the processing on the cloud Web-system's computing nodes failed;
• the launch frequency of a computing module, which indicates how much the module is needed and therefore allows the necessary software and hardware resources to be allocated more effectively when distributing load between the nodes.
The report method sends statistical data about the processing of the information flows of all child nodes to the service module at certain intervals.
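A minimal sketch of the clone_args and report ideas; the Module class, the failover helper and the handlers are illustrative assumptions, not the authors' implementation:

```python
class Module:
    """A computing module that remembers the arguments of the last call
    (the clone_args idea) so a parent can replay it after a failure."""

    def __init__(self, name, handler):
        self.name, self.handler = name, handler
        self.calls = 0
        self.last_args = None

    def process(self, *args):
        self.last_args = args        # store the arguments before running
        self.calls += 1
        return self.handler(*args)

    def clone_args(self):
        return self.last_args        # arguments sent to the failed call

    def report(self):
        # statistics for the service module
        return {"name": self.name, "calls": self.calls}

def failover(primary, standby, *args):
    # parent-side logic: replay the stored arguments on a standby module
    try:
        return primary.process(*args)
    except Exception:
        return standby.process(*primary.clone_args())

bad = Module("m1", lambda x: 1 / 0)      # always fails
good = Module("m2", lambda x: x * 2)
result = failover(bad, good, 21)         # call is replayed on m2
```

The launch-frequency and timing properties listed above would be extra counters on the same object; the essential point is that the arguments survive the failure so the call can migrate.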
Its use also facilitates automated forecasting of the degradation dynamics of the developed software Web-system, so the report method can be used not only to report planned statistical indicators but also when one or several child computing modules have failed. The restart method provides the Web-system built on cloud technology with a reboot of a computing module in case of its failure; the main purpose of rebooting is to avoid destructive incoming data streams. The callback method sends an information message to the parent module about the successful or unsuccessful reboot of the computing node. This method is passed as an argument to the computing module whose stability must be checked after starting or restarting the software Web-system that provides the callback. Analyzing the character of the errors that can occur in the computing modules of the developed Web-system, we note several main causes: errors in the program code, problematic data flows, and high load on the nodes. The feedback method provides communication between the computational nodes of the Web-system built on cloud technology; each node is largely independent of the others and may be surrounded by many Web-system microservices with which it must exchange information flows. To do this, the service computing module at the top of the corresponding hierarchy must of course be informed. Each computing module therefore implements the feedback method, which knows how to contact the service module and is able to redistribute data flows from failed or heavily loaded modules to working nodes. The modified computational model of the recovery method for failed modules is shown in Figure 2 [32].

Figure 2: Algorithm of load redistribution in case of failure of computing nodes of the software Web-system

This modified algorithm is implemented only by service modules for processing data flows.
In contrast to conventional computing modules, service modules perform a much smaller number of operations to obtain a positive result, which directly affects their reliability. In this case, service modules act as a «stable part of the software system at one computing point» on a node on an independent computing platform. Consequently, when developing a software Web-system it is important to use methods that minimize the vulnerability of this computational module. These computational methods constantly report their results once per time quantum, or immediately upon leaving the working state. The service module thus has information about the operation of the entire developed software Web-system. The smaller the time quantum for updating the information about the results of the developed software Web-system's work, the more precisely and effectively the computing node to which the load should be transferred can be determined.
Packaging, state transfer and data flow. As soon as the computing nodes of the developed cloud-based software Web-system reach a consensus on which node delegates and which accepts a part of the computational work, a package of raw data must be prepared for the new node. Only after that is a connection established between the modules, so that user requests addressed to the failing module are redirected to a working module on another computing node. The service module also sends a message marking the damaged computing module as «temporarily unavailable for new requests for processing data flows», and then performs the standard reboot procedure. This procedure is implemented in each computing module by the corresponding restart method, which is also connected to the service module on the computing node via the feedback method.
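The time-quantum reporting described above suggests a simple load-transfer rule: the service module keeps the latest load report from each node and hands the work of a failed node to the least loaded remaining one. The sketch below is one possible realization under that assumption; the class and method names are not from the paper:

```python
class ServiceModule:
    """Keeps the latest load report per node, updated once per time quantum."""
    def __init__(self):
        self.loads = {}   # node name -> last reported load (0.0 .. 1.0)

    def on_report(self, node, load):
        """Record the periodic load report from a computing node."""
        self.loads[node] = load

    def pick_target(self, failed_node):
        """Choose the least loaded working node to take over the failed node's work."""
        candidates = {n: l for n, l in self.loads.items() if n != failed_node}
        if not candidates:
            raise RuntimeError("no working node available")
        return min(candidates, key=candidates.get)

svc = ServiceModule()
svc.on_report("node-1", 0.90)
svc.on_report("node-2", 0.35)
svc.on_report("node-3", 0.60)
print(svc.pick_target("node-1"))   # node-2
```

The shorter the reporting quantum, the fresher the entries in `loads` and the better this choice reflects the actual state of the system, which is exactly the trade-off the text points out.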
As soon as the software system receives the message that a computational module has failed, clone_args copies all the input data streams sent for processing, along with an indirect set of data about the current state of the system. There is little time for management decisions in this case, so the service module may immediately try to pick a new computational node, simultaneously copying the whole call thread through the clone_args method and directing it to the service module. Analyzing this sequence of actions makes it clear that a list must be formed of the pending calls that are still waiting for other parent computing modules or their methods. This list is in effect a queue of user requests, which the new working computing module processes in order. Amazon Simple Queue Service works on the same principle: it quickly receives queues of user messages for storage and processing. As soon as the computational node becomes available for information data flows, the service node combines this data and links the corresponding modules and their states (Figure 3).
Figure 3: Model of packing, transfer of states and information flows of data
To ensure the reliability of data flow transmission, a given set of data is divided into fragments, and a header with a sequence number is added to each of them. The fragments of the information data flow obtained in this way form a segment. At the next step of data processing, each segment is placed into a packet, which then reaches the recipient's computing node via transport protocols. After the packet is delivered to the recipient's computing node, the correctness of the received segment is checked by recalculating its checksum, and it is automatically inferred that the previous segments of the information flow were also successfully received.
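The fragment/segment/checksum scheme described above resembles a simplified reliable transport. The sketch below assumes a plain CRC32 checksum and cumulative acknowledgements (a valid segment implies all earlier ones arrived), which matches the behavior the text describes; the function names are illustrative:

```python
import zlib

def make_segments(data: bytes, frag_size: int):
    """Split data into numbered fragments; each segment carries
    (sequence number, payload, checksum) as its header information."""
    segments = []
    for seq, start in enumerate(range(0, len(data), frag_size)):
        payload = data[start:start + frag_size]
        segments.append((seq, payload, zlib.crc32(payload)))
    return segments

def receive(segments):
    """Verify checksums in order. A valid segment implies all previous
    ones were received, so the receiver acks the highest good sequence
    number; a mismatch triggers a retransmission request from there."""
    received = b""
    acked = -1
    for seq, payload, checksum in segments:
        if zlib.crc32(payload) != checksum:
            break                      # request retransmission from this segment
        received += payload
        acked = seq
    return acked, received

segs = make_segments(b"information data flow", 8)
acked, data = receive(segs)
print(acked, data)   # 2 b'information data flow'
```

The cumulative acknowledgement is what lets the recipient conclude, from one verified segment, that all earlier segments were also delivered, as stated in the text.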
At this stage, the recipient's computing node sends an information request to the sender's node for either a new data packet or a retransmission of the previous one. This algorithm ensures that all previous packets in the data stream sequence have been successfully received. In the developed modified model, computing modules are isolated from each other and have no information about the overall state of the Web-system, although connections between them can be established and messages about their state received. For the interaction of computational modules we use asynchronous exchange of information messages, where each module has its own queue of information messages. A computing module that sends a message waits for a notification of its delivery; if none arrives, the recipient has ignored the message.
5. Conclusions
The conducted system analysis examines the basic design principles of distributed risk-resilient Web-systems and the technologies that architects of such software systems most often use in their design. We conclude that redundancy is an indispensable attribute in the design of most risk-resilient Web-systems, which are characterized by a high load of user requests. The main criterion in the design of risk-resilient Web-systems is their scalability, which comes into play when operations require significant computational resources, significantly reducing the performance of the system and requiring an increase in its overall capacity. Methods for assessing the reliability and resilience of the developed risk-resistant software Web-systems are also investigated, since any software system must be objectively monitored and its behavior predicted. The main aspects of operating distributed fault-tolerant software Web-systems are considered, since problems of administration, as well as restoration of performance in case of failure, may arise during high-load operation.
Analyzing the technologies used in the design of software Web-systems, we have improved the methods for a comprehensive analysis of computing load balancing, which is the main task in the design of distributed fault-tolerant software Web-systems. To solve the problem of efficient operation of the developed software system, we use algebraic methods to optimize the balancing of the computational load on the nodes, as well as a network model of its distribution, thereby creating a hybrid mechanism for handling information flows. We consider the theoretical and practical foundations that ensure the effective functioning of the failure-hierarchy mechanism, which allows a risk-sustainable, efficient calculation of the computational load of software modules and nodes as a whole. Analyzing a high-load distributed fault-tolerant software Web-system, we find that its fundamental criterion is the efficiency of processing information data streams, generally calculated as the quotient of the sum of all output volumes and the sum of all input data streams. The efficiency value is determined for each specific computational module or node. The effectiveness of computing nodes is compared using the linear programming method with different basic models and their variants. It is proposed to determine the number of computational modules or nodes involved in order to build a criterion for the boundary of risk efficiency, with all other cases falling under the criterion of their inefficiency. The functionality of the network model has been improved; it differs significantly from the usual sequential and parallel models. The main difference is the ability to aggregate and share large sets of heterogeneous computing resources for processing geographically distributed information flows.
In many cases this provides significant advantages, since the developed software Web-system requires additional resources that are not available within a single computational node but can be obtained from other nodes connected to the functional grid. During the study, we improved the algorithm for transferring the computational load without using redundant resources and proposed a modified model of interaction between computing modules for processing information flows, both within a single node and in the developed software Web-system as a whole. We also considered a methodology for determining the causes of failure of computing modules and nodes, highlighting two major types of problems associated with node failure: software and hardware problems. We modified the model of computational process control for processing information data streams, in which modules isolated from each other have no common state, yet an information connection can be established between them and notifications about their state received. Asynchronous exchange of information messages is used for the interaction of computing modules, where each module maintains its own message queue. A module sends an information message and waits for an acknowledgement of its delivery, which it does not receive if the recipient ignores the message. An important aspect of the study is that reserves of free computing resources for the developed software Web-system can be found within its own capabilities.
References
[1] S.Aleti, S.Bjornander, L.Grunske, I.Meedeniya, ArcheOpterix: An extendable tool for architecture optimization of AADL models, ICSE Workshop on Model-Based Methodologies for Pervasive and Embedded Software 2009, pp.
73-77 [2] H.Jamalludin, Y.Jamalludin, Analysis of success factors of technology transfer process of the information and communication technology, International Conference on Advances in Electrical, Electronic and Systems Engineering (ICAEES), 2016, Putrajaya, Malaysia, pp. 382-387, doi:10.1109/ICAEES.2016.7888074 [3] V.Andrunyk, A.Vasevych, L.Chyrun, N.Chernovol, N.Antonyu, A.Gozhyj, M.Korobchynskyi, Development of information system for aggregation and ranking of news taking into account the user needs, Paper presented at the CEUR Workshop Proceedings, 2604, 2020, pp. 1127-1171 [4] B.D.Rosenberg, N.Siegel, Critical Quality Factors for Rapid, Scalable, Agile Development, 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C), Sofia, Bulgaria, 2019, pp. 514-515. doi:10.1109/QRS-C.2019.00101 [5] S.Babichev, V.Lytvynenko, J.Skvor, M.Korobchynskyi, M.Voronenko, Information technology of gene expression profiles processing for purpose of gene regulatory networks reconstruction, IEEE 2nd International Conference on Data Stream Mining and Processing, DSMP 2018, 2018, pp. 336-341. doi:10.1109/DSMP.2018.847845 [6] Ch.M.Shoga, B.Boehm, Exploring the Dependency Relationships between Software Qualities, 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C), Sofia, Bulgaria, 2019, pp. 105-108. doi:10.1109/QRS-C.2019.00032 [7] Zhi et al., Quality Assessment for Large-Scale Industrial Software Systems: Experience Report at Alibaba, 26th Asia-Pacific Software Engineering Conference (APSEC), Putrajaya, Malaysia, 2019 [8] D.Ageyev, A.Mohsin, T.Radivilova, L.Kirichenko, Infocommunication Networks Design with Self-Similar Traffic, IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM), Polyana, Ukraine, 2019, pp. 24-27. doi:10.1109/CADSM.2019.8779314.
[9] D.Ageyev, O.Bondarenko, T.Radivilova, W.Alfroukh, Classification of existing virtualization methods used in telecommunication networks, 9th International Conference on Dependable Systems, Services and Technologies, Kiev, 2018, pp. 83-86. doi:10.1109/DESSERT.2018.8409104. [10] I.Dronjuk, M.Nazarkevych, O.Fedevych, Asymptotic method of traffic simulations, 2014. doi:10.1007/978-3-319-05209-0_12 [11] I.Dronyuk, M.Nazarkevych, O.Fedevych, Synthesis of Noise-Like Signal Based on Ateb-Functions. In: Vishnevsky V., Kozyrev D. (eds), Distributed Computer and Communication Networks, DCCN 2015, Communications in Computer and Information Science, vol 601, Springer, Cham, 2016. doi:10.1007/978-3-319-30843-2_14 [12] I.Dronyuk, I.Moiseienko, Ja.Greguš, Analysis of Creative Industries Activities in European Union Countries, The International Workshop on Digitalization and Servitization within Factory-Free Economy (D&SwFFE 2019), November 4-7, 2019, Coimbra, Portugal, Procedia Computer Science 160, pp. 479–484 [13] A.AlOmar, M.W.Mkaouer, A.Ouni, M.Kessentini, On the Impact of Refactoring on the Relationship between Quality Attributes and Design Metrics, ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Porto de Galinhas, Recife, Brazil, 2019, pp. 1-11. doi:10.1109/ESEM.2019.8870177 [14] E.Awad, M.W.Caminada, G.Pigozzi, M.Podlaszewski, I.Rahwan, Pareto optimality and strategy-proofness in group argument evaluation, Journal of Logic and Computation, vol. 27, no. 8, 2017, pp. 2581–2609 [15] A.Qasem, A.Qusef, Team Building Activities for Virtual Teams in Jordanian Companies: Vision and Survey, International Conference of Computer Science and Renewable Energies (ICCSRE), Agadir, Morocco, 2019, pp. 1-7.
doi:10.1109/ICCSRE.2019.8807677 [16] A.Galkin, R.Umiarov, O.Grigorieva, D.Ageyev, Approaches for Safety-Critical Embedded Systems and Telecommunication Systems Design for Avionics Based on FPGA, IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (PIC S&T), Ukraine, 2019, pp. 391-396. doi:10.1109/PICST47496.2019.9061421. [17] L.Junwen, Z.Ziyan, H.Jiakai, The application of quantum communication technology used in electric power information & communication system confidential transmission, 19th International Conference on Advanced Communication Technology, 2017, pp. 305-308 [18] Lv Shu Ping, Tian-Zhenjie, GA's arithmetic on time-cost optimization in architecture engineering, International Conference on Electric Information and Control Engineering, 2011, pp. 3293-3296 [19] M.Brown, Learning Apache Cassandra - Manage Fault Tolerant and Scalable Real-Time Data, Birmingham: Packt Publishing, 2015, 276 p. [20] M.Kabir, M.Rashed, Multi-level client server network and its performance analysis. Saarbrücken: LAP Lambert Academic Publishing, 2012, 124 p. [21] M.L.Abbott, M.T.Fisher, The art of scalability: scalable web architecture, processes, and organizations for the modern enterprise, 2nd Edition, Kindle Edition, Boston: Addison-Wesley Professional, 2015, 618 p. [22] M.Vladymyrenko, V.Sokolov, V.Buriachok, A.Platonenko, D.Ageyev, Analysis of Implementation Results of the Distributed Access Control System, IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (PIC S&T), Kyiv, Ukraine, 2019, pp. 1-6. doi:10.1109/PICST47496.2019.9061376 [23] M.Medykovsky, I.Droniuk, M.Nazarkevich, O.Fedevych, Modelling the Perturbation of Traffic Based on Ateb-functions, In: Kwiecień A., Gaj P., Stera P. (eds), Computer Networks, CN 2013, Communications in Computer and Information Science, vol 370, 2013.
doi:10.1007/978-3-642-38865-1_5 [24] M.Medykovskyy, M.Pasyeka, N.Pasyeka, O.Turchyn, Scientific research of life cycle performance of information technology, 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017, 1, pp. 425-428. doi:10.1109/STC-CSIT.2017.8098821 [25] O.Mishchuk, R.Tkachenko, I.Izonin, Missing Data Imputation through SGTM Neural-Like Structure for Environmental Monitoring Tasks, Advances in Intelligent Systems and Computing, vol. 938, 2020, pp. 142-151. doi:10.1007/978-3-030-16621-2_13 [26] M.Zeeshan, Z.Mehtab, M.W.Khan, A fast convergence feed-forward automatic gain control algorithm based on RF characterization of Software Defined Radio, International Conference on Advances in Electrical, Electronic and Systems Engineering, Putrajaya, Malaysia, 2016, pp. 100-104. doi:10.1109/ICAEES.2016.7888017 [27] H.Mykhailyshyn, N.Pasyeka, V.Sheketa, M.Pasyeka, O.Kondur, M.Varvaruk, Designing network computing systems for intensive processing of information flows of data, 2021. doi:10.1007/978-3-030-43070-2_18 [28] M.A.J.Idrissi, H.Ramchoun, Y.Ghanou, M.Ettaouil, Genetic algorithm for neural network architecture optimization, 3rd International Conference on Logistics Operations Management (GOL), 2016, pp. 1-4 [29] N.Pasieka, V.Sheketa, Y.Romanyshyn, M.Pasieka, U.Domska, A.Struk, Models, methods and algorithms of web system architecture optimization, IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (PIC S&T), Kyiv, Ukraine, 2019, pp. 147-153. doi:10.1109/PICST47496.2019.9061539. [30] M.Nazarkevych, M.Logoyda, O.Troyan, Y.Vozniy, Z.Shpak, The Ateb-Gabor filter for fingerprinting, in: International Conference on Computer Science and Information Technology, September 2019, Springer, Cham, 2019, pp.
247-255 [31] O.Riznyk, Yu.Kynash, O.Povshuk, V.Kovalyk, Recovery schemes for distributed computing based on bib-schemes, First International Conference on Data Stream Mining & Processing (DSMP), 2016, pp. 134-137 [32] P.Haindl, R.Plösch, Towards Continuous Quality: Measuring and Evaluating Feature-Dependent Non-Functional Requirements in DevOps, International Conference on Software Architecture Companion (ICSA-C), Hamburg, Germany, 2019, pp. 91-94. doi:10.1109/ICSA-C.2019.00024 [33] P.Jain, A.Sharma, P.K.Aggarwal, Key Attributes for a Quality Mobile Application, 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 2020, pp. 50-54. doi:10.1109/Confluence47617.2020.9058278 [34] M.Pasyeka, V.Sheketa, N.Pasieka, S.Chupakhina, I.Dronyuk, System analysis of caching requests on network computing nodes, 3rd International Conference on Advanced Information and Communications Technologies, AICT2019 - Proceedings, pp. 216-222. doi:10.1109/AIACT.2019.8847909 [35] M.Pasyeka, T.Sviridova, I.Kozak, Mathematical model of adaptive knowledge testing, 5th International Conference on Perspective Technologies and Methods in MEMS Design, MEMSTECH 2009, 2009, pp. 96-97 [36] R.Paul, J.R.Drake, H.Liang, Global Virtual Team Performance: The Effect of Coordination Effectiveness, Trust, and Team Cohesion, IEEE Transactions on Professional Communication, vol. 59, no. 3, Sept. 2016, pp. 186-202. doi:10.1109/TPC.2016.2583319 [37] R.Privman, S.R.Hiltz, Y.Wang, In-Group (Us) versus Out-Group (Them) Dynamics and Effectiveness in Partially Distributed Teams, IEEE Transactions on Professional Communication, vol. 56, no. 1, March 2013, pp. 33-49. doi:10.1109/TPC.2012.2237253 [38] O.Riznyk, O.Povshuk, Y.Kynash, M.Nazarkevich, I.Yurchak, Synthesis of non-equidistant location of sensors in sensor network, 14th International Conference on Perspective Technologies and Methods in MEMS Design, MEMSTECH 2018 - Proceedings, 2018, pp. 204-208.
doi:10.1109/MEMSTECH.2018.8365734 [39] S.K.Land, The Importance of Deliberate Team Building: A Project-Focused Competence-Based Approach, IEEE Engineering Management Review, vol. 47, no. 2, Second Quarter, June 2019, pp. 18-22. doi:10.1109/EMR.2019.2915600 [40] S.Pradhan, V.Nanniyur, P.Melanahalli, M.Palla, S.Chulani, Quality Metrics for Hybrid Software Development Organizations – Case Study, 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C), Sofia, Bulgaria, 2019, pp. 505-506. doi:10.1109/QRS-C.2019.00097 [41] T.Aslam, T.Rana, M.Batool, A.Naheed, A.Andaleeb, Quality Based Software Architectural Decision Making, International Conference on Communication Technologies (ComTech), Rawalpindi, Pakistan, 2019, pp. 114-119, doi: 10.1109/COMTECH.2019.8737836 [42] T.B.Alakus, R.Das, I.Turkoglu, An Overview of Quality Metrics Used in Estimating Software Faults, International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey, 2019, pp. 1-6. doi:10.1109/IDAP.2019.8875925 [43] V.Bandyra, A.Malitchuk, M.Pasieka, R.Khrabatyn, Evaluation of quality of backup copy systems data in telecommunication systems, IEEE International Scientific and Practical Conference Problems of Infocommunications Science and Technology, PIC S&T′2019, 08-11 October 2019, Ukraine, 2019, pp. 329-325. doi:10.1109/PICST47496.2019.9061379 [44] V.Sheketa, M.Chesanovskyy, L.Poteriailo, V.Pikh, Y.Romanyshyn, M.Pasyeka, Case-based notations for technological problems solving in the knowledge-based environment, IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, vol. 1, 2019, pp. 10–15 [45] W.Liu, J.Yang, Y.Song, X.Yu, S.Zhao, Research on Software Quality Evaluation Method Based on Process Evaluation and Test Results, 6th International Conference on Dependable Systems and Their Applications (DSA), Harbin, China, 2020, pp. 480-483.
doi:10.1109/DSA.2019.00077. [46] Y.Romanyshyn, V.Sheketa, L.Poteriailo, V.Pikh, N.Pasieka, Y.Kalambet, Social-communication web technologies in the higher education as means of knowledge transfer, IEEE 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, vol. 3, 2019, pp. 35–39. [47] Y.-F.Zhang, H.-Y.Duan, Z.-L.Geng, Evolutionary mechanism of frangibility in social consensus system based on negative emotions spread, Complexity, vol. 2017, Article ID 4037049, 2017, 8 p. [48] Z.Říhová, L.Dostálek, Information Management - the Basis for Fulfillment of People's Information Needs, 11th International Conference on Advanced Computer Information Technologies (ACIT), 2021, pp. 469-472, doi: 10.1109/ACIT52158.2021.9548386