Distribute load among concurrent servers ⋆

Denys Bakhtiiarov1,2,*,†, Bohdan Chumachenko1,†, Oleksandr Lavrynenko1,†, Volodymyr Chupryn1,† and Veniamin Antonov1,†

1 National Aviation University, 1 Kosmonavta Komarova ave., 03058 Kyiv, Ukraine
2 State Scientific and Research Institute of Cybersecurity Technologies and Information Protection, 3 Maksym Zaliznyak, 03142 Kyiv, Ukraine

Abstract
A technical implementation option for load balancing among concurrently operating application servers is proposed to mitigate the risk of overload amid substantial unpredictable fluctuations in the request flow entering the application system and the variable processing durations of each application server. A structural-functional model for load balancing within the server line of the application system is delineated, designed to operate under conditions where the incoming request flow from clients is random, unexpected, non-stationary, and pulsing. A scheme is proposed for generating the flow of requests to the application server line that aligns the stationary intervals of this flow with the intervals of discrete control used to equalize server load factors. A technological framework for load balancing on application servers is proposed that equalizes the load factors of the application system's servers by redistributing, in real time, a portion of the incoming request traffic from more heavily loaded servers to those with lighter loads.

Keywords
request, application, server, client, load balancing

1. Introduction
In practice, when utilizing computerized application systems of the "client/server" type that permit remote access for clients via the Internet, such as various interactive help systems, effectiveness is assessed by the value of τs, the average service duration of each stream of customer requests entering the application system input. A lower value indicates that the consumer is likely to receive a response to their request more promptly [1]. At low request flow intensities, queues at the application system's input are virtually nonexistent, making τs directly contingent upon the performance of the server hardware hosting the application software. Issues occur when the volume of incoming requests is misaligned with the processing speed of the server infrastructure, leading to the accumulation of unprocessed requests, which in turn results in an unacceptable increase in service duration and, in certain instances, the loss of some requests. Given the high intensity of the request flow in several applications, it is essential to partition it in real time into parallel demultiplexed substreams and execute their concurrent online processing on a series of application servers with identical functionality, as illustrated in Fig. 1.

Before a user's request is processed by an application server, it is first received by the request redirection server (step 1), which employs a dedicated block to ascertain the number of the application server designated for the request and allocates the request stream in real time between the line servers (steps 2 and 3) according to the distribution strategy outlined below. The request redirection server transmits the IP address of the selected application server, as determined by the distribution method, to the user terminal (step 4), and then readies itself to handle a new request from another user, returning to step 1. The user utilizes the IP address of the designated application server to retrieve the result of processing his request from that server online (step 5). The designated server resolves the application task and transmits the outcome to the user (step 6) [2].

Specifically, Fig. 1 illustrates that a series of specialized application software and hardware servers process client requests concurrently. The number of servers in the configuration should be chosen so that the request traffic intensity is aligned with the application system's performance. Nonetheless, the problem becomes intricate when addressing an erratic and unpredictable influx of requests characterized by substantial fluctuations in both intensity and duration. In this scenario, due to erratic variations in request volume and the uncertain processing times of the application servers, these servers, in the absence of specific interventions, experience uneven and arbitrary loading: some servers become overloaded and consequently lose requests, while others remain underutilized. Unforeseen variations in the volume of requests directed to any application server can impede request processing due to potential transient server overloads.

CPITS-II 2024: Workshop on Cybersecurity Providing in Information and Telecommunication Systems II, October 26, 2024, Kyiv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
bakhtiiaroff@tks.nau.edu.ua (D. Bakhtiiarov); bohdan.chumachenko@npp.nau.edu.ua (B. Chumachenko); oleksandrlavrynenko@tks.nau.edu.ua (O. Lavrynenko); volodymyr.chupryn@npp.nau.edu.ua (V. Chupryn); veniamin.antonov@npp.nau.edu.ua (V. Antonov)
0000-0003-3298-4641 (D. Bakhtiiarov); 0000-0002-0354-2206 (B. Chumachenko); 0000-0002-3285-7565 (O. Lavrynenko); 0000-0001-9412-7413 (V. Chupryn); 0000-0003-2244-262X (V. Antonov)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org, ISSN 1613-0073)

Figure 1: Generalized structural and functional model for the allocation of user requests among application servers

Consequently, there is both theoretical and practical interest in developing a mechanism for load balancing on application servers, specifically a dynamic load-balancing approach among collaborating application servers in real time. This method aims to avert potential short-term overloads of individual application servers during operation, thereby fostering the sustainable functioning of the application system amid uncertainties in the dynamics of the aforementioned environmental factors. The suggested technique must assure the stability of the request distribution process, considering the dynamics of unforeseen fluctuations in this flow. The theoretical foundation of this strategy is explained in [3–5]. This paper presents a potential option for its technical implementation, the core of which is as follows. The application system hardware depicted in Fig. 1 comprises a software server (the request redirection server together with the server definition unit) that concurrently and autonomously manages multiple application servers. This software server facilitates a real-time adaptive distribution of requests among the application servers to maintain a more uniform load during unpredictable surges in request flow.

2. Main Part

The theoretical foundation of the employed load balancing method is delineated in [1, 2, 6]. This paper presents a potential option for its technical implementation, the core of which is as follows. The application system comprises a series of application servers that must function concurrently and autonomously, together with a software server that facilitates real-time adaptive distribution of the request flow among the application servers to achieve more or less uniform load balancing. The parameters of the examined load balancing technology are established through the resolution of the boundary value problem associated with the analytical design of the relevant regulator, utilizing the synthesis of the corresponding R. Bellman functional and iterative numerical integration of the derived tuning equation. The implemented technical solution facilitates nearly uniform loading of server equipment under the specified conditions while maintaining an acceptable average waiting time for service requests with the minimal necessary server resources.
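The six-step redirection flow described above can be sketched as a minimal redirector that hands the chosen server's IP address back to the client. This is only an illustration: the class name, the `handle` method, and the server addresses are assumptions, and plain round robin merely stands in for the adaptive distribution method the paper develops.

```python
from itertools import cycle

# Hypothetical pool of application servers with identical functionality;
# the addresses are illustrative placeholders, not from the paper.
SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

class RedirectionServer:
    """Step 1: accept a request; steps 2-4: pick an application server
    and return its IP to the client, then become ready for the next user.
    The client then contacts that server directly (steps 5-6)."""

    def __init__(self, servers):
        self._next = cycle(servers)  # simplest stand-in strategy: round robin

    def handle(self, user_id: str) -> str:
        # In the paper the choice is made by an adaptive distribution
        # method; round robin substitutes for it in this sketch.
        return next(self._next)

balancer = RedirectionServer(SERVERS)
print([balancer.handle(f"user{i}") for i in range(4)])
# cycles through the pool: 10.0.0.11, 10.0.0.12, 10.0.0.13, 10.0.0.11
```

In a real deployment the redirector would answer over HTTP (e.g., with a 302 redirect) rather than return a string, but the control flow is the same.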
2.1. System model for load balancing on servers

This work introduces a structural and functional model for load balancing throughout the server line of the application system, designed to operate under conditions where the incoming request flow from clients is random, unexpected, non-stationary, and pulsing. Server load balancing entails the real-time redistribution of incoming request flows from heavily loaded application servers to those with lighter loads, thereby achieving a more uniform distribution of load across the servers. Fig. 2 illustrates this model as a series of numbered blocks, each representing a certain functional component of the model's structure [7].

Figure 2: Structural and functional paradigm for load balancing between concurrently operating servers of the applied information system

Fig. 2 uses the following designations for functional blocks: 1—smoothing of the input request stream; 2—creation of quasi-stationary segments of incoming request traffic at time intervals ∆ti, the smoothing steps (the formation process is executed as a stepwise iterative procedure with step ∆ti while monitoring fluctuations in the intensity of the incoming request flow); 3—demultiplexing of the resulting input stream of requests at each smoothing interval ∆ti; 4—configurator of the smoothing and alignment procedures (i.e., of the process of equalizing the current load factors of the application servers shown in Fig. 2), executed by software-controlled clock generators; 5—assessment of the current intensity of the generated input request stream at each smoothing interval ∆ti; 6—buffering of requests (establishing a queue of requests for processing by the i-th application server) at the input of the i-th application server; 7—evaluation of the current load factor of the i-th application server at each alignment step; 8—determination of a singular matrix of regulatory relationships among the variables to be aligned (i.e., between the load factors of the servers) at each alignment step; 9—determination of the precise values of the resource share (i.e., the number of requests) to be allocated among the input queues of the application servers at each alignment step; 10—processing of the relevant application task; A—incoming request stream; B—generated flow of requests; C—request substreams after demultiplexing.

Fig. 2 illustrates that, to create quasi-stationary traffic segments, the non-stationary incoming request stream is first smoothed and structured accordingly. The generated input stream is demultiplexed, and the resulting parallel substreams are allocated to the application system's servers according to the established load-balancing method. The primary objective of balancing is to approach as closely as possible a uniform load across the application system servers. In other words, under conditions of unpredictable fluctuations in incoming traffic and varying request processing times on each server, the balancing algorithm must operate so that the generated quasi-stationary traffic segments yield approximately equal load factors across all servers.

The model illustrated in Fig. 2 is founded on the adaptive principle of reallocating demultiplexed subflows of requests among application servers through real-time monitoring of fluctuations in the current intensity of the incoming request stream and the existing load levels of the application servers. Consequently, this paradigm necessitates the real-time implementation of the following three processes:

1) The establishment of an incoming request flow with a more uniform temporal distribution, thereby preventing short-term overloads in the application server line.
2) The demultiplexing of the incoming request stream into several concurrently operating subflows, their number corresponding to the number of application servers in the line.
3) The equalization of the current application server load factors, which diminishes the likelihood of short-term overload on any individual server.

Let us examine the characteristics of each of these processes.

2.2. Establishment of the incoming request flow

For the proper functioning of this load-balancing method, the incoming request traffic must be transformed into a series of quasi-stationary segments representing a discrete random process, which can be partially refined by specialized averaging techniques. The load balancing technology on the application system's servers necessitates the accurate structuring of the request flow, specifically maintaining consistency between the stationary intervals of this flow, ∆Ts, and the intervals of the discrete control process for equalizing server load factors, τk. Some traffic shaping technologies do not allow for this. The "token bucket" method [6, 8] has a notable constraint on its applicability: it is suitable solely for scenarios where the actual traffic exhibits the traits of a stationary random process.
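For reference, the classic token bucket in its stationary form, the one the text says must be modified, can be sketched as follows. The rate and capacity values, and the idea of driving it with explicit timestamps, are assumptions of this sketch.

```python
import time

class TokenBucket:
    """Classic token bucket: tokens accrue at a fixed rate up to a capacity;
    a request passes the gateway only if a token is available. This is the
    stationary-traffic form, with a rate that never adapts to the flow."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens per second (fixed in this form)
        self.capacity = capacity  # bucket depth, which bounds burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # replenish tokens for the elapsed interval, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # request stays in the input queue

bucket = TokenBucket(rate=2.0, capacity=2.0)
bucket.last = 0.0  # use explicit timestamps so the behavior is deterministic
print([bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0)])
# [True, True, False] — a burst of two passes, the third is held back
print(bucket.allow(1.0))
# True — after 1 s at rate 2, tokens have accrued again
```

The fixed `rate` is exactly the limitation discussed next: for a non-stationary flow there is no single correct rate.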
Nevertheless, actual traffic and its derivatives must be regarded as a non-stationary discontinuous process, rendering the direct application of the "token bucket" method, along with other established traffic shaping techniques, in adaptive load redistribution systems on servers largely unjustifiable. This study presents a structural and functional framework for the formation of the request flow, intended as a component of adaptive load-balancing technology for parallel servers within the application system. This framework is illustrated in Fig. 3.

Figure 3: Structural and functional diagram of the request processing pipeline by a series of application servers

Fig. 3 employs the following designations for functional blocks: 1—the request queue buffer at the input of the application system (i.e., the input request storage); 2—the setter (generator) defining the size of the smoothing step; 3—the meter of the number of requests received at the input of the balancing system during a single smoothing step; 4—the generator of virtual events permitting a request to pass through the gateway (the token generator); 5—the repository of virtual events for requests sent through the gateway (the "bucket of tokens"); 6—the gateway routing requests to the input of the demultiplexer; 7—the demultiplexer of the input stream of requests.

The implementation of this traffic processing scheme is warranted if it can transform a non-stationary flow, marked by unpredictable average speeds and fluctuating volumes, into a series of quasi-stationary process segments with defined maximum current thresholds. This transformation enables the implementation of discrete control. The token bucket technique is extensively discussed in the literature, albeit within rather limited domains of applicability. The operational architecture of this algorithm is therefore altered to facilitate its integration into the load-balancing system circuit.

Fig. 3 illustrates that the foundation of this approach is the token bucket method, but with adjustments and enhancements that facilitate its application to non-stationary request flows. In this scenario, the request gateway 6 functions as a lock, allowing requests from the input queue to pass to the demultiplexer only when the fill level of the "bucket" of virtual events permits a request to traverse it, thereby achieving the average flow rate at the current smoothing step. The rate of the token generator 4 is contingent upon the intensity of the incoming request stream: based on the intensity measurements conducted by meter 3 at each smoothing step, the token generator is reconfigured. Consequently, we acquire quasi-stationary segments of the generated request flow. The applicability of this traffic shaping strategy is restricted to instances where it is possible to:

1) establish time intervals, referred to as stationary intervals (∆Tc), during which the average flow rate (Rc) at the input of the load balancing system remains almost constant;
2) ensure a regulated magnitude of pulsations in the smoothed stream of requests.

2.3. Demultiplexing the incoming request stream

Demultiplexing the incoming request stream from the application system's clients is essential when the performance of a single application server is inadequate to process this stream effectively, necessitating the utilization of multiple parallel application servers with identical functionality. One can select from many ways of demultiplexing the stream. The most straightforward option is to allocate requests from the incoming stream uniformly across the application system's servers. In this instance, however, the disparity in request processing times would result in certain servers experiencing temporary overloads, leading to request losses, while other application servers operate under capacity. Consequently, it is prudent to demultiplex the input stream precisely as described below.

2.4. Model training

The processing time of each request is an unpredictable variable, resulting in real-time fluctuations of the application server load factors. Under these circumstances, balancing the server load factors is recommended. Fig. 4 illustrates the structural and functional framework of load balancing on application servers.

Figure 4: Structural and functional framework for load balancing on application servers

Fig. 4 uses the following designations for functional blocks: 1—the setter (generator) of the alignment step magnitude; 2—the buffer of the request queue at the application server input; 3—assessment of the current value of the application server load factor (evaluations are conducted at each alignment step); 4—calculation of the determinant of the matrix of regulatory connections among the application servers (resulting from the resolution of the configuration equation); 5—computation of the resource share ∆ (specifically, the number of requests to be redistributed among the application servers at each alignment step).
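One alignment step built from these blocks can be sketched with a greedy stand-in: measure each server's load factor, then shift a small batch of queued requests from the most loaded server to the least loaded one. This is only an illustration; the paper's actual regulator derives the redistribution matrix from the Bellman/Riccati synthesis, and the function names, bandwidth figures, and batch size here are assumptions.

```python
from collections import deque

def load_factor(queue, bandwidth):
    """Load factor of one application server: queued requests relative
    to its bandwidth (requests it can serve per alignment step)."""
    return len(queue) / bandwidth

def alignment_step(queues, bandwidths, batch=1):
    """One iteration of the equalization procedure: move up to `batch`
    requests from the most loaded server's queue to the least loaded
    one, shrinking the pairwise load-factor discrepancies."""
    factors = [load_factor(q, f) for q, f in zip(queues, bandwidths)]
    src = max(range(len(queues)), key=lambda i: factors[i])
    dst = min(range(len(queues)), key=lambda i: factors[i])
    moved = 0
    while moved < batch and queues[src] and factors[src] > factors[dst]:
        queues[dst].append(queues[src].popleft())
        moved += 1

# three servers with equal bandwidths and deliberately uneven queues
queues = [deque(range(9)), deque(), deque(range(3))]
bandwidths = [3.0, 3.0, 3.0]
for _ in range(4):
    alignment_step(queues, bandwidths, batch=1)
print([len(q) for q in queues])  # [5, 4, 3] — the spread narrows each step
```

Each step leaves the total number of queued requests unchanged and monotonically reduces the sum of squared load-factor differences, which is the efficiency criterion the paper states for the balancing procedure.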
The load balancing process is a deliberate iterative procedure for the real-time redistribution of requests among the request queue buffers at the inputs of the application servers. A specific quantity of requests is extracted from one server's queue and transferred to another server's queue according to the established alignment procedure. This redistribution aims to diminish the disparity between the load factor values of the servers comprising the line, thereby facilitating load balancing across each server in the line. The technique operates so that at each alignment step, determined by setter 1, it ascertains from the measured current load values of each server the current state of the control link matrix 4 (as a result of the incremental solution). This matrix delineates the direction of request redistribution across server pairs, while the resource share determined by block 5, derived from measurements of the current incoming request traffic intensity, specifies the number of requests to be transferred from one server to another.

This publication does not include a formal synthesis of the adaptive system controller that executes load balancing on the application servers; such a synthesis was conducted in [1]. The principles of analytical regulator theory are presented in [9–14]. Only the following should be noted. The objective of synthesizing an adaptive controller for a specified quantity of application servers is to mitigate the risk of server equipment overload and to maintain the stability of the load balancing process amidst the unpredictable duration of request processing by each server. The synthesis of such a regulator pertains to the established boundary value problem of analytically designing regulators that minimize the R. Bellman functional, within the realm of continuous dynamic control systems for objects described by ordinary first-order linear differential equations. The application of the synthesis results facilitated a more uniform loading of the server equipment and ensured the requisite stability and duration of the balancing procedure despite the aforementioned unanticipated events. The trajectory of traffic flow regulation is dictated by the suitably constructed R. Bellman functional. Trends in the variations of the processed flow intensity on the servers are monitored through the incremental integration of the relevant differential tuning equation. In the analytical design of the controller, the structure of the Bellman function was defined, enabling the formulation of the tuning equation, the specification of the function, and the derivation of the appropriate Bellman equation. The task of designing the controller is thus reduced to solving the Riccati equation, a matrix quadratic equation essential for determining the matrix component of the Bellman function. Substituting the identified matrix into the control expression yields the final formulation of the required controller. The regulator is synthesized to maintain a consistent trajectory of state changes in the phase space C2 of the regulation object, adhering to defined quality parameters of the transient process. The controller must observe both the variations in the intensity of the incoming request flows and the dynamics of the transient process of load factor equalization, so as to minimize control errors while respecting constraints that maintain the stability of the control system.

The initial parameters of the equalization system are the number of servers in the line and the attenuation coefficient α of the Bellman function. The design of this regulator must address the following inherent physical restrictions.

Physical constraint 1:

s1 + s2 + s3 + … + sn ≤ F. (1)

Here F represents the total bandwidth of the application server line, F = f1 + f2 + f3 + … + fn = const, where f1, f2, f3, …, fn are the server bandwidths and s1, s2, s3, …, sn are the intensities of the request flows at the inputs of the application servers.

Physical constraint 2: the unpredictability of request flow ripples.

Physical constraint 3: the ambiguity of the processing duration of each specific request by each application server.

The efficiency of the load balancing procedure on the servers is, from a physical perspective, measured by the sum of the squares of the discrepancies between the load factors of each pair of application servers. This quantity should be minimized: a value of zero indicates that the load factors of all servers in the line are identical. Adhering to the aforementioned constraints decreases the risk of server traffic overflow.

2.5. Essential Factors for Operating PHP Applications Across Multiple Servers

Having addressed load balancing, the next pertinent inquiry is: how are sessions managed? Sessions enable programs to circumvent the stateless nature of HTTP and retain information across multiple requests (e.g., authentication status and shopping cart contents). By default, PHP retains sessions on the disk of the server that processes the user's request. For instance, when User A submits a request to Server B, a session for User A is established and retained on Server B (Fig. 5) [11].

Figure 5: Basic load balancer schematic

Nonetheless, when requests are distributed among numerous servers, this setup is likely to lead to malfunctioning functionality. For instance, consumers may discover their shopping cart is unexpectedly empty midway through the process; they may be arbitrarily redirected to the login page; or they may find that all their responses in a survey have been erased while completing it. Two alternatives exist to mitigate this: centrally stored sessions and sticky sessions.

Centrally Stored Sessions. Sessions may be centrally stored in a caching server (e.g., Redis or Memcached), a database (e.g., MySQL or PostgreSQL), or a shared filesystem (e.g., NFS or GlusterFS). The optimal choice among these is a caching server, for two reasons: caching servers are in-memory key-value stores, providing superior responsiveness compared to SQL databases; and sessions are written once, at the conclusion of a request, whereas an SQL database would be written to on each request, which may result in table locking and sluggish write operations. When centrally storing sessions, it is imperative to ensure that the session store does not become a single point of failure. This can be circumvented by configuring the store in a clustered arrangement; consequently, if one server in the cluster fails, it is not catastrophic, as another can be incorporated to substitute it [15].
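The centrally stored session flow can be sketched as follows. A plain dictionary stands in for the Redis/Memcached backend so the example stays self-contained, and the `SessionStore` interface is an assumed illustration, not a real library API.

```python
import json

class SessionStore:
    """Centralized session store shared by all application servers.
    A dict stands in for the caching cluster here; in production the
    same load/save calls would go to Redis or Memcached instead."""

    def __init__(self):
        self._backend = {}  # session_id -> serialized session data

    def load(self, session_id: str) -> dict:
        raw = self._backend.get(session_id)
        return json.loads(raw) if raw else {}

    def save(self, session_id: str, data: dict) -> None:
        # written once, at the end of the request, as the text notes
        self._backend[session_id] = json.dumps(data)

store = SessionStore()

# Request 1, handled by "Server A": the user logs in and adds an item.
session = store.load("sess-42")
session.update(user="alice", cart=["book"])
store.save("sess-42", session)

# Request 2, handled by "Server B": the cart is still there.
print(store.load("sess-42")["cart"])  # ['book']
```

Because both requests read and write the same shared store, the balancer is free to route them to different servers without losing session state.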
Persistent Sessions. An alternative to session caching is session stickiness, also known as session persistence: user queries are routed to the same server for the duration of their session. Although it may initially appear to be a wonderful concept, there are various possible downsides: Will hot spots emerge within the cluster? What occurs when a server is inaccessible, overloaded, or requires an upgrade? Consequently, I do not endorse this strategy.

3. Conclusions

In several application systems, such as "client/server", which exhibit high traffic intensity, the processing of client requests is executed by a series of concurrently operating application servers. Owing to the erratic fluctuations in request flow and the variable duration of their processing by application servers, these servers, unless specific measures are implemented, experience random and uneven loading: some servers become overloaded and consequently lose requests, while others remain underutilized. In [1], a formal balancing method was developed to avert potential short-term overloads of application servers during their operation, thereby promoting the sustainable functioning of the application system amidst uncertainties in the dynamics of the aforementioned factors. This study presents a potential option for the technical implementation of this strategy.

The structural-functional model of load balancing for the application system's server line is delineated, designed to operate in conditions where the incoming request flow from clients is random, unexpected, non-stationary, and pulsating. The model utilizes the adaptive principle of reallocating demultiplexed request sub-streams across application servers through real-time monitoring of fluctuations in the incoming request stream intensity and the current load levels of the application servers. This paradigm necessitates the implementation of the following three processes:

1) Establishment of the incoming request flow to prevent short-term server line overloads.
2) Demultiplexing the incoming request stream into multiple parallel substreams based on the number of application servers in the line.
3) Equalization of the current load factor values of application servers.

The formation of an incoming request stream to the application server line is examined. It is demonstrated that the proper functioning of this load-balancing method requires the incoming request traffic to be converted into a sequence of quasi-stationary segments representing a discrete random process. It is essential to align the intervals of stationarity of this request flow with the intervals of the discrete control steps for equalizing the load factor values of application servers. A modification of the established technological approach to packet traffic shaping, referred to as the "token bucket", is proposed. The token generator's rate is determined by the intensity of the incoming request stream: based on the intensity measurements conducted by the meter at each smoothing step, the token generator is calibrated. Consequently, we acquire quasi-stationary segments of the generated request flow.

A technological technique for load balancing on application servers has been created, characterized as a deliberate iterative procedure for the real-time redistribution of requests stored in the buffers of the request queues at the entry points of each application server. This redistribution aims to diminish the disparity between the load factor values of the servers constituting the line. The implemented balancing algorithm enables a specified number of application servers to mitigate the risk of short-term server overloads and ensures the stability of the load-balancing process amidst the unpredictable duration of request processing by each server.

References

[1] D. Bakhtiiarov, G. Konakhovych, O. Lavrynenko, An Approach to Modernization of the Hat and COST 231 Model for Improvement of Electromagnetic Compatibility in Premises for Navigation and Motion Control Equipment, in: 5th International Conference on Methods and Systems of Navigation and Motion Control (MSNMC) (2018) 271–274. doi: 10.1109/MSNMC.2018.8576260.
[2] F. Xia, et al., Community-based Event Dissemination with Optimal Load Balancing, IEEE Trans. Comput. 64(7) (2015) 1857–1869.
[3] A. Nahir, A. Orda, D. Raz, Schedule First, Manage Later: Network-Aware Load Balancing, Proc. IEEE INFOCOM (2013) 510–514.
[4] J. Doncel, S. Aalto, U. Ayesta, Economies of Scale in Parallel-Server Systems, Proc. IEEE INFOCOM (2017) 1–9.
[5] O. Veselska, et al., A Wavelet-Based Steganographic Method for Text Hiding in an Audio Signal, Sensors 22(15) (2022) 5832.
[6] R. Odarchenko, et al., Empirical Wavelet Transform in Speech Signal Compression Problems, in: IEEE 8th International Conference on Problems of Infocommunications, Science and Technology (PIC S&T) (2021) 599–602. doi: 10.1109/PICST54195.2021.9772156.
[7] D. S. Boger, J. S. Fraga, E. Alchieri, Reconfigurable Scalable State Machine Replication, LADC (2016) 1–8.
[8] N. Santos, A. Schiper, Achieving High-Throughput State Machine Replication in Multi-Core Systems, ICDCS (2013).
[9] O. Lavrynenko, et al., Protected Voice Control System of UAV, in: IEEE 5th International Conference Actual Problems of Unmanned Aerial Vehicles Developments (APUAVD) (2019) 295–298. doi: 10.1109/APUAVD47061.2019.8943926.
[10] O. Solomentsev, et al., A Procedure for Failures Diagnostics of Aviation Radio Equipment, Proceedings—International Conference on Advanced Computer Information Technologies, ACIT (2023) 100–103. doi: 10.1109/ACIT58437.2023.10275337.
[11] D. Bakhtiiarov, et al., Method of Binary Detection of Small Unmanned Aerial Vehicles, in: Cybersecurity Providing in Information and Telecommunication Systems, vol. 3654 (2024) 312–321.
[12] P. J. Marandi, et al., Filo: Consolidated Consensus as a Cloud Service, ATC (2016).
[13] M. Poke, T. Hoefler, DARE: High-Performance State Machine Replication on RDMA Networks, HPDC (2015) 107–118.
[14] W. Zhao, Performance Optimization for State Machine Replication based on Application Semantics, J. Syst. Software 122(C) (2016) 96–109.
[15] J. R. Lorch, et al., Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services, NSDI (2015).