=Paper=
{{Paper
|id=Vol-3826/short15
|storemode=property
|title=Distribute load among concurrent servers (short paper)
|pdfUrl=https://ceur-ws.org/Vol-3826/short15.pdf
|volume=Vol-3826
|authors=Denys Bakhtiiarov,Bohdan Chumachenko,Oleksandr Lavrynenko,Volodymyr Chupryn,Veniamin Antonov
|dblpUrl=https://dblp.org/rec/conf/cpits/BakhtiiarovCLCA24
}}
==Distribute load among concurrent servers (short paper)==
Distribute load among concurrent servers

Denys Bakhtiiarov1,2,*,†, Bohdan Chumachenko1,†, Oleksandr Lavrynenko1,†, Volodymyr Chupryn1,† and Veniamin Antonov1,†

1 National Aviation University, 1 Kosmonavta Komarova ave., 03058 Kyiv, Ukraine
2 State Scientific and Research Institute of Cybersecurity Technologies and Information Protection, 3 Maksym Zaliznyak, 03142 Kyiv, Ukraine
Abstract
A technical implementation option for load balancing among concurrently operating application servers is
proposed to mitigate the risks of overload amid substantial unpredictable fluctuations in request flow to the
application system and the variable processing durations by each application server. The structural-functional model for load balancing inside the server line of the application system is delineated; it is designed to operate under conditions where the incoming request flow from clients is characterized as
random, unexpected, non-stationary, and pulsing. A proposal is made for a system that generates a flow of
requests to the application server line, ensuring the alignment of the stationary intervals of this flow with
the intervals of discrete control for equalizing server load factors. A technological framework for load
balancing on application servers is proposed, facilitating the equalization of load factors among application
system servers through real-time transmission, allowing the redistribution of a portion of incoming request
traffic from more heavily loaded servers to those with lesser loads.
Keywords
request, application, server, client, load balancing
1. Introduction

In practice, when utilizing computerized real-time application systems like ‘client/server’ that permit remote access for clients via the Internet, such as various interactive help systems, the effectiveness is assessed by the value of τs—the average service duration of each stream of customer requests entering the application system input. A reduced value indicates that the consumer is likely to receive a response to their request more promptly [1]. At low request flow intensities, queues at the application system’s input are virtually nonexistent, thereby making τs directly contingent upon the performance of the server hardware hosting the application software. Issues occur when the volume of incoming requests is misaligned with the processing speed of the server infrastructure, leading to the accumulation of unprocessed requests, which in turn results in an unacceptable increase in service request duration and, in certain instances, the loss of some requests. Given the high intensity of request flow in several applications, it is essential to partition it in real time into parallel demultiplexed substreams and execute their concurrent online processing utilizing a series of application servers with identical functionality, as illustrated in Fig. 1. Before a user’s request is processed by an application server, it is initially received by the request redirection server (step 1), which employs a dedicated block to ascertain the current application server number designated for the request and allocates the request stream in real time among the line servers (steps 2 and 3), implementing the distribution strategy outlined below. The request redirection server transmits the IP address of the subsequent application server, as determined by the distribution method, to the user terminal (step 4), and subsequently readies itself to handle a new request from another user, returning to step 1. The user utilizes the IP address of the designated application server to retrieve the online result of processing his request from that server (step 5). The designated server resolves the application issue and transmits the outcome to the user (step 6) [2].

Specifically, Fig. 1 illustrates that a series of specialized application software and hardware servers process client requests concurrently. The number of servers in the configuration should be chosen to align the request traffic intensity with the application system’s performance. Nonetheless, the issues get intricate when addressing an erratic and unpredictable influx of requests, characterized by substantial fluctuations in both intensity and duration. In this scenario, due to erratic variations in request volume and the uncertain processing times by application servers, these servers, in the absence of specific interventions, experience uneven and arbitrary loading—resulting in some servers becoming overloaded and consequently losing requests, while others remain underutilized. Unforeseen variations in the volume of requests directed to any application server can impede request processing due to potential transient server overloads.
CPITS-II 2024: Workshop on Cybersecurity Providing in Information and Telecommunication Systems II, October 26, 2024, Kyiv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
bakhtiiaroff@tks.nau.edu.ua (D. Bakhtiiarov); bohdan.chumachenko@npp.nau.edu.ua (B. Chumachenko); oleksandrlavrynenko@tks.nau.edu.ua (O. Lavrynenko); volodymyr.chupryn@npp.nau.edu.ua (V. Chupryn); veniamin.antonov@npp.nau.edu.ua (V. Antonov)
ORCID: 0000-0003-3298-4641 (D. Bakhtiiarov); 0000-0002-0354-2206 (B. Chumachenko); 0000-0002-3285-7565 (O. Lavrynenko); 0000-0001-9412-7413 (V. Chupryn); 0000-0003-2244-262X (V. Antonov)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Figure 1: Generalized structural and functional model for the allocation of user requests among application servers
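The redirection sequence above (steps 1–6) can be sketched in a few lines of Python. The round-robin choice below is only a hypothetical stand-in for the adaptive distribution method developed in the rest of the paper; all names and addresses are invented for illustration:

```python
from itertools import cycle

# Hypothetical pool of application servers in the line (targets of steps 2-3).
APP_SERVERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

class RedirectionServer:
    """Receives each client request (step 1), picks the next application
    server (steps 2-3), and returns its IP address to the client (step 4).
    The client then contacts that server directly (steps 5-6)."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # stand-in distribution strategy

    def handle_request(self, client_id):
        return next(self._next_server)

rs = RedirectionServer(APP_SERVERS)
print([rs.handle_request(f"user{i}") for i in range(4)])
# With three servers, round-robin wraps: the fourth request reuses the first server.
```

In the paper's scheme the `cycle` iterator would be replaced by the server-definition block that accounts for current server loads.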
Consequently, there is both theoretical and practical interest in developing a mechanism for load balancing on application servers, specifically a dynamic load balancing approach among collaborating application servers in real time. This method’s implementation aims to avert potential short-term overloads of individual application servers during their operation, thereby fostering the sustainable functioning of the application system amid uncertainties in the dynamics of the aforementioned environmental factors. The suggested technique must assure the stability of the request distribution process, considering the dynamics of unforeseen fluctuations in this flow. The theoretical foundation of this strategy is explained in [3–5]. This paper presents a potential option for its technical implementation, the core of which is as follows. The application system hardware depicted in Fig. 1 comprises a software server (ROM server + server definition unit) that concurrently and autonomously manages multiple application servers. This software server facilitates a real-time adaptive distribution of requests among the application servers to maintain a more uniform load during unpredictable surges in request flow.

2. Main Part

The theoretical foundation of the employed load balancing method is delineated in [1, 2, 6]. This paper presents a potential option for its technical implementation, the core of which is as follows. The application system comprises a series of application servers that must function concurrently and autonomously, with a software server that facilitates real-time adaptive distribution of the request flow among the application servers to achieve more or less uniform load balancing. The parameters of the examined load balancing technology are established through the resolution of the boundary value problem associated with the analytical design of the relevant regulator, utilizing the synthesis of the corresponding R. Bellman functional and iterative numerical integration of the derived tuning equation. The implemented technical solution facilitates nearly uniform loading of server equipment under the specified conditions while maintaining an acceptable average waiting time for service requests with the minimal necessary server resources.

2.1. System model for load balancing on servers

This work introduces a structural and functional model for load balancing throughout the server line of the application system, designed to operate under conditions where the incoming request flow from clients is random, unexpected, non-stationary, and pulsing. Server load balancing entails the real-time redistribution of incoming request flows from heavily loaded application servers to those with lighter loads, thereby achieving a more uniform distribution of load across the servers. Fig. 2 illustrates this model as a series of numbered blocks, each representing a certain functional component of the model’s structure [7].
Figure 2: Structural and functional paradigm for load balancing between concurrently operating servers of the applied
information system
Fig. 2 uses the following designations for functional blocks: 1—smoothing of the input request stream; 2—creation of quasi-stationary segments of incoming request traffic at time intervals ∆ti—smoothing steps (the formation process is executed as a stepwise iterative procedure with a step ∆ti, while monitoring fluctuations in the intensity of the incoming request flow); 3—demultiplexing of the resulting input stream of requests at each smoothing interval ∆ti; 4—configurator of the smoothing and alignment procedures (i.e., synchronization of the current values of load factors for the application servers shown in Fig. 2), executed by software-controlled clock generators; 5—assessment of the current values of the intensity of the generated input request stream at each smoothing interval ∆ti; 6—buffering of requests (establishing a queue of requests for processing by the i-th application server) at the input of the i-th application server; 7—evaluation of the current values of the load factor of the i-th application server at each alignment step; 8—determination of a singular matrix of regulatory relationships among the variables to be aligned (i.e., between load factors on servers) at each alignment step; 9—ascertaining the precise values of the resource share (i.e., the number of requests) to be allocated among the input queues of application servers at each alignment step; 10—processing of the relevant application task; A—incoming request stream; B—produced flow of requests; C—request substreams after demultiplexing.

Fig. 2 illustrates that to create quasi-stationary traffic segments, the non-stationary incoming request stream is initially smoothed and structured accordingly. The created input stream is demultiplexed, and the resulting parallel substreams are allocated to the application system’s servers based on the established load-balancing method. The primary objective of balancing is to attain the most accurate estimate of the uniform load across the application system servers. In other words, under conditions of unpredictable fluctuations in incoming traffic and varying request processing times by each server, the balancing algorithm must operate to ensure that the generated quasi-stationary traffic segments receive approximately equal load factors across all servers. The model illustrated in Fig. 2 is founded on the adaptive principle of reallocating demultiplexed subflows of requests among application servers through real-time monitoring of fluctuations in the current intensity of the incoming request stream and the existing load levels of the application servers. Consequently, this paradigm necessitates the real-time implementation of the following three processes:

1) The establishment of an incoming request flow to attain a more uniform temporal distribution, thereby preventing short-term overloads in the application server line.
2) The demultiplexing of the incoming request stream into several concurrently operating subflows corresponding to the number of application servers in the line.
3) The equalization of current application server load factors, diminishing the likelihood of short-term overload on any individual server.

Let us examine the characteristics of each of these processes.

2.2. Establishment of the incoming request flow

For the proper functioning of this load-balancing method, the incoming request traffic must be transformed into a series of quasi-stationary segments representing a discrete random process, which can be partially refined by specialized averaging techniques. The load balancing technology on the application system’s servers necessitates the accurate structuring of the request flow, specifically to maintain the consistency between the stationary intervals of this flow, ∆Ts, and the intervals of the discrete control process for equalizing server load factors, τk. Some traffic creation technologies do not allow for this possibility. The “token bucket” method [6, 8] has a notable constraint in its applicability, being suitable solely for scenarios where actual traffic exhibits the traits of a stationary random process. Nevertheless, actual traffic and its derivatives must be regarded as a non-stationary discontinuous process, rendering the straight application of the “token bucket” method, along with other established traffic generating techniques, in adaptive load redistribution systems on servers, largely unjustifiable. This study presents a structural and functional framework for the development of the request flow, intended as a component of adaptive load-balancing technology for parallel servers within the application system. This diagram is illustrated in Fig. 3.
Figure 3: Structural and functional diagram of the request processing pipeline by a series of application servers
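The scheme of Fig. 3 is essentially a token bucket whose fill rate is re-measured at every smoothing step. The following is an illustrative reconstruction with hypothetical names, not the authors' implementation:

```python
class AdaptiveTokenBucket:
    """Token bucket (blocks 4-6 of Fig. 3) whose rate is recalibrated
    from the measured arrival intensity at each smoothing step."""

    def __init__(self, capacity):
        self.capacity = capacity  # bounds pulsations of the output flow
        self.tokens = capacity
        self.rate = 0.0           # tokens per time unit, set from the meter

    def recalibrate(self, arrivals, step_duration):
        # Block 3: intensity measured over the last smoothing step
        self.rate = arrivals / step_duration

    def tick(self, dt):
        # Block 4: the token generator refills at the calibrated rate
        self.tokens = min(self.capacity, self.tokens + self.rate * dt)

    def admit(self):
        # Block 6: the gateway passes a request only if a token is available
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = AdaptiveTokenBucket(capacity=5)
bucket.recalibrate(arrivals=100, step_duration=10.0)  # 10 requests per time unit
bucket.tick(0.5)  # refill is clipped at capacity, keeping the output quasi-stationary
```

Because `recalibrate` is re-run at every smoothing step, the admitted flow tracks the measured intensity while the capacity bound limits pulsations, which is the modification to the classical scheme described in this section.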
Fig. 3 employs the following designations for functional blocks: 1—the request queue buffer at the input of the application system (i.e., the input request storage); 2—the setter (generator) defining the size of the smoothing step; 3—the meter of the number of requests received at the input of the balancing system during a single smoothing step; 4—the generator of virtual events permitting transmission of a request via the gateway (token generator); 5—the repository of virtual events for requests sent through the gateway (the “token bucket”); 6—the gateway for routing requests to the input of the demultiplexer; 7—the demultiplexer of the input stream of requests.

Fig. 3 illustrates that the foundation of this approach is the “token bucket” method, but with adjustments and enhancements that facilitate its application to the processing of non-stationary request flows. In this scenario, the request gateway 6 functions as a lock, allowing requests from the input queue to pass to the demultiplexer only when the fill level of the “bucket” of virtual events permits a request to traverse it, thereby achieving the average flow rate of the current smoothing step. The rate of the token generator 4 is contingent upon the intensity of the incoming request stream: based on the intensity measurements conducted by meter 3 at each smoothing step, the token generator is configured. Consequently, we acquire quasi-stationary segments of the generated request flow. The applicability of this traffic generation strategy is restricted to instances where it is possible to:

1) Establish time intervals, referred to as stationary intervals (∆Tc), during which the average flow rate (Rc) at the input of the load balancing system remains almost constant.
2) Ensure a regulated magnitude of pulsations in the smoothed stream of requests.

The implementation of this traffic processing scheme is warranted if it can transform a non-stationary flow, marked by unpredictable average speeds and fluctuating volumes, into a series of quasi-stationary process segments with defined maximum current thresholds. This transformation enables the implementation of discrete control. The token bucket technique is extensively discussed in the literature, albeit within rather limited domains of applicability. The operational architecture of this algorithm is altered here to facilitate its integration into the load-balancing system circuit.

2.3. Demultiplexing the incoming request stream

Demultiplexing the incoming request stream from application system clients is essential when the performance of a single application server is inadequate to process this stream effectively, necessitating the utilization of multiple parallel application servers with identical functionality. One can select from many ways of stream demultiplexing. The most straightforward option is to allocate requests from the incoming stream uniformly across the application system’s servers. In this instance, however, the disparity in request processing times would result in certain servers experiencing temporary overloads, leading to request losses, while other application servers operate under capacity. Consequently, it is prudent to execute the demultiplexing of the input stream as described below.

2.4. Equalization of server load factors

The processing time for each request is an unpredictable variable, resulting in real-time fluctuations of application server load factors. Under these circumstances, balancing the server load factors is recommended. Fig. 4 illustrates the structural and functional framework of load balancing on application servers.
Figure 4: Structural and functional framework for load balancing on application servers
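The alignment procedure detailed below admits a simple illustration: compute each server's load factor, then move requests from the most-loaded queue to the least-loaded one at every alignment step. The proportional rule used here is only a hypothetical stand-in for the Bellman-functional regulator synthesized in [1]:

```python
def load_factor(queue_len, bandwidth):
    # Block 3 of Fig. 4: current load factor of one application server
    return queue_len / bandwidth

def alignment_step(queues, bandwidths):
    """One discrete alignment step: shift requests from the most loaded
    server's queue to the least loaded one (blocks 4-5 of Fig. 4)."""
    factors = [load_factor(q, f) for q, f in zip(queues, bandwidths)]
    hi, lo = factors.index(max(factors)), factors.index(min(factors))
    # Hypothetical resource-share rule: move half of the queue-length gap
    move = int((queues[hi] - queues[lo]) // 2)
    queues[hi] -= move
    queues[lo] += move
    return queues

queues = [40, 4, 10]          # pending requests per server (block 2)
bandwidths = [1.0, 1.0, 1.0]  # equal server bandwidths for simplicity
print(alignment_step(queues, bandwidths))  # → [22, 22, 10]
```

Iterating this step drives the sum of squared load-factor differences (the efficiency criterion introduced later in this section) toward zero.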
Fig. 4 uses the following designations for functional blocks: 1—setter (generator) of the alignment step magnitude; 2—buffer for the request queue at the application server input; 3—assessment of the current value of the application server load factor (evaluations are conducted at each alignment step); 4—calculation of the determinant of the matrix of regulatory connections among application servers (resulting from the solution of the configuration equation); 5—computation of the determinant of the resource share ∆ (specifically, the number of requests to be redistributed at each alignment step among the application servers).

The load balancing process is a deliberate iterative procedure for the real-time redistribution of requests inside the request queue buffers at the inputs of each application server. A specific quantity of requests is extracted from one server’s queue and transferred to another server’s queue by the established alignment procedure. This redistribution aims to diminish the disparity between the load factor values of the servers comprising the line, facilitating load balancing across each server in the line. The technique operates so that at each alignment step, determined by setter 1, based on the measured current load values of each server, it ascertains the current state of the control link matrix 4 (as a result of the incremental solution). This matrix delineates the direction of request redistribution across server pairs, while the resource share determinant 5, derived from measurements of the current incoming request traffic intensity, specifies the number of requests to be transferred from one server to another.

This publication does not include a formal synthesis of the adaptive system controller that executes load balancing on application servers; such a synthesis was conducted in [1]. The principles of analytical regulator theory are presented in [9–14]. Only the following should be noted. The objective of synthesizing an adaptive controller with a specified quantity of application servers is to mitigate the risk of server equipment overload and to maintain the stability of the load balancing process amidst the unpredictable duration of request processing by each server. The synthesis of such a regulator pertains to the established boundary value problem of analytically designing regulators to minimize the R. Bellman functional within the realm of continuous dynamic control systems for entities characterized by ordinary first-order linear differential equations. The application of the synthesis results facilitated a more uniform loading of the server equipment and ensured the requisite stability and length of the balancing procedure despite the aforementioned unanticipated events. The trajectory of traffic flow regulation is dictated by the suitably constructed R. Bellman functional. The role of monitoring trends in variations of the processed flow intensity on the servers is executed through incremental integration of the relevant differential tuning equation. In the analytical design of the controller, the structure of the Bellman function was defined, enabling the formulation of the tuning equation, the specification of the function, and the derivation of the appropriate Bellman equation. The task of designing the controller is thus reduced to solving the Riccati equation, a matrix quadratic equation essential for determining the matrix component of the Bellman function. Substituting the identified matrix into the control expression yields the final formulation of the required controller. The regulator is synthesized to maintain a consistent trajectory of state changes in the regulation object’s phase space C2, adhering to defined quality parameters of the transient process. The controller must observe both the variations in the intensity of incoming request flows and the dynamics of the transient process of load factor equalization to minimize control errors while considering constraints that maintain the stability of the control system.

The initial parameters of the equalization system are the number of servers in the line and the attenuation coefficient α of the Bellman function. The design of this regulator must address the following inherent physical restrictions.

Physical constraint 1:

s1 + s2 + s3 + … + sn ≤ F, (1)

where F represents the total bandwidth of the application server line, F = f1 + f2 + f3 + … + fn = const; f1, f2, f3, …, fn are the server bandwidths, and s1, s2, s3, …, sn are the flow intensities of requests at the inputs of the application servers.

Physical constraint 2: the unpredictability of request flow ripples.

Physical constraint 3: ambiguity regarding the processing duration of each specific request by each application server.

The efficiency criterion of the load balancing procedure on the servers, from a physical perspective, is the aggregate of the squares of the discrepancies in the load factors of each pair of application servers. This quantity should be minimized, as a value of zero indicates that the load factors of all servers in the line are identical. Adhering to the aforementioned constraints decreases the risk of server traffic overflow.

2.5. Essential Factors for Operating PHP Applications Across Multiple Servers

Having addressed load balancing, the subsequent pertinent inquiry is: how are sessions managed? Sessions enable programs to circumvent the stateless characteristic of HTTP and retain information across multiple requests (e.g., authentication status and shopping cart contents). PHP, by default, retains sessions on the disk of the server that processes the user’s request. For instance, when User A submits a request to Server B, a session for User A is established and retained on Server B (Fig. 5) [11].
Figure 5: Basic load balancer schematic
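With per-server session files, state written on one server is invisible to the others; a shared store fixes this. The sketch below uses a plain dict in place of Redis or Memcached purely for illustration, and all names are invented:

```python
import json

# Stand-in for a central cache such as Redis or Memcached: every
# application server reads and writes the SAME store, so a user's
# session survives being routed to a different server.
central_store = {}

def save_session(session_id, data):
    central_store[session_id] = json.dumps(data)  # one write per request

def load_session(session_id):
    raw = central_store.get(session_id)
    return json.loads(raw) if raw is not None else {}

# Request 1 lands on "Server B" and stores the cart...
save_session("user-a", {"cart": ["book"], "authenticated": True})
# ...request 2 lands on "Server C" and still sees it:
print(load_session("user-a")["cart"])  # → ['book']
```

In a real deployment the dict would be replaced by a networked store, which is why the clustering concerns discussed next matter.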
Nonetheless, when requests are distributed among numerous servers, this setup is likely to lead to malfunctioning functionality. For instance, consumers may discover their shopping cart is unexpectedly empty midway through the process; they may be arbitrarily redirected to the login page; or they may find that all their responses in a survey have been erased while completing it. Two alternatives exist to mitigate this: centrally stored sessions and sticky sessions.

Centrally Stored Sessions. Sessions may be centrally stored via a caching server (e.g., Redis or Memcached), a database (e.g., MySQL or PostgreSQL), or a shared filesystem (e.g., NFS or GlusterFS). The optimal choice among these is a caching server, for two reasons: caching servers are in-memory key-value stores, providing superior responsiveness compared to SQL databases; and sessions are written once at the conclusion of a request, whereas SQL databases require a write with each request, which may result in table locking and sluggish write operations. When centrally storing sessions, it is imperative to ensure that the session store does not become a single point of failure. This can be circumvented by configuring the store in a clustered arrangement; consequently, if one server in the cluster fails, it is not catastrophic, as another can be incorporated to substitute it [15].

Persistent Sessions. An alternative to session caching is session stickiness, also known as session persistence: user queries are routed to the same server for the duration of their session. Although it may initially appear to be an attractive concept, there are various possible downsides, including: will hot spots emerge within the cluster? What occurs when a server is inaccessible, overloaded, or requires an upgrade? Consequently, we do not endorse this strategy.

3. Conclusions

In several application systems, such as ‘client/server’, which exhibit high traffic intensity, the processing of client requests is executed by a series of concurrently operating application servers. Owing to the erratic fluctuations in
request flow and the variable duration of their processing by application servers, these servers, unless specific measures are implemented, experience random and uneven loading, resulting in some servers becoming overloaded and consequently losing requests, while others remain underutilized. In [1], a formal balancing method was developed to avert potential short-term overloads of application servers during their operation, thereby promoting the sustainable functioning of the application system amidst uncertainties in the dynamics of the aforementioned factors. This study presents a potential option for the technical implementation of this strategy.

The structural-functional model of load balancing for the application system’s server line is delineated; it is designed to operate in conditions where the incoming request flow from clients is random, unexpected, non-stationary, and pulsating. The model utilizes the adaptive principle of reallocating demultiplexed request sub-streams across application servers through real-time monitoring of fluctuations in the incoming request stream intensity and the current load levels of the application servers. This paradigm necessitates the implementation of the following three processes:

1) Establishment of the incoming request flow to prevent short-term server line overloads.
2) Demultiplexing of the incoming request stream into multiple parallel substreams based on the number of application servers in the line.
3) Equalization of the current load factor values of the application servers.

The formation of an incoming request stream to the application server line is examined. It is demonstrated that the proper functioning of this load-balancing method requires the incoming request traffic to be converted into a sequence of quasi-stationary segments representing a discrete random process. It is essential to align the intervals of stationarity of this request flow with the intervals of the discrete control steps for equalizing the load factor values of the application servers. A modification of the established technological approach for packet traffic creation, the “token bucket”, is proposed. The token generator’s rate is determined by the intensity of the incoming request stream: based on the intensity measurements conducted by the meter at each smoothing step, the token generator is calibrated. Consequently, we acquire quasi-stationary segments of the generated request flow.

A technological technique for load balancing on application servers has been created, characterized as a deliberate iterative procedure for the real-time redistribution of requests stored in the buffers of the request queues at the entry points of each application server. This redistribution aims to diminish the disparity between the load factor values of the servers constituting the line. The implemented balancing algorithm enables a specified number of application servers to mitigate the risk of short-term server overloads and ensures the stability of the load-balancing process amidst the unpredictable duration of request processing by each server.

References

[1] D. Bakhtiiarov, G. Konakhovych, O. Lavrynenko, An Approach to Modernization of the Hat and COST 231 Model for Improvement of Electromagnetic Compatibility in Premises for Navigation and Motion Control Equipment, in: 5th International Conference on Methods and Systems of Navigation and Motion Control (MSNMC) (2018) 271–274. doi: 10.1109/MSNMC.2018.8576260.
[2] F. Xia, et al., Community-based Event Dissemination with Optimal Load Balancing, IEEE Trans. Comput. 64(7) (2015) 1857–1869.
[3] A. Nahir, A. Orda, D. Raz, Schedule First, Manage Later: Network-Aware Load Balancing, Proc. IEEE INFOCOM (2013) 510–514.
[4] J. Doncel, S. Aalto, U. Ayesta, Economies of Scale in Parallel-Server Systems, Proc. IEEE INFOCOM (2017) 1–9.
[5] O. Veselska, et al., A Wavelet-Based Steganographic Method for Text Hiding in an Audio Signal, Sensors 22(15) (2022) 5832.
[6] R. Odarchenko, et al., Empirical Wavelet Transform in Speech Signal Compression Problems, in: IEEE 8th International Conference on Problems of Infocommunications, Science and Technology (PIC S&T) (2021) 599–602. doi: 10.1109/PICST54195.2021.9772156.
[7] D. S. Boger, J. S. Fraga, E. Alchieri, Reconfigurable Scalable State Machine Replication, LADC (2016) 1–8.
[8] N. Santos, A. Schiper, Achieving High-Throughput State Machine Replication in Multi-Core Systems, ICDCS (2013).
[9] O. Lavrynenko, et al., Protected Voice Control System of UAV, in: IEEE 5th International Conference Actual Problems of Unmanned Aerial Vehicles Developments (APUAVD) (2019) 295–298. doi: 10.1109/APUAVD47061.2019.8943926.
[10] O. Solomentsev, et al., A Procedure for Failures Diagnostics of Aviation Radio Equipment, in: International Conference on Advanced Computer Information Technologies (ACIT) (2023) 100–103. doi: 10.1109/ACIT58437.2023.10275337.
[11] D. Bakhtiiarov, et al., Method of Binary Detection of Small Unmanned Aerial Vehicles, in: Cybersecurity Providing in Information and Telecommunication Systems, vol. 3654 (2024) 312–321.
[12] P. J. Marandi, et al., Filo: Consolidated Consensus as a Cloud Service, ATC (2016).
[13] M. Poke, T. Hoefler, DARE: High-Performance State Machine Replication on RDMA Networks, HPDC (2015) 107–118.
[14] W. Zhao, Performance Optimization for State Machine Replication based on Application Semantics, J. Syst. Software 122(C) (2016) 96–109.
[15] J. R. Lorch, et al., Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services, NSDI (2015).