Optimization of distributed file placement registrations on a computer network Yevhen Davydenko1, , Hlib Horban1,*, ,Alyona Shved1 and Kateryna Antipova1, 1 Petro Mohyla Black Sea National University St. 68 Desantnykiv 10, 54003, Mykolaiv, Ukraine Abstract The article considers a method for optimal placement of data files in a local network, taking into account the criterion of total request service time during the distribution of registrations. This method allows to effectively regulate the server load and use resources with maximum performance, which leads to a reduction in query execution time. An intelligent algorithm for balancing the load of distributed database network nodes that can optimize the processing of large amounts of data is investigated. The research results confirm the possibility of a significant increase in data processing speed through the use of mechanisms for optimizing the load of network nodes. Keywords distributed system, computer network, file, request, node, distribution 1 1. Introduction The development of information and communication technologies has led to the proliferation of distributed data processing systems based on computer networks. There are scientific and technical tasks of processing ultra-large data sets, the realization of which is not enough for a single computer with a single-processor architecture. An example of an ultra-large database is the EOS/DIS Earth Observation System database, which includes data from many satellites that collect information to study long-term trends in the state of the atmosphere, oceans and earth's surface with a volume of 1 Pbyte of information per year. The Stanford Linear Accelerator Center has a similar system, which has a database created for the BABAR experiment (studying the collision of subatomic particles to determine the impact of matter and antimatter behavior on the formation of the universe) and, has a volume of 1492.0 TB. Today, real computer networks are characterized by some disadvantages, especially a distributed network is a very heterogeneous medium of information transmission: some sections can be built using ATM or FDDI techniques, while others are based on slow X.25 protocols. The actual speed of information transfer directly depends on the bandwidth of the slowest network. Thus, a remote user's access to a corporate database can be significantly hampered. On the other hand, does a remote user always need absolute access to the entire information base? In many cases, only information that is directly related to their business environment is requested. The efficiency of an IoT system directly depends on traffic activity, and the lower it is, the faster the funds invested in its construction will pay off [1]. Systematic implementation requires proper Information Technology and Implementation (IT&I-2024), November 20-21, 2024, Kyiv, Ukraine Corresponding author. These authors contributed equally. davydenko@chmnu.edu.ua (Y. Davydenko); hlib.horban@chmnu.edu.ua (H. Horban); avshved@chmnu.edu.ua (A. Shved); antipova.katerina@chmnu.edu.ua (K. Antipova) 0000-0002-6512-3576 (H. Horban); 0000-0002-0547-3689 (Y. Davydenko); 0000-0003-4372-7472 (A. Shved); 0000-0002- 9012-5290 (K. Antipova) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 346 CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings coordination of data distribution and storage. The optimal method for reducing traffic in communication channels is to use the client-server technique [2, 3, 4, 5, 6]. Thanks to the packet switching technique, in-depth and applied research has been conducted in 3 areas. The first direction is related to the development of the basics of packet switching theory in distribution systems [7, 8]. The second area of in-depth research is related to the mathematical theory of optimizing flows in networks and selecting profitable network routes with packet switching [9]. Such research should be conducted, in particular, using methods of expert evaluation [10, 11], the results of which obtained allow to carry out a more profound analysis of the obtained expert information aimed at a synthesis an effective and substantiated group decisions. The third direction is the implementation of scientific and applied research on the development of modern hardware and software for packet switching technology [7]. In general, research is driven by the need to improve system performance and reliability, reduce overall costs, and expand the range of services provided [7, 8, 9]. Paper [12] investigates the issue of distributing information resources across computer network nodes. Optimization-oriented algorithms for placing information files were considered. The average amount of data sent over communication channels per unit time; total request processing time; total cost of network traffic, etc. were considered as optimization criteria. There is a need to choose a numerical optimality criterion that determines the average time of user requests execution and is convenient for optimal file placement. The choice of such a characteristic of the mass service system is due to the fact that users are usually interested not in minimizing the size of the queue or any other characteristics of the mass service system, but in ensuring that their requests are processed in as little time as possible. When determining the average waiting time for requests W in the service queue, it is recommended [13] to use the following formula: ρ2 W= , λ(1 − ρ) where ρ is the load factor of the service device (0 ≤ ρ < 1); λ is the intensity of the request flow (the average number of packets claiming to be transmitted per unit of time). When setting the task of optimizing the placement of files among network nodes in order to obtain high quality service, you can keep the average request service time (excluding the service waiting time) constant and independent of the file placement. The value ρ depends on λ and the bandwidth of the serving device μ: ρ = λ/μ . Usually, the maximum allowable waiting time for requests in the service queue M is constant, so the maximum allowable service device load factor is determined from the expression: 𝑀 ρ 𝑀+𝑏 . 𝑚𝑎𝑥 When distributing requests among the service devices, they must minimize the value of W, while the route of the request is unknown in advance, i.e., the request can be processed by one service device or by several service devices sequentially. The target function (quality criterion) is selected as a combination of traffic parameters through communication channels in the network: 𝑁 𝑄(ρ) = ∑ 𝐶𝑖 ρ𝑖 , 𝑖=1 where 𝐶𝑖 is the weighting factors that take into account the average packet service time of the ρ communication channel: 𝐶𝑖 = 𝑏𝑖 ; 𝑏 = λ. The problem of creating switching systems designed to analyze the state of the network at any given time and optimize data transportation has not been fully resolved. Systems that perform relatively simple optimization of the distribution of data transmission over network channels remain 347 extremely expensive, and the efficiency of using the capabilities of universal switches when transmitting large amounts of multimedia information over several channels simultaneously is relatively low. When providing multi-user access to information resources stored in the form of a database, it is necessary to rationally place the database files in the nodes of a computer network. There are several relevant mathematical models that differ in the type of objective function and the set of constraints that are taken into account when searching for an optimal method [14]. Today, only application software packages, namely Matlab, are actually used to find optimal solutions [6]. After identifying the optimal solution, model stability and sensitivity analysis is usually performed. 2. Organization of optimization of distributed file placement registrations in a computer network based on the theory of queuing Let's consider a method of building a model of rational file distribution of a DBMS over the nodes of a computer network, the essence of which is the mathematical apparatus of the concept of queuing. The queue theory is the basis for building a computer network model in many works on optimizing file allocation in a DBMS [9, 15, 16], however, a mass service system with 1 servicing device - a single bus - is taken as a mathematical model of a local network with a bus topology. Let's analyze the network as a multi-device mass service system, i.e., as a system where several identical service devices process 1 request queue. The request at the very beginning of the queue is sent to one of the free devices for service. The multi-device queue shown in Figure 1 differs from the queue coordination in Figure 2, which shows several single-device queues operating in parallel. If in all cases the service devices and the incoming flow of requests are identical, and in singledevice queues, requests arrive randomly and, once in the queue, remain there (otherwise, moving to another queue is prohibited), then it turns out that the operation of a multi-device queue is preferable to single-device queues operating in parallel. Figure 1: Batch processing when using a queue with a number of service devices. 348 Figure 2: Batch processing when using several parallel queues, with 1 service device. Let's set the optimal number of copies for each file of distributed databases, considering the computer network as a series of overlapping multi-device SMOs with the 1st queue of requests to a particular file. During the design of a CMA, worst-case scenarios are often performed. In this situation, the estimates are not very accurate, but at least the errors provide a margin of safety. In real systems, service times fluctuate. The variation can be expressed by calculating the mean and standard deviation for the service time of specific types of equipment. The best case is when the service time is constant, i.e., standard deviation = 0 (i.e., no deviation from the mean). The worst case is also when the maintenance time follows an exponential distribution, i.e., when the standard deviation = the mean (for a standard deviation, this is too high a value, which shows that there is a large spread of maintenance time values). It is worth noting that the exponential distribution is not always the worst case; for example, the mean of 5, 10, 20, and 200 = 58.75, and the standard deviation is approximately 81 [16]. Let's denote the average 𝑥 by 𝐸(𝑥) (mathematical expectation), and the standard deviation 𝑥 − σ𝑥 . Then: 2 (𝑥−𝐸(𝑥)) σ𝑥 = √ . (1) 𝑁 Guided by empirical information rather than specific indicators, in the expression 𝑥 − σ𝑥 expression, 𝑁 − 1 is used instead of 𝑁, where 𝑁 is the number of experiments. Hence: (𝑥−𝑥̅ )2 σ𝑥 = √ 𝑁−1 , (2) where 𝑥̅ is the average value of the experimental value. The estimation of the mean and standard deviation is more accurate the more empirical values are used. When 𝑁 is larger, the difference between equations (1) and (2) is negligible. Another method of determining the typical deviation: σ𝑥 = √𝐸(𝑥 2 ) − 𝐸 2 (𝑥). Paper [17] shows that 95% of the query response times do not exceed the average response time plus 2 standard deviations. In other words, about 5% of responses take longer than this value. 349 In order to simplify mathematical calculations, system load is usually expressed in relative terms compared to the maximum load that the system can handle. As a rule, the value is denoted by the letter ρ. As shown in the above definitions, fully utilized equipment has ρ = 1 and free equipment has ρ = 0. Thus, the equipment utilization rate ranges from zero to one, and is sometimes expressed as a percentage. There is a rule [18] according to which the response time curve rises sharply when equipment utilization exceeds eighty percent. When designing a queuing system, the goal is to ensure that its utilization at constant loads is within sixty to seventy percent [9]. The exact method for determining the equipment utilization rate in a mass service system with 1 service device is given by the expression: ρ = 𝐸(𝑛)𝐸(𝑡𝑠 ), where 𝐸(𝑛) is the average number of requests received per unit of service time; 𝐸(𝑡𝑠 ) is the average time for servicing the 1st request. Suppose there are M service devices of the same type. So, it sends requests to any device per unit of time 𝐸(𝑛)/𝑀 requests per unit of time. Consequently, the utilization rate of a particular device: 𝐸(𝑛)𝐸(𝑡𝑠 ) ρ= . (3) 𝑀 The ratio ρ should be less than one. Let 𝑤 be the number of requests waiting to be served at a certain time, and 𝑞 be the number of system requests waiting and being served at that time. Let it go on 𝑡𝑤 is the service waiting time, and 𝑡𝑞 is the time a request stays in the system, i.e. the time it spends both waiting and being served. The average values 𝑤, values 𝑞, 𝑡𝑤 and 𝑡𝑞 , let's set it to 𝐸(𝑤), 𝑞, 𝐸(𝑡𝑤 ) and 𝐸(𝑡𝑞 ). Equality is always objective: 𝑡𝑞 = 𝑡𝑤 + 𝑡𝑠 , 𝐸(𝑡𝑞 ) = 𝐸(𝑡𝑤 ) + 𝐸(𝑡𝑠 ). Because E(n) is the average number of incoming requests, and then analyzes the steady state 𝐸(𝑤) = 𝐸(𝑛)𝐸(𝑡𝑤 ), (4) and 𝐸(𝑞) = 𝐸(𝑛)𝐸(𝑡𝑞 ). The quantities 𝑤, 𝑞, 𝑡𝑤 and 𝑡𝑞 refer to requests waiting on any of the servers 𝑀. Substituting in 𝐸(𝑞) = 𝐸(𝑛)𝐸(𝑡𝑞 ) = 𝐸(𝑛)𝐸(𝑡𝑤 ) + 𝐸(𝑛)𝐸(𝑡𝑠 ), the corresponding values from equations (3) and (4), we obtain 𝐸(𝑞) = 𝐸(𝑤) + 𝑀ρ. The probability of having N requests in the system at a given time. (𝑀ρ)𝑁 𝑃(𝑞 = 𝑁) = 𝑃0 , if 𝑁 < 𝑀, 𝑁! and (𝑀ρ)𝑁 𝑃(𝑞 = 𝑁) = 𝑃 , if 𝑁 ≥ 𝑀, 𝑀! 𝑀𝑁−𝑀 0 where 𝑀−1 −1 (𝑀ρ)𝑁 (𝑀ρ)𝑀 𝑃0 = [ ∑ + ] . 𝑁! (1 − ρ)𝑀! 𝑁=0 The probability that all service devices are busy at a given time is ∞ 𝐵 = 𝑃(𝑞 ≥ 𝑀) = ∑ 𝑃(𝑞 = 𝑁), 𝑁=𝑀 and is calculated by the formula 350 (𝑀ρ)𝑁 ∑𝑀−1 𝑁=0 1− 𝑁! 𝑀 (𝑀ρ)𝑁 ∑𝑁=0 𝐵= 𝑁! . (𝑀ρ) 𝑁 ∑𝑀−1 𝑁=0 1−ρ 𝑁! (𝑀ρ)𝑁 ∑𝑀𝑁=0 𝑁! The equation reduces to 𝐵 = ρ when 𝑀 = 1. The factor B is present in all other equations for systems with multiple service devices. To determine its quantitative indicators, the function B is described, which determines the probability of loading all devices depending on the numerical value of the equipment utilization factor and the number of service devices 𝑀. In a QS with multiple service devices, the average number of requests pending service is ρ 𝐸(𝑤) = 𝐵 . 1−ρ So, ρ 𝐸(𝑞) = 𝐵 + 𝑀ρ. 1−ρ A typical deviation for 𝑤 is: 1 σ𝑤 = √𝐵ρ(1 + ρ − 𝐵ρ). 1−ρ Average waiting period before processing is: 𝐵𝐸(𝑡𝑠 ) 𝐸(𝑡𝑤 ) = . 𝑀(1 − ρ) So, the average time spent in the queue is: 𝐵𝐸(𝑡𝑠 ) 𝐸(𝑡𝑞 ) = + 𝐸(𝑡𝑠 ). 𝑀(1 − ρ) Typical waiting period before processing rejection is: 𝐸(𝑡𝑠 ) σ𝑡𝑤 = √𝐵(2 − 𝐵) , 𝑀(1 − ρ) and the typical deviation of the time spent in the queue is: 𝐸(𝑡𝑠 ) σ𝑡 𝑞 = √𝐵(2 − 𝐵) + 𝑀2 (1 − ρ)2 . 𝑀(1 − ρ) The probability that the waiting time exceeds t is determined by the following formula: 𝑃(𝑡𝑤 ≥ 𝑡) = 𝐵𝑒 −𝑀(1−ρ)𝑡/𝐸(𝑡𝑠 ) . A real system can be coordinated so that a few requests do not wait for service at all, and a small fraction of them are delayed for a long period of time. In this case, the average waiting time of a delayed request is much higher than 𝐸(𝑡𝑤 ). Let's set the average delay time 𝐸(𝑡𝑤 ) as the average period of time for requests that must wait. The probability that a request will be in the queue is B. Thus, the average waiting time is 𝐸(𝑡𝑤 ) = 𝐵𝐸(𝑡𝑑 ) + (1 − 𝐵)0 = 𝐵𝐸(𝑡𝑑 ) . But: 𝐵𝐸(𝑡𝑠 ) 𝐸(𝑡𝑤 ) = . 𝑀(1 − ρ) Therefore: 𝐸(𝑡𝑠 ) 𝐸(𝑡𝑑 ) = . 𝑀(1 − ρ) The previous equations for queues with a number of servers are based on the assumption that service times follow an exponential distribution. There are no simple expressions that describe multi- instrument QS systems that have better service times than an exponential distribution, but it would be useful to use a mathematical approximation tool to estimate in such situations. 351 There are several cases where the theory described above is incorrect. The above formulas serve to approximate the most difficult situations that exist in reality. The reason is the assumption of arbitrariness of the request and (sometimes) indicativeness of the service time. In reality, there may be a more favorable request than a random one. But there are 2 types of situations where queues and delays are much worse than the ones obtained from the above formulas. First, the maximum number of requests can be received in a short period of time. In some cases, it cannot be assumed that the arrival time value follows a Poisson distribution. It is worth emphasizing that most forecasting programs for these systems are designed for a Poisson input event stream. To select a more suitable model from those on offer, it is necessary to assess user requirements. Table 1 shows whether or not the models include the following features of computer networks, information bases, and applications. Table 1 Comparison of models 1 2 3 4 5 6 Model I + + - - - - Model II + - + + + - Model III - + - + + + Studying the numerical results of the implementation of models I and II, built with a unified approach, we can emphasize the following features of the models: 1. The obtained examples of tests of the optimal allocation matrix for distributed databases show a huge dependence between the chosen optimality aspect and the final allocation matrix. 2. In the obtained matrices of rational file allocation, when minimizing the average amount of data sent and minimizing the single processing time of absolutely all requests received by the system per unit of time, the assumption of uniform load of network nodes is clearly violated. It is clear that the 1st nodes will be the most loaded. 3. To increase the system throughput, you can apply a restriction on the time it takes to wait for a request from any node as an auxiliary condition. Let 𝑎𝑖𝑗𝑠 be the waiting time required to execute a request initiated at node 𝐾𝑗 to file 𝐹𝑖 contained in the s-th node; 𝑇𝑖𝑗 be the maximum request execution time for file 𝐹𝑖 initiated at node 𝐾𝑗 . There is a relationship between the values 𝑎𝑖𝑗𝑠 and 𝑇𝑖𝑗 : 𝑎𝑖𝑗𝑠 (1 − 𝑥𝑖𝑗 )𝑥𝑖𝑠 ≤ 𝑇𝑖𝑗 . For 𝑗 ≠ 𝑠, 1 ≤ 𝑖 ≤ 𝑚. In order to obtain constraints from this relation, we need to express the values of 𝑎𝑖𝑗𝑠 in terms of the variables 𝑥𝑖𝑗 . This is very difficult to do. 4. The above query processing scheme practically does not fit into the parallelism of information processing in the network, and also does not take into account the very common situation of complex queries (simultaneous access to several files from 1 node). For example, the local database of the host 𝐾𝑗 contains the files 𝐹𝑖 , and 𝐹𝑖+1 and the local information database of the node 𝐾𝑗+1 contains the files 𝐹𝑖+1 and 𝐹𝑖+2 . The node 𝐾𝑗 starts a complex request for the files 𝐹𝑖 and 𝐹𝑖+1 . According to the given scheme, both of these files will be processed in the node 𝐾𝑗 in turn. However, it would be more logical to send a request to process file 𝐹𝑖+1 loaded) when searching for file 𝐹𝑖 on node 𝐾𝑗 . 5. However, if the issue is solved in a comprehensive manner, i.e., by software optimization and even hardware upgrades, then the load of some network nodes will not affect the speed of operation. Therefore, despite the above disadvantages, these models of the optimization problem can be applied in practice when designing certain databases. Thus, the proposed mathematical models of the optimization problem of file allocation of the DBMS on the nodes of the local network can be successfully applied in the design of certain distribution 352 databases, using a preliminary assessment of user requirements and a software package for the purpose of statistical collection and optimal redistribution of requests. 3. Coordination of optimization of file placement of the database by a single time of request service Several works [8, 9, 14, 16] have been devoted to solving the problem of rational placement of information files on local network nodes, which differ in both the problem statement and the methods of its solution. We study a network with a single bus topology. Local area networks with a bus topology are characterized by relative ease of management, low arbitration time, ease of expansion, and fairly high reliability (due to parallel connections of nodes to the channel) [9]. Let's say that a query that comes to any network node involves access to a database file. We will distinguish between 2 types of requests: search requests and fix requests. Queries are served in the node in the order of receipt. To save resources, we do not implement a priority system. A search request is initiated in a specific node. If a copy of the required file is contained in the local node database from which the request came, it is processed. If a copy of the required file is not in the local database of this node, the search request is sent to a free node that contains a copy of the required file, processed there, and the result is sent to the original node. As an aspect of rationality, a single time required to service requests received by the system within a unit of time is accepted. The bus topology, the uniformity of communication lines and their short length in local area networks make the sending time independent of the request node and the transmission node. Let: • 𝑛 is the number of network nodes; • 𝑚 is the number of independent files of distributed databases; • 𝐾𝑗 is the j-th steam node; • 𝐹𝑖 is the i-th file of distributed databases; • 𝐿𝑖 file size 𝐹𝑖 ; • 𝑏𝑗 is the storage capacity of the 𝐾𝑗 node, which is intended to host files; • s is the number of search query types; • λ𝑘𝑖𝑗 is the intensity of k-type search requests to the file 𝐹𝑖 from the node 𝐾𝑗 ; • 𝑡𝑘𝑖𝑗 is the processing time of a k-type search request to the file 𝐹𝑖 in the node 𝐾𝑗 ; (1) • 𝑇𝑘𝑖 is the time of sending a k-type search request to the file; (2) • 𝑇𝑘𝑖 is the time it takes to send a response to a k-type search request to the file 𝐹𝑖 ; • r is the number of types of corrections; • λ′𝑙𝑖𝑗 is the intensity of l-type fixes of the file 𝐹𝑖 from the node 𝐾𝑗 ; • ′ 𝑡𝑙𝑖𝑗 is the processing time for fixing the l-type of the file 𝐹𝑖 in the node 𝐾𝑗 ; • ′ 𝑇𝑙𝑖 is the time of sending the l-type file patch 𝐹𝑖 ; • 𝑥𝑖𝑗 (𝑖 = 1, … , 𝑚; = 1, … , 𝑛)are the values determined by the formula: − 𝑥𝑖𝑗 = 1, if a copy of the file 𝐹𝑖 is located in the node 𝐾𝑗 , − 𝑥𝑖𝑗 = 0, if a copy of the file 𝐹𝑖 is not located in the node 𝐾𝑗 . The time it takes to send data from the node 𝐾𝑗 data during the execution of a k-type search query (1) (2) to the file 𝐹𝑖 , is equal to (𝑇𝑘𝑖 + 𝑇𝑘𝑖 ) (1 − 𝑥𝑖𝑗 ). Then the only time required to send data through 353 communication channels between nodes when executing search queries received by the system during a unit of time is set as 𝑠 𝑚 𝑛 (1) (2) 𝑇 = ∑ ∑ ∑ λ𝑘𝑖𝑗 (𝑇𝑘𝑖 + 𝑇𝑘𝑖 ) (1 − 𝑥𝑖𝑗 ). 𝑘=1 𝑖=1 𝑗=1 By accepting 𝑠 (1) (2) 𝑎𝑖𝑗 = ∑ λ𝑘𝑖𝑗 (𝑇𝑘𝑖 + 𝑇𝑘𝑖 ) , 𝑘=1 we get 𝑚 𝑛 𝑇 = 𝑇0 − ∑ ∑ 𝑎𝑖𝑗 𝑥𝑖𝑗 , 𝑖=1 𝑗=1 where 𝑇0 ≡ ∑𝑚 𝑛 𝑖=1 ∑𝑗=1 𝑎𝑖𝑗 . The time it takes to send from the node 𝐾𝑗 information when performing a l-type file correction 𝐹𝑖 is equal to ∑𝑛𝑝=1,𝑝≠𝑗 𝑇𝑙𝑖′ 𝑥𝑖𝑝 .Then the only time required to send information over communication channels between nodes when fulfilling adjustment requests received by the network within a unit of time is set as 𝑟 𝑚 𝑛 𝑛 𝑇 = ∑ ∑ ∑ ∑ λ′𝑙𝑖𝑗 𝑇𝑙𝑖′ 𝑥𝑖𝑝 . ′ 𝑙=1 𝑖=1 𝑗=1 𝑝=1,𝑝≠𝑗 If you put 𝑟 𝑎𝑖𝑗 = ∑ λ′𝑙𝑖𝑗 𝑇𝑙𝑖′ , ′ 𝑙=1 then 𝑚 𝑛 𝑛 𝑚 𝑛 𝑛 𝑚 𝑛 𝑇 = ∑ ∑ 𝑎𝑖𝑗 ∑ 𝑥𝑖𝑝 = ∑ ∑ 𝑥𝑖𝑗 ∑ 𝑎𝑖𝑝 = ∑ ∑ 𝑎̂𝑖𝑗 𝑥𝑖𝑗 , 𝑖=1 𝑗=1 𝑝=1,𝑝≠𝑗 𝑖=1 𝑗=1 𝑝=1,𝑝≠𝑗 𝑖=1 𝑗=1 where 𝑎̂𝑖𝑗 = ∑𝑛𝑝=1,𝑝≠𝑗 𝑎𝑖𝑝 Processing time for a k-type search request to a file 𝐹𝑖 from the node 𝐾𝑗 can be represented by the formula 𝑡𝑘𝑖𝑗 𝑥𝑖𝑗 + 𝑡̂𝑘𝑖𝑗 (1 − 𝑥𝑖𝑗 ) = 𝑡̂𝑘𝑖𝑗 + (𝑡𝑘𝑖𝑗 − 𝑡̂𝑘𝑖𝑗 )𝑥𝑖𝑗 , 1 where 𝑡𝑘𝑖𝑗 = 𝑛−1 ∑𝑛𝑝=1,𝑝≠𝑗 𝑡𝑘𝑖𝑝 . The only time required to process all search queries received by the network during a unit of time is set as 𝑚 𝑛 𝑡 = 𝑡0 + ∑ ∑ 𝑏𝑖𝑗 𝑥𝑖𝑗 , 𝑖=1 𝑗=1 where 𝑠 𝑚 𝑛 𝑡0 = ∑ ∑ ∑ λ𝑘𝑖𝑗 𝑡̂𝑘𝑖𝑗 , 𝑘=1 𝑖=1 𝑗=1 𝑠 𝑏𝑖𝑗 = ∑ λ𝑘𝑖𝑗 (𝑡𝑘𝑖𝑗 − 𝑡̂𝑘𝑖𝑗 ). 𝑘=1 The time it takes to process a fix for an l-type file 𝐹𝑖 is equal to 𝑛 ′ ∑ 𝑡𝑙𝑖𝑞 𝑥𝑖𝑞 . 𝑞=1 354 The standardized time required to fulfill all patch requests that come into the network during a unit of time is set as 𝑟 𝑚 𝑛 𝑛 𝑟 𝑚 𝑛 𝑛 𝑡 ′ = ∑ ∑ ∑ ∑ λ𝑙𝑖𝑗 𝑡𝑙𝑖𝑞 𝑥𝑖𝑞 = ∑ ∑ ∑ ∑ λ′𝑙𝑖𝑞 𝑡𝑙𝑖𝑗 ′ ′ ′ 𝑥𝑖𝑗 . 𝑙=1 𝑖=1 𝑗=1 𝑞=1 𝑙=1 𝑖=1 𝑗=1 𝑞=1 By putting 𝑟 𝑛 𝑏𝑖𝑗 = ∑ ∑ λ′𝑙𝑖𝑞 𝑡𝑙𝑖𝑗 ′ ′ , 𝑙=1 𝑞=1 we get 𝑚 𝑛 ′ ′ 𝑡 = ∑ ∑ 𝑏𝑖𝑗 𝑥𝑖𝑗 . 𝑖=1 𝑗=1 ′ Noting 𝑓0 ≡ 𝑇0 + 𝑡0 , 𝐶𝑖𝑗 = −𝑎𝑖𝑗 + 𝑎̂𝑖𝑗 + 𝑏𝑖𝑗 + 𝑏𝑖𝑗 ,we obtain an exact model of the problem of optimal distribution of copies of files between network nodes in terms of the minimum uniform time required to service all requests received by the system within a unit of time, associated with the type of discrete programming problems with boolean variables: 𝑓(𝑋) = 𝑓0 + ∑𝑚 𝑛 𝑖=1 ∑𝑗=1 𝐶𝑖𝑗 𝑥𝑖𝑗 → 𝑚𝑖𝑛, (5) under restrictions ∑𝑛𝑗=1 𝑥𝑖𝑗 ≥ 1 (𝑖 = 1,2, … , 𝑚); (6) 𝑛 ∑𝑗=1 𝐿𝑖 𝑥𝑖𝑗 ≤ 𝑏𝑗 (𝑗 = 1,2, … , 𝑛); (7) 𝑥𝑖𝑗 ∈ {0, 1}(𝑖 = 1,2, … , 𝑚; 𝑗 = 1,2, … , 𝑛) (8) 4. Algorithmic implementation of the model To implement models (5) (8), we propose an algorithm that creates a model [14] for further comparison. At the first stage of the algorithm, the initial distribution of files is found, which will be rational if the condition (7) is not taken into account. At the second stage, the files are redistributed if there is at least a 1-n index for the original distribution. J is such that condition (7) is not met. The second stage of the algorithm is performed until a distribution is found that meets condition (7). Let us consider the stages of the recommended algorithm. The first stage. Determination of the initial distribution. Determining the values of 𝐶𝑖𝑗 (𝑖 = 1,2, … , 𝑚; 𝑗 = 1,2, … , 𝑛) and calculation of the matrix 𝐶 = 𝐶𝑖𝑗 𝑚𝑛 . If for 𝑖 ∃ 𝐶𝑖𝑗 < 0,then ∗ 1, 𝐶𝑖𝑗 < 0; 𝑥𝑖𝑗 ={ 0, 𝐶𝑖𝑗 ≥ 0. If for ∀𝐶𝑖𝑗 ≥ 0, then we define 𝑚𝑖𝑛1≤𝑗≤𝑛 𝐶𝑖𝑗 . Let 𝑚𝑖𝑛1≤𝑗≤𝑛 𝐶𝑖𝑗 = 𝐶𝑖𝑗𝑖 . Then: ∗ 1, 𝑗 = 𝑗𝑖 ; 𝑥𝑖𝑗 ={ 0, 𝑗 ≠ 𝑗𝑖 . The second stage. Redistribution of files. 1. Create a vector of values 𝐸 = (ε1 , ε2 , … , ε𝑛 ), where ε𝑗 = 0 (𝑗 = 1,2, … , 𝑛). During the algorithm, after redistributing a file from a certain filled node 𝐾𝑗 the corresponding component ε𝑗 of the vector E is set to 1, and this node is closed for redistribution. 355 Figure 3: Main algorithm. 2. For all indices j, where ε𝑗 = 0, we check the fulfillment of condition (8). If this condition is fulfilled for all j indexes, then the algorithm ends. If for sure j=r we have: 𝑚 ∑ 𝐿𝑖 𝑥𝑖𝑟 > 𝑏𝑟 , 𝑖=1 Then we move on to the third point. 3. If ∃ 𝑠 ≠ 𝑟 such as: 𝑚 𝑚 ∑ 𝐿𝑖 𝑥𝑖𝑟 ≤ 𝑏𝑠 , ∑ 𝐿𝑖 𝑥𝑖𝑠 ≤ 𝑏𝑟 , 𝑖=1 𝑖=1 then we swap the memory of the r-th and s-th nodes and return to the second point. Otherwise, we go to the fourth point. For those where ∃𝑗 ≠ 𝑟 such as 𝑥𝑖𝑗 = 1, visualize 𝑚𝑖𝑛𝑖 (−𝐶𝑖𝑟 ). Let: 𝑚𝑖𝑛𝑖 (−𝐶𝑖𝑟 ) = −𝐶𝑘𝑟 . 1, 𝑗 = 𝑟, For those i, where 𝑥𝑖𝑗 = { we determine 𝑚𝑖𝑛𝑗 (𝐶𝑖𝑗 − 𝐶𝑖𝑟 ), where little is taken, according 0, 𝑗 ≠ 𝑟, to those indicators 𝑗 ≠ , where ε𝑗 = 0. Let: 𝑚𝑖𝑛𝑗 (𝐶𝑖𝑗 − 𝐶𝑖𝑟 ) = 𝐶𝑖𝑗𝑖 − 𝐶𝑖𝑟 . Then we define 𝑚𝑖𝑛𝑗 (𝐶𝑖𝑗 − 𝐶𝑖𝑟 ). Let: 𝑚𝑖𝑛𝑖 (𝐶𝑖𝑗𝑖 − 𝐶𝑖𝑟 ) = 𝐶𝑙𝑗𝑙 − 𝐶𝑙𝑟 . If 𝑚𝑖𝑛(−𝐶𝑘𝑟 , 𝐶𝑙𝑗𝑙 − 𝐶𝑙𝑟 ) = −𝐶𝑘𝑟 , then in the matrix X we assume 𝑥𝑘𝑟 = 0. This means that the file 𝐹𝑘 is excluded from the node 𝐾𝑟 . If 𝑚𝑖𝑛(−𝐶𝑘𝑟 , 𝐶𝑙𝑗𝑙 − 𝐶𝑙𝑟 ) = −𝐶𝑙𝑟 , then in the matrix 𝑋 provide 𝑥𝑙𝑟 = 0, 𝑥𝑙𝑗𝑙 = 1.This means that the file 𝐹𝑖 from node 𝐾𝑟 is redistributed to node 𝐾𝑗 . Such a redistribution of files corresponds to a minimal increase in the objective function. Check condition (7) 𝑗 = 𝑟. If it is not fulfilled, then go to the third step. If the condition is met, then the element ε𝑟 element of the vector 𝐸 is assigned a value of 1 and proceed to the second step. 356 Figure 4: File distribution. Thus, the algorithm allows you to find the optimal or almost rational distribution of files between network nodes in a finite number of steps. The result of the algorithm is the matrix X. 5. Optimization of the distribution of registrations To determine the intensity of access to various files of the information base, we used the resident program "Query Analyzer" written in the Assembler programming language. The language guarantees compactness and flexibility when writing resident programs and does not allow errors in the measured processes. The resident program is loaded on all network nodes. The computer time is synchronized. The program analyzes all file requests and records the date, time, file name, request duration, response time, and response duration. Logging is performed in the internal buffer and is written to disk only during computer idle time, when typing from the keyboard or other operations that require waiting for a response are performed. The non-resident part of the program analyzes the processed data and finds temporary file access properties. The network load is characterized by unevenness - complete absence of calls or simultaneous calls from all workstations. The LAN Query Optimizer program is designed to redistribute search queries for reference and information files between network nodes, which significantly reduces queues. As an aspect of optimality, a single time required to service all requests received by the system in 1 hour is taken as a single time. The efficiency of requests for corrections and searches in various data files of the Revenue Accounting DBMS and the average search time are presented in the table. 357 The output file for the optimization program was created in the following order: the efficiency of searches and corrections in any database file; the duration of records in the files; and the average processing time of search queries to data files. As an aspect of optimality, the average amount of information sent over the communication lines when processing requests is taken. The resulting matrix of the expedient distribution of files to network nodes as a result of the calculation using the optimization program is as follows: 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 𝑋= 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1 1 1 1 1 1 1 1 1 1 1 1] For example, files that have been modified are often stored in 1 copy each. Copies of other files are duplicated in the local databases of network nodes. If we take a single time as the optimality aspect, the time required to service all requests that come into the system within 1 hour, the matrix of the appropriate file allocation is of a different type: 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 𝑋= 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 [1 1 1 1 0 0 0 0 0 0 0 0] For example, the first 4 files of information bases, which have a high intensity of corrections, are stored on the 1st of the most productive nodes (1 copy each). Copies of other files, where corrections and additions are made less frequently, are located in the local databases of the first four computer nodes. In fact, it is more convenient to use the second method of file distribution, since for very time- limited work on the introduction of payments, distribution of incoming amounts between budgets and the creation of general reporting, it is enough to use the first four network nodes with the highest speed. Similar placement of copies of data files will help to avoid unnecessary information redundancy and difficulties in reconciling it. 358 6. Conclusion The method of optimization of distributed file placement registrations in a computer network based on the waiting time. The problem of creating switching systems designed to analyze the state of the network at any given time and optimal data transportation to find optimal solutions. A practical implementation of the method of balancing the load of the DBMS network nodes intended for processing large and ultra-large volumes of databases is proposed. The results of the operation of the revenue accounting system based on the proposed method indicate the possibility of a significant increase in the speed of data processing in large-volume databases by using mechanisms for optimizing the load of network nodes used to process databases. The application of the method allows to increase the productivity and reduce the reaction time of the information system working with the DBMS. Declaration on Generative AI The authors have not employed any Generative AI tools. References [1] Krainyk Y., Davydenko Y., Tomas V. Configurable Control Node for Wireless Sensor Network. 2019 3rd International Conference on Advanced Information and Communications Technologies (AICT), Lviv, Ukraine, 2019, pp. 258 262. doi: 10.1109/AIACT.2019.8847732 [2] Bailis P., Ghodsi A., Braams J., Hellerstein H., Stoica I. Bolt-on causal consistency. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, 2013. pp. 761 772. [3] Boncz P., Zukowski M., Nes N. LATEX: MonetDB/X100: Hyper-Pipelining Query Execution. Cidr, Vol. 5, 2005. pp. 225 237. [4] Charapko A., Ailijiang A., Demirbas M. Adapting to Access Locality via Live Data Migration in Globally Distributed Datastores. In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018. pp. 3321 3330. [5] Liu G., Shen H. Minimum-cost cloud storage service across multiple cloud providers. IEEE/ACM Transactions on Networking (TON), Vol. 25, 4 (2017). pp. 2498 2513. [6] Mahmoud H., Nawab F., Pucher A., Agrawal D., El Abbadi A. Low-latency multi-datacenter databases using replicated commit. Proceedings of the VLDB Endowment, Vol. 6, 9 (2013), pp. 661 672. [7] Guerraoui R., Wang J. How fast can a distributed transaction commit? In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. ACM, 2017. pp. 107 122. [8] Ping F., Hwang J.-F., McConnel C., Vabbalareddy R. Wide area placement of data replicas for fast and highly available data access. In Proceedings of the fourth international workshop on Data-intensive distributed computing. ACM, 2011. pp. 1 8. [9] Ports D., Grittner K. Serializable snapshot isolation in PostgreSQL. Proceedings of the VLDB Endowment, Vol. 5, 12 (2012), pp. 1850 1861. [10] Shved A., Kovalenko I., Davydenko Y. Method of Detection the Consistent Subgroups of Expert Assessments in a Group Based on Measures of Dissimilarity in Evidence Theory. In: Shakhovska N., Medykovskyy M. (eds) Advances in Intelligent Systems and Computing IV. CCSIT 2019. Advances in Intelligent Systems and Computing, vol 1080. Springer, Cham, 2020. pp. 36 53. doi: 10.1007/978-3-030-33695-0_4 [11] Kovalenko I., Davydenko Y., Shved A. Formation of Consistent Groups of Expert Evidences Based on Dissimilarity Measures in Evidence Theory. 2019 IEEE 14th International Conference 359 on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, 2019, pp. 113 116. doi: 10.1109/STC-CSIT.2019.8929858 [12] Adya A., Myers D., Howell J., Elson J., Meek C., Khemani V., Fulger S., Gu P., Bhuvanagiri L, Hunter J., et al. Slicer: Auto-sharding for datacenter applications. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 2016. pp. 739 753. [13] Lakshman A., Malik P. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, Vol. 44, 2 (2010), pp. 35 40. [14] Bacon D., Bales N., Bruno N., Cooper B., Dickinson A., Fikes A., Fraser C., Gubarev A., Joshi M, Kogan E., et al. Spanner: Becoming a SQL system. In Proceedings of the 2017 ACM International Conference on Management of Data. 2017. pp. 331 343. [15] Klophaus R. Riak core: Building distributed applications without shared state. In ACM SIGPLAN Commercial Users of Functional Programming. ACM, 2010. p. 14. [16] Pavlo A., Angulo G, Arulraj J., Arulraj H., Lin H., Lin H., Ma L., Menon P., Mowry T., Perron M., Quah I., et al. Self-Driving Database Management Systems. In CIDR, Vol. 4. 1 (2017). [17] URL : https://www.slideshare.net/AmazonWebServices/dat202getting-started-with-amazon- aurora/14 (Last accessed: 11.01.24). [18] Nishtala R., Fugal H., Grimm S., Kwiatkowski M., Lee H., Li H. C., McElroy R., Paleczny M., Peek D., Saab P., et al. Scaling memcache at facebook. In Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13). 2013. pp. 385 398.S. Anzaroot, A. McCallum, UMass citation field extraction dataset, 2013. URL : http://www.iesl.cs.umass.edu/data/data-umasscitationfield. 360