<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving data rebalancing in distributed databases using adaptive and elitist genetic algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roman Belous</string-name>
          <email>belous22@ukr.net</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taras Trysnyuk</string-name>
          <email>taras24t@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kyrylo Smetanin</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladyslav Vasylenko</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmytro</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mosiichuk</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of telecommunications and global information space, NAS of Ukraine</institution>
          ,
          <addr-line>Chokolivsky Boulevard 13, Kyiv, 02000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>This paper presents a method for improving data rebalancing in distributed databases using a genetic algorithm enhanced with elitism and adaptive crossover. The proposed approach dynamically redistributes data fragments across system nodes to minimize query execution time and optimize load balancing. A mathematical model of the problem is formalized, incorporating fragment size, node capacity, and data transfer costs. The genetic algorithm includes adaptive tuning of crossover parameters and an elitism mechanism that preserves the best solutions in each generation. Experimental results on a distributed school management system demonstrate an 8.2% improvement in average query execution time compared to static rebalancing approaches. The method can be applied to high-load systems requiring scalable and intelligent data distribution strategies..</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Genetic Algorithm</kwd>
        <kwd>Data Rebalancing</kwd>
        <kwd>Elitism</kwd>
        <kwd>Adaptive Crossover</kwd>
        <kwd>Distributed Databases</kwd>
        <kwd>Query Optimization</kwd>
        <kwd>High-Load Systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the era of increasing data volumes and high-load distributed systems, the efficient
execution of database queries becomes a key factor in ensuring responsiveness and scalability.
Distributed databases often suffer from data skew, where uneven distribution of fragments
across nodes leads to performance bottlenecks and suboptimal resource utilization. Traditional
rebalancing techniques, whether static or heuristic - fail to adapt dynamically to the evolving
workload patterns, especially in real-time, high-throughput environments.</p>
      <p>To address these challenges, intelligent optimization approaches are required. One of the
most promising techniques is the use of evolutionary algorithms, particularly Genetic
Algorithms (GA), which have demonstrated strong adaptability and robustness in solving
NPhard problems, including fragment placement, load balancing, and data allocation in distributed
__________________________</p>
      <p>
        systems [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1–4</xref>
        ]. However, standard implementations of GA often suffer from premature
convergence and limited exploitation-exploration balance, leading to suboptimal rebalancing
decisions.
      </p>
      <p>This paper presents a novel method for data rebalancing in distributed databases using an
enhanced Genetic Algorithm that incorporates two key mechanisms: elitism and adaptive
crossover. The elitism strategy preserves the best-performing solutions across generations,
ensuring that high-quality individuals are not lost during the evolution. The adaptive crossover
mechanism dynamically adjusts the crossover rate depending on the diversity and convergence
behavior of the population, maintaining a healthy balance between exploration of new solutions
and exploitation of current ones.</p>
      <p>The proposed method formalizes the rebalancing problem as an optimization task, where
the objective is to minimize query execution time and data transfer cost, while ensuring load
uniformity across nodes. A mathematical model is introduced, and the algorithm is
implemented in a distributed school management platform developed using Laravel, Docker,
and MySQL, simulating realistic high-load scenarios.</p>
      <p>Experimental results show that the proposed method achieves a measurable improvement
in query performance, with up to 8.2% reduction in average execution time compared to baseline
static allocation approaches. The method demonstrates scalability, adaptability to changing
workloads, and potential for broader application in distributed data-intensive systems.</p>
      <p>The remainder of this paper is organized as follows: Section 2 presents related work and
existing rebalancing strategies; Section 3 describes the proposed genetic algorithm with elitism
and adaptive crossover; Section 4 introduces the mathematical model and algorithmic
workflow; Section 5 outlines the experimental setup and evaluates the performance; finally,
Section 6 summarizes the conclusions and outlines future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Analysis of research and publications</title>
      <p>
        Genetic algorithms (GAs) have been widely adopted for solving optimization problems in
distributed database systems, particularly for tasks such as fragment allocation, data
rebalancing, and load distribution [
        <xref ref-type="bibr" rid="ref1 ref2">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">2</xref>
        ]. These algorithms have demonstrated flexibility and
robustness when applied to NP-hard problems [
        <xref ref-type="bibr" rid="ref4">3</xref>
        ]. However, the majority of early approaches
relied on static parameters and lacked mechanisms for maintaining high-quality solutions
throughout generations.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">4</xref>
        ], a genetic algorithm was proposed for distributed database design, focusing on
minimizing data transfer costs and query execution time by determining the optimal placement
of fragments. However, the algorithm did not include elitism, which resulted in the loss of
highquality individuals between generations and increased the risk of premature convergence. A
similar limitation was identified in [
        <xref ref-type="bibr" rid="ref6">5</xref>
        ], where static genetic parameters were used during the
entire execution process, which significantly limited the adaptability of the algorithm in
highload environments.
      </p>
      <p>
        The work in [
        <xref ref-type="bibr" rid="ref7">6</xref>
        ] explored fragment allocation strategies using GA and evaluated various
sitefragment mappings. Although the method showed reasonable performance for small-scale
systems, it suffered from stagnation during long runs due to the absence of adaptive crossover or
mutation control. A more advanced approach was discussed in [
        <xref ref-type="bibr" rid="ref8">7</xref>
        ], where a hybrid of GA and
heuristic search was introduced; however, the method was tuned for static workloads and
lacked dynamic response to evolving query patterns.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">8</xref>
        ], researchers addressed the load balancing problem using a GA with a refined fitness
function. While the model integrated basic elitism, it still operated under fixed operator
parameters, which made it less efficient in heterogeneous data environments. Another study [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ]
incorporated a multi-objective GA model but failed to dynamically adapt crossover intensity,
which proved suboptimal under variable access distributions.
      </p>
      <p>
        Parallel genetic algorithms have also been examined for distributed systems [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ], enabling
improved convergence speed and scalability. Nevertheless, most implementations still used
global or fixed parameters and ignored population diversity indicators during evolution.
      </p>
      <p>Thus, existing research confirms the effectiveness of GAs in distributed environments, but
also highlights their common weaknesses: static parameterization, lack of adaptive control, and
insufficient preservation of elite solutions. To address these shortcomings, this study proposes a
modified GA that combines elitism with adaptive crossover—a hybrid strategy that dynamically
adjusts evolutionary pressure based on population diversity while safeguarding the best
solutions from degradation.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Formulation of the problem</title>
      <p>The purpose of the article is to develop a method for improving data rebalancing in
distributed database systems under high-load conditions by applying a genetic algorithm with
elitism and adaptive crossover. The aim is to ensure a more efficient redistribution of data
fragments among nodes, minimize query execution time, and optimize the overall load balance
of the system.</p>
      <p>The proposed method addresses the limitations of existing rebalancing strategies, which
often rely on static fragment allocation or use genetic algorithms with fixed parameters and no
elitism. These limitations lead to inefficient use of computational resources, longer convergence
times, and suboptimal distribution outcomes.</p>
      <p>To overcome these challenges, the developed method integrates elitism to preserve the best
solutions across generations and introduces an adaptive crossover mechanism that dynamically
adjusts to the state of the population. The optimization model considers fragment sizes, node
capacities, and data transfer costs between nodes, aiming to minimize total query execution time
and communication overhead.</p>
      <p>This problem formulation enables the application of evolutionary optimization to the
complex task of dynamic data reallocation in distributed environments, supporting the
development of scalable and adaptive high-performance systems.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Presentation of the main material</title>
      <p>To implement the proposed method of data rebalancing in distributed database systems, the
problem is formalized as an optimization task where the goal is to minimize the total cost of
executing queries and transferring data between nodes. The approach uses a modified genetic
algorithm that incorporates elitism and adaptive crossover.</p>
      <p>1)Let the system consist of a set of data fragments:
2)Let there be a set of sites (nodes):
3)Each fragment f i has a size d f i , and each site s j has a limited capacity c j . The data transfer
cost between any two nodes si and s j is given by the matrix:</p>
      <p>F =\{ f 1 , f 2 , … , f n \}
S=\{ s1 , s2 , … , sm \}
C =[ cij ] where cij ≥ 0
(1)
(2)
(3)
(4)
(5)
4)A solution (chromosome) in the genetic algorithm is encoded as a vector:</p>
      <p>X =[ x1 , x2 , … , xn ] , xi∈ \{ 1 , … , m \}
where xi= j indicates that fragment f i is assigned to node s j.</p>
      <p>5)The total cost function to be minimized is defined as:</p>
      <p>n n
Cost ( X )=∑ ∑ qij⋅ c xi x j</p>
      <p>i=1 j=1
where qij represents the query frequency or data dependency between fragments f i and f j .
6)The following constraint must be satisfied to ensure that no node exceeds its capacity:
i: ∑xi= j d f i ≤ c j ,∀ j∈ \{ 1 , … , m \}
(6)</p>
      <p>To enhance the optimization process, elitism is applied. In each generation, the
bestperforming solutions (with the lowest cost) are preserved and directly copied to the next
population. This ensures that high-quality solutions are not lost during mutation or crossover.
The concept of elitism is illustrated in Figure 1. In this example, Individual 2 achieves the lowest
fitness value F=0.03 among all candidates. As a result, it is directly copied into the next
generation without undergoing genetic operations, preserving its quality and helping the
population to converge more efficiently.</p>
      <p>7)The crossover process plays a key role in maintaining genetic diversity and driving
convergence. In the proposed method, pairs of individuals are selected based on their fitness and
population diversity. The crossover rate   p c is adjusted dynamically during the evolution
process, as described in the previous formula. This allows the algorithm to increase exploration
when diversity is low and reduce randomness as the population begins to converge.</p>
      <p>This process is illustrated in Figure 2, where Individuals 2 and 3 are selected for crossover
based on their fitness and diversity criteria. The resulting offspring inherits characteristics from
both parents and is added to the new generation.</p>
      <p>pc= pbase+α ⋅ (1−</p>
      <p>Dt
Dmax
where: pbase is the base crossover rate,  is the adaptation coefficient, Dt is the current diversity
of the population, Dmax is the maximum observed diversity.</p>
      <p>8)Diversity is calculated as the average Hamming distance between all chromosomes in the
population:</p>
      <p>D =
t</p>
      <p>2 ∑P−1 ∑P H ( X i , X j)
P ( P−1) i=1 j=i+1
(8)
where P is the population size, and H ( X i , X j) is the Hamming distance between chromosomes
X iand X j.</p>
      <p>The mutation rate   p m is kept low and fixed to introduce minor random variations. The
genetic algorithm workflow consists of the following steps:
1. initialization of a random population of solutions;
2. evaluation of each solution using the cost function;
3. selection of parents based on fitness (roulette wheel or tournament selection);
4. application of crossover with adaptive rate;
5. application of mutation;
6. preservation of elite individuals;
7. replacement and repetition until a stopping criterion is met (e.g., number of generations
or convergence).</p>
      <p>The algorithm iteratively improves the population until it converges to an optimal or
nearoptimal fragment distribution that minimizes query execution time and balances load among
distributed nodes.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Analysis of the results</title>
      <p>A simulation model was developed using a real-world distributed educational platform built
on Laravel, MySQL, Docker, and Redis. The system emulates a multi-node architecture where
each node stores and processes data about students, grades, lessons, and academic performance.
The total dataset included 1,700 MB of structured educational data distributed across five nodes
with varying capacity constraints and communication costs.</p>
      <p>To evaluate the effectiveness of the proposed method, three rebalancing strategies were
compared:
 Static allocation — fragments were initially assigned once and not redistributed;
 Classical GA-based rebalancing — genetic algorithm with fixed crossover and no
elitism;
 Proposed method — GA with elitism and adaptive crossover.</p>
      <p>Each experiment was run for 50 generations with a population size of 40 individuals. The
objective function was the average query execution time across all nodes, which depends on the
data locality and inter-node communication.</p>
      <p>The results are shown in Figure 3. The proposed method achieved the fastest convergence,
stabilizing by generation 30, whereas the classical GA required over 45 generations. Static
allocation showed no improvement, confirming the need for dynamic rebalancing.</p>
      <p>In terms of average query execution time, the proposed method outperformed the classical
GA by 3.6% and static allocation by 8.2%. Additionally, the adaptive crossover mechanism
maintained higher population diversity during early generations, which contributed to more
effective exploration of the solution space.</p>
      <p>The performance gain is also demonstrated in Figure 3, which shows the evolution of query
execution time over 10 generations for both algorithms. The modified genetic algorithm
consistently outperforms the baseline variant, exhibiting faster convergence and lower
execution times starting from the fourth generation onward.</p>
      <p>The elitism mechanism prevented the loss of optimal individuals between generations,
significantly reducing oscillations in fitness values. This effect is demonstrated in the
convergence graph, where the best fitness curve in the proposed method shows smoother
descent and lower final values.</p>
      <p>The system also exhibited improved load balancing across nodes, with the variance in node
utilization reduced by 14% compared to the baseline GA. This directly translated into lower peak
resource usage and improved fault tolerance under simulated high-load conditions.</p>
      <p>These results confirm that the enhanced genetic algorithm with elitism and adaptive
crossover provides a more robust and efficient solution for data rebalancing in distributed
database systems, particularly in environments with high variability in query patterns and data
access frequency.
6. Conclusions
1. The article proposes a method for improving data rebalancing in distributed database
systems under high-load conditions using a genetic algorithm enhanced with elitism
and adaptive crossover.
2. The proposed method dynamically redistributes data fragments across nodes to
minimize query execution time, reduce data transfer costs, and balance the
computational load of the system.
3. The integration of elitism ensures the preservation of the best solutions throughout
generations, while adaptive crossover improves convergence and solution diversity,
especially during early evolutionary stages.
4. Experimental results on a 3-node distributed system show that the modified genetic
algorithm reduces the average query execution time by up to 8.2% compared to static
allocation and 3.6% compared to a classical GA.
5. The method also leads to a more balanced load distribution among nodes, decreasing
variance in node utilization and improving overall system efficiency under high-load
scenarios.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Rezaee</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajiaghaei-Keshteli</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farnoosh</surname>
            <given-names>R.</given-names>
          </string-name>
          <article-title>A genetic-based heuristic for solving distributed database design problems</article-title>
          // Applied Soft Computing.
          <article-title>-</article-title>
          <year>2018</year>
          . - Vol.
          <volume>72</volume>
          . - P.
          <fpage>539</fpage>
          -
          <lpage>556</lpage>
          . https://doi.org/10.1016/j.asoc.
          <year>2018</year>
          .
          <volume>08</volume>
          .020
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Goldberg</surname>
            <given-names>D. E.</given-names>
          </string-name>
          <string-name>
            <surname>Genetic</surname>
          </string-name>
          <article-title>Algorithms in Search, Optimization, and Machine Learning</article-title>
          .
          <source>AddisonWesley</source>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Holland</surname>
            <given-names>J. H.</given-names>
          </string-name>
          <article-title>Adaptation in Natural and Artificial Systems</article-title>
          . University of Michigan Press,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Gen</surname>
            <given-names>M.</given-names>
          </string-name>
          , Cheng R. Genetic Algorithms and Engineering Optimization. Wiley-Interscience,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Rezaee</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajiaghaei-Keshteli</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farnoosh</surname>
            <given-names>R.</given-names>
          </string-name>
          <article-title>A genetic-based heuristic for solving distributed database design problems</article-title>
          // Applied Soft Computing.
          <article-title>-</article-title>
          <year>2018</year>
          . - Vol.
          <volume>72</volume>
          . - P.
          <fpage>539</fpage>
          -
          <lpage>556</lpage>
          . https://doi.org/10.1016/j.asoc.
          <year>2018</year>
          .
          <volume>08</volume>
          .020
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ahmed</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ismail</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Distributed database fragmentation and allocation: a genetic algorithm approach /</article-title>
          / IJCSNS. -
          <year>2008</year>
          . - Vol.
          <volume>8</volume>
          , No. 3. - P.
          <fpage>271</fpage>
          -
          <lpage>278</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Kumar</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            <given-names>R.</given-names>
          </string-name>
          <article-title>A novel genetic algorithm-based approach for optimization of distributed database systems</article-title>
          // International Journal of Computer Applications. - 2014. - Vol.
          <volume>87</volume>
          , No.
          <volume>14</volume>
          . - P.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Elmasry</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Badr</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aref</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>An efficient data allocation algorithm for distributed databases using genetic algorithm // IJCSNS</article-title>
          . -
          <year>2009</year>
          . - Vol.
          <volume>9</volume>
          , No. 3. - P.
          <fpage>117</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Walia</surname>
            <given-names>G. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saini</surname>
            <given-names>J. R.</given-names>
          </string-name>
          <article-title>Enhanced genetic algorithm for solving load balancing problem in distributed systems</article-title>
          // International Journal of Computer Applications. - 2012. - Vol.
          <volume>52</volume>
          , No. 5. - P.
          <fpage>36</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Bhandari</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murthy</surname>
            <given-names>C. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pal</surname>
            <given-names>S. K.</given-names>
          </string-name>
          <article-title>Variance as a stopping criterion for genetic algorithms</article-title>
          // Information Sciences.
          <article-title>-</article-title>
          <year>1996</year>
          . - Vol.
          <volume>85</volume>
          , No.
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          . - P.
          <fpage>223</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Alba</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troya J. M.</surname>
          </string-name>
          <article-title>A survey of parallel distributed genetic algorithms /</article-title>
          / Complexity. -
          <year>1999</year>
          . - Vol.
          <volume>4</volume>
          , No. 4. - P.
          <fpage>31</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>