=Paper= {{Paper |id=Vol-3200/paper5 |storemode=property |title=Modification of Query Processing Methods in Distributed Databases Using Fractal Trees |pdfUrl=https://ceur-ws.org/Vol-3200/paper5.pdf |volume=Vol-3200 |authors=Olha Svynchuk,Andrii Barabash,Serhii Laptiev,Tetiana Laptieva }} ==Modification of Query Processing Methods in Distributed Databases Using Fractal Trees == https://ceur-ws.org/Vol-3200/paper5.pdf
Modification of Query Processing Methods in Distributed
Databases Using Fractal Trees
Olha Svynchuk1, Andrii Barabash2, Serhii Laptiev3 and Tetiana Laptieva4
1,2
    National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», Politehnichna str. 56, 5,
Kyiv, 03056, Ukraine
3,4
    Taras Shevchenko National University of Kyiv, Volodymyrska str. 64/13, Kyiv, 01601, Ukraine,


                  Abstract
                  Today in database management systems there is an acute problem of searching for data in large
                  data sets. To solve this problem, we propose a modified search tree and its improvement using
                  a fractal index search tree with a multilevel structure. Each level in such a structure is a separate
                  fractal tree. Algorithms for data processing in DBMS RAM by modified methods are described.
                  These methods can be used to search for the same data from different tables. Increased the
                  minimum filling of the node, which reduces the height of the tree. The symmetry of the fractal
                  tree helps to execute the query quickly and, as a result, reduce the number of requests to the
                  disk subsystem. Also, due to the self-similarity property, the most frequently used indexes will
                  be loaded into the DBMS RAM much faster after selection. This will speed up the process of
                  finding the information you need for the request. Loading data indexes into RAM based on
                  statistics on the frequency of use of indices and index size weights will reduce the number of
                  indexes that are loaded into RAM, in contrast to the classic loading where the loading of indexes
                  occurs during their use and after filling the memory, it is deleted. Another big advantage is that
                  indexes that are almost never used will not be loaded into RAM. The proposed approach with
                  fractal trees also has an important scaling property, as fractal trees are divided into a large
                  number of smaller trees, which is especially true in the era of multicore modern computer
                  systems. To study the effectiveness of the use of indexes based on a modified fractal search tree
                  in the database and select the best system for hosting the database server, we measured the
                  speed of information retrieval in tables for the Windows 10 operating system. During the
                  experiments it was shown that the search speed on the modified trees in comparison with the
                  modified fractal search tree is reduced by 12%.

                  Keywords 1
                  database, data search, B + -trees, modified trees, fractal trees, indexes



1. Introduction                                                                               concept that describes the characteristics of this
                                                                                              data. In modern information systems for high-
                                                                                              quality work with databases use DBMS database
   In today's world we can see a rapid increase in
                                                                                              management systems that provide the ability to
information, which complicates the process of its
                                                                                              create, store, update and search for the necessary
storage and management. Therefore, for its
                                                                                              information. DBMSs also provide a number of
organization and quick search using databases
                                                                                              useful services: schema to control data semantics,
(DB), which are organized according to the
                                                                                              query language to organize access to part of the

III International Scientific And Practical Conference “Information
Security And Information Technologies”, September 13–19, 2021,
Odesa, Ukraine
EMAIL:        7011990@ukr.net        mailto:      (A.      1);
andrew.barbsh@gmail.com (A. 2); salaptiev@gmail.com (A. 3);
tetiana1986@ukr.net (A.4)
ORCID: 0000-0001-9032-6335 (A. 1); 0000-0001-8433-2827
(A. 2); 0000-0002-7291-1829 (A. 3); 0000-0002-5223-9078 (A. 4)
              ©️ 2021 Copyright for this paper by its authors. Use permitted under Creative
              Commons License Attribution 4.0 International (CC BY 4.0).

              CEUR Workshop Proceedings (CEUR-WS.org)
database, data granulation, data integrity                   The aim of the article is to improve the process
management, compression to reduce database               of processing indexes in databases using fractal
size, indexing to speed up query processing.             trees and speed up query execution.
However, the integration of different databases
into the production process at enterprises and           2. Modified search tree
other institutions has a number of shortcomings
associated with the organization of their
management and monitoring of events in                       The existing mechanisms of data modification
databases [1-5].                                         in the tables have a certain feature - the change of
    In modern databases, an important element is         keys in the corresponding nodes of the tree is
the search for data in tables that contain many          performed with the subsequent restructuring of
rows and columns and are not always ordered.             the index. This significantly affects the speed of
Therefore, to implement a quick search, indexes          writing information to the database and,
are created that are formed from the values of one       accordingly, is an important factor in increasing
or more columns and pointers to the                      the number of queries to the database. You also
corresponding rows. Indexes allow you to avoid           need to store only the most frequently used
sequential or step-by-step browsing of the file in       indexes, then, accordingly, the access time to the
search of the desired data. They are ordered, each       data storage location will be reduced. Therefore,
element of the index contains the name of the            the existing methods need to be improved, which
searched object and a pointer-identifier of its          will allow to find the necessary information faster
location. The more indexes, the better the               [21-22].
performance of database queries, but a very large            In the + tree we will improve as follows:
number of indexes does not guarantee high                    • increase the minimum filling of the node,
performance [5-10].                                               which will reduce the height of the tree;
    Many databases use different trees and their             • change the rule of separating the nodes of
modifications to build such indexes. However, if                  the tree - splitting the node with its two
the tree has an insufficient number of nodes and                  neighbors into four new nodes;
their fullness, the data search time increases [11-          • change the rule of connecting tree nodes -
13]. The disadvantages may also be the use of                     connecting four nodes into three new
identical indexes for different tables and sending                nodes;
to the RAM of indexes that are rarely used [14-              • in the tree leaf we will store records of
15].                                                              links to the same fields in different tables,
    The base trees in index construction and data                 which will increase the time of receipt of
retrieval are B-trees, namely their type B + trees.               links to data in the tree and speed up the
These trees easily implement the independence of                  search.
the program from the structure of the information            We describe the search for data using indexes.
record, have the ability to sequential access and        Indexes are loaded into the RAM of the database
all key data are contained only in the sheets. The       after receiving the request. Next, a list of data is
main disadvantages of such trees are the                 formed, which contains the necessary
compactness of filling and the number of levels of       information, and the found data is sent to RAM.
trees [16-18].                                           However, in the classical algorithm for loading
    You can also select K-trees, which contain all       indexes in the RAM are indexes that are almost
the characteristics of the B + tree, but have a better   not used, and, accordingly, take place until they
strategy of splitting and merging nodes. Also,           are replaced by other indexes. Therefore, it is
these trees have more elements at the root of the        necessary to improve the procedure for processing
node and the fullness of the node is ¾. All this         indexes in the RAM of the database (Picture 1) by:
saves hard disk space and increases the speed of             • reducing the specific storage in the RAM
access to information [19-20].                                    of indexes that are little used;
    However, the index structures used in modern             • processing little-used indexes by reading
databases have some limitations due to the long                   them from disk.
process of restructuring the index structure in the
case of adding or removing new data.
Accordingly, this leads to a slow process of
searching for information in a database with large
data sets.
                                                             after filling the memory, it is deleted. Another big
                                                             advantage is that indexes that are almost never
                                                             used will not be loaded into RAM.
   Collection of      Loading            Change of
   statistics on     indexes in          indices for         3. A modified method of searching
    the use of         DBMS                the new
      indices          RAM                  period              for queries using fractal trees

                                                                  Recently, fractals are increasingly being used
                                                             in various areas of our lives. Fractals can be used
                                                             to model and describe various phenomena in the
Figure 1: Algorithm for loading indexes in the               fields of radio engineering and electronics, digital
DBMS RAM by hashing                                          information processing, and computer graphics
                                                             [23].
  Here is an algorithm for loading indexes into                  The concept of «fractal» was proposed by the
memory based on the index hashing method:                    French-American          mathematician        Benoit
  • DBMS loads indexes into RAM according                    Mandelbrot. In 1977, he published Fractal
      to the classical algorithm and collects                Geometry of Nature, describing repetitive
      statistics on the number of used indices               drawings from everyday life. According to him,
      during ∆t;                                             many geometric shapes consist of smaller shapes,
  • after collecting statistics, the DBMS loads              which when enlarged accurately repeat a large
      into RAM only those data that were used                shape. After research, he also found that fractals
      most often during the time period ∆t;                  have chaotic behavior, fractional infinite
  • if there is no data in the RAM during the                dimension and can be described mathematically
      query, the search is performed by reading              using simple algorithms.
      nodes from the disk index space of the                     Fractal in a more general sense means an
      database;                                              irregular, self-similar structure, set, subsets and
                                                             elements of which are similar to the set itself.
  • if the time ∆t has expired, then in RAM are
                                                             Fractals can be deterministic or stochastic. They
      loaded those indexes that are used most
                                                             can also be classified according to self-similarity.
      often and have not been loaded before.
                                                             There are three types of self-similarity in fractals:
  This algorithm is implemented in two stages:
                                                             exact self-similarity (looks the same at different
  1. statistics are collected on the number of
                                                             magnifications); almost self-similarity (fractal
      used indices for the corresponding period
                                                             looks approximately (but not exactly) self-similar
      ∆t;
                                                             at different magnifications); statistical self-
  2. the indices that were most often used in the
                                                             similarity (fractal has numerical or statistical
      previous time interval ∆t are loaded into
                                                             measures that persist with magnification).
      RAM.
                                                             Examples of fractals are the Cantor set, the
  We have a formula for calculating time:
                                                             Lyapunov fractal, the Serpinsky triangle, the
                                                             Serpinsky carpet, the Menger sponge, the
                   (∑𝑛𝑖=1 𝑘𝑖 𝑊𝑖 )𝑡                     (1)
            ∆𝑡 =                     ,                       Apollonia grid, the dragon curve, and the Koch
                   (∑𝑛𝑖=1 𝑊𝑖 )𝑘                              curve. Also recently, attention is paid to fractal
                                                             trees: from each branch depart smaller, similar to
where 𝑘𝑖 – the number of used i-th index, 𝑘 – the            it, from them - even smaller (figure 2). By a
number of indexes used, 𝑊𝑖 – the weighting                   separate branch of mathematical methods can
                                                             describe the properties of the whole tree.
factor of the i-th index, 𝑡 – time of statistics
collection.
   Loading data indexes into RAM (figure 1)
based on statistics on the frequency of use of
indices and index size weights leads to a decrease
in the number of indexes that are loaded into
RAM, in contrast to the classic loading, where the
loading of indexes occurs during their use, and
Figure 2: An example of constructing a fractal
tree

    To construct the structure of the indices will be     Figure 4: Pythagorean tree for 6 levels
used Pythagorean fractal tree - a flat fractal,
consisting of interconnected right triangles of               These indexes can be easily used for large
squares built on the legs and hypotenuse                  databases. The structure of such indexes is
(figure 3).                                               presented in the form of arrays with a length equal
                                                          to powers of number 2. This structure is easily
                                                          scalable for a large number of keys, and is not
                                                          sensitive to the content of the entered queries.
                                                              The main advantage of using fractal trees is
                                                          that the resulting structure is symmetrical and
                                                          internally balanced. Symmetry helps to execute
                                                          the request quickly and, as a result, there will be
                                                          much fewer requests to the disk subsystem. Also,
                                                          due to the self-similarity property, the most
                                                          frequently used indexes will be loaded into the
                                                          DBMS RAM much faster after selection. This
                                                          will speed up the process of finding the
                                                          information you need for the request.
                                                              The index uses a new multi-level approach -
                                                          additional levels of the tree allow you to search in
                                                          the data block that contains the information on
Figure 3: Pythagorean tree                                request. Each request accesses the same number
                                                          of levels, which provides balanced access to the
    The Pythagorean tree with N levels is a trunk         index and disk subsystem.
and two Pythagorean trees with N-1 levels depart                 Updating, inserting and deleting indexes
from it symmetrically, so that the length of their        can be done very efficiently. The update is
trunks is 2 times less and the angle between them         performed as a sequential deletion of the old key,
is 90 degrees (figure 4).                                 followed by the insertion of a new key value.
    The Pythagorean tree is divided into subtree          Inserting a key into a fractal tree involves adding
blocks, where each tree is a full-fledged fractal         one new node or adding an edge to an existing
tree. We present this subtree in the form of a new        node. Inserting requires changes to only one block
horizontal level, which complements the vertical          at level 1. First, look for a block to update - if the
structure of the original tree. If the new horizontal     block is crowded, it must be divided, and this
level is too large, then in order to fit into one block   leads to the creation of a new node in level 2.
of the disk, it is divided into two blocks and            Separation of blocks is very rare and does not
indexed in the third horizontal level.                    affect performance.
                                                              To study the effectiveness of indexes based on
                                                          a modified fractal search tree in the database and
                                                          choose the best system for hosting the database
                                                          server, we measured the speed of information
                                                          retrieval in tables for Windows 10. Experiments
                                                          show that the search speed of modified trees
compared to modified fractal search tree is              which is especially true in the era of multicore
reduced by 12% (Picture 5).                              modern computer systems.
   The average error of the result for the modified            Prospects for further research are seen in
search tree is 0.91%, and for the modified fractal       the creation of new methods for processing
search tree is 0.89%. Therefore, the experiments         queries in distributed databases based on index
are performed correctly and provide the results          hashing using fractal trees.
with a given accuracy.
                     В+-tree     Modified fractal tree   5. References

500000                                                   [1] V.A. Mashkov, O.V. Barabash, Self-Testing
                                                             of Multimodule Systems Based on Optimal
450000
                                                             Check-Connection Structures. Engineering
400000                                                       Simulation. Amsterdam: OPA, 13 (1996) 479-
                                                             492.
350000
                                                         [2] V.A. Mashkov, O.V. Barabash, Self-
300000                                                       checking and Self-diagnosis of Module
                                                             Systems on the Principle of Walking
250000                                                       Diagnostic Kernel. Engineering Simulation.
200000                                                       Amsterdam: OPA, 15 (1998) 43-51.
                                                         [3] O. Barabash, G. Shevchenko, N. Dakhno, O.
150000                                                       Neshcheret, A. Musienko, Information
100000                                                       Technology of Targeting: Optimization of
                                                             Decision Making Process in a Competitive
 50000                                                       Environment. International Journal of
 TIME, M/S




             0                                               Intelligent Systems and Applications. Hong
                 0      20       40       60                 Kong: MECS Publisher, 9 (12) (2017) 1-9.
                     NUMBER OF RECORDS                   [4] O.V. Barabash, P.V. Open’ko, O.V. Kopiika,
                                                             H.V. Shevchenko, N.B. Dakhno, Target
Figure 5: Comparison of query search speed for               Programming           with        Multicriterial
modified tree and modified fractal tree                      Restrictions Application to the Defense
                                                             Budget Optimization. Advances in Military
4. Conclusions                                               Technology, 14(2) (2019) 213-229.
                                                         [5] V. Sobchuk, О. Barabash, A. Musienko, O.
                                                             Svynchuk, Adaptive accumulation and
       New methods of index processing in                    diagnostic      information      systems      of
databases for speeding up information processing             enterprises in energy and industry sectors. 1st
are offered. The developed modified methods                  Conference on Traditional and Renewable
differ from the known methods of processing                  Energy Sources: Perspectives and Paradigms
queries in databases in that they can be used for a          for the 21st Century (TRESP 2021), Volume
large amount of information. Loading data indices            250,          09           April         2021.
into RAM based on statistics on the frequency of             doi.org/10.1051/e3sconf/202125008002
use of indices and index size weights leads to a         [6] H. Zhenbing, V. Mukhin, Ya. Kornaga, O.
decrease in the number of indexes that are loaded            Herasymenko, Y. Bazaka, The scheduler for
into RAM. A modification of the data processing              the grid system based on the parameters
algorithm in RAM has been performed, which has               monitoring of the computer components.
made it possible to exclude indexes that are rarely          Eastern European Journal of Enterprise
used in memory. The resulting structure is                   Technologies, 1 (2017) 31-39.
balanced and optimized for storage in the disk           [7] V. Savchenko, O. Ilin, N. Hnidenko, O.
subsystem, reduces the number of I / O operations            Tkachenko, O. Laptiev, S. Lehominova,
to a minimum. The method of constructing                     Detection of Slow DDoS Attacks based on
indexes based on a modified fractal tree allows to           User’s Behavior Forecasting. International
increase the data search speed by 12% compared               Journal of Emerging Trends in Engineering
                                                             Research (IJETER) 8(5) (2020) 2019-2025.
to the modified method of index search based on
                                                         [8] O. Laptiev, O. Stefurak, I. Polovinkin, O.
a classic B + tree. The proposed approach also has           Barabash, V.Savchenko, O. Zelikovska. The
an important property of scaling, as fractal trees           method of improving the signal detection
are divided into a large number of smaller trees,            quality by accounting for interference. 2020
     IEEE 2nd International Conference on           [16] Z. Hu, V. Mukhin, Ya. Kornaga, O.
     Advanced Trends in Information Theory               Herasymenko, Y. Mostoviy, The Analytical
     (IEEE ATIT 2020) Conference Proceedings             Model for Distributed Computer System
     Kyiv, Ukraine, November 25-27, pp.172-176.          Parameters Control Based on Multi-factoring
[9] V. Tkachov, V.Tokariev, Y. Dukh, V.                  Estimations. Journal of Network and Systems
     Volotka, Method of Data Collection in               Management, 27(2) (2019) 351-365.
     Wireless Sensor Networks Using Flying Ad       [17] O. Barabash, O. Laptiev, V. Tkachev, O.
     Hoc Network. 2018 5th International                 Maystrov, O. Krasikov, I. Polovinkin, The
     Scientific-Practical Conference Problems of         Indirect method of obtaining Estimates of the
     Infocommunications.         Science     and         Parameters of Radio Signals of covert means
     Technology, October 9-12, 2018 Kharkiv,             of obtaining Information. International
     Ukraine, pp. 197-201.                               Journal of Emerging Trends in Engineering
[10] K. Smelyakov, S. Smelyakov, A. Chupryna             Research (IJETER), 8(8) (2020) 4133-4139.
     Advances in Spatio-Temporal Segmentation       [18] S. Yevseiev, R. Korolyov, A. Tkachov, O.
     of Visual Data. Chapter 1. Adaptive Edge            Laptiev, I. Opirskyy, O. Soloviova,
     Detection Models and Algorithms. Series             Modification of the algorithm (OFM) S-box,
     Studies in Computational Intelligence (SCI),        which provides increasing crypto resistance
     volume 876, publisher Springer, Cham,               in the post-quantum period. International
     2020, pp. 1-51.                                     Journal of Advanced Trends in Computer
[11] S. Yevseiev, R. Korolyov, A. Tkachov, O.            Science and Engineering (IJATCSE) 9(5)
     Laptiev, I. Opirskyy, O. Soloviova,                 (2020) 8725-8729.
     Modification of the algorithm (OFM) S-box,     [19] М. Pratsiovytyi, O. Svynchuk, Spread of
     which provides increasing crypto resistance         values of a Cantor-type fractal continuous
     in the post-quantum period. International           nonmonotone        function.     Journal    of
     Journal of Advanced Trends in Computer              Mathematical Sciences, 240(3) (2019) 342-
     Science and Engineering (IJATCSE) 9(5)              357. doi.org/10.1007/s10958-019-04354-0
     (2020) 8725-8729.                              [20] О. Laptiev, G. Shuklin, O.Stefurak, O.
[12] O. Barabash, O. Laptiev, O. Kovtun, O.              Svynchuk, O. Urdenko, S. Hohoniants,
     Leshchenko, K. Dukhnovska, A. Biehun,               Method of the increasing the detection
     The Method dynavic TF-IDF. International            system and recognition of digital
     Journal of Emerging Trends in Engineering           radiosignals.          Wschodnioeuropejskie
     Research (IJETER), 8(9) (2020) 5713-5718.           Czasopismo Naukowe, East European
[13] O. Laptiev, V.Savchenko, S. Yevseiev, H.            Scientific Journal, 2 (54) (2020) 4-16.
     Haidur, S. Gakhov, Spartak Hohoniants, The     [21] O. Svynchuk, O. Barabash, J. Nikodem, R.
     new method for detecting signals of means of        Kochan, O. Laptiev, Image compression
     covert obtaining information. 2020 IEEE 2nd         using fractal functions Fractal and Fractional,
     International Conference on Advanced                5(2), (2021) 31.
     Trends in Information Theory (IEEE ATIT        [22] O.V. Barabash, A.P. Musienko, V.V.
     2020) Conference Proceedings Kyiv,                  Sobchuk, N.V. Lukova-Chuiko, O.V.
     Ukraine, November 25-27, pp.176 –181.               Svynchuk, Distribution of Values of Cantor
[14] V. Sobchuk, V. Pichkur, O. Barabash, O.             Type Fractal Functions with Specified
     Laptiev, I.Kovalchuk, A. Zidan, Algorithm of        Restrictions.      Chapter        in     Book
     control of functionally stable manufacturing        “Contemporary Approaches and Methods in
     processes of enterprises. 2020 IEEE 2nd             Fundamental Mathematics and Mechanics”.
     International Conference on Advanced                Editors Victor A. Sadovnichiy, Michael Z.
     Trends in Information Theory (IEEE ATIT             Zgurovsky. Publisher Name: Springer,
     2020) Conference Proceedings Kyiv,                  Cham, Switzerland AG 2021, pp. 433-455.
     Ukraine, November 25-27, pp. 206-211.               doi.org/10.1007/978-3-030-50302-4_21
[15] V. Savchenko, O. Laptiev, O. Kolos, R.           [23]S. Toliupa, N. Lukova-Chuiko, O. Oksiuk.
     Lisnevskyi, V. Ivannikova, I. Ablazov,               Choice of Reasonable Variant of Signal and
     Hidden Transmitter Localization Accuracy             Code Constructions for Multirays Radio
     Model Based on Multi-Position Range                  Channels. Second International Scientific-
     Measurement. 2020 IEEE 2nd International             Practical     Conference      Problems     of
     Conference on Advanced Trends in                     Infocommunications.         Science       and
     Information Theory (IEEE ATIT 2020)                  Technology. IEEE PIC S&T 2015. pp. 269 –
     Conference Proceedings Kyiv, Ukraine,                271.
     November 25-27, pp.246-251.