=Paper=
{{Paper
|id=Vol-3200/paper5
|storemode=property
|title=Modification of Query Processing Methods in Distributed Databases Using Fractal Trees
|pdfUrl=https://ceur-ws.org/Vol-3200/paper5.pdf
|volume=Vol-3200
|authors=Olha Svynchuk,Andrii Barabash,Serhii Laptiev,Tetiana Laptieva
}}
==Modification of Query Processing Methods in Distributed Databases Using Fractal Trees ==
Modification of Query Processing Methods in Distributed Databases Using Fractal Trees Olha Svynchuk1, Andrii Barabash2, Serhii Laptiev3 and Tetiana Laptieva4 1,2 National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», Politehnichna str. 56, 5, Kyiv, 03056, Ukraine 3,4 Taras Shevchenko National University of Kyiv, Volodymyrska str. 64/13, Kyiv, 01601, Ukraine, Abstract Today in database management systems there is an acute problem of searching for data in large data sets. To solve this problem, we propose a modified search tree and its improvement using a fractal index search tree with a multilevel structure. Each level in such a structure is a separate fractal tree. Algorithms for data processing in DBMS RAM by modified methods are described. These methods can be used to search for the same data from different tables. Increased the minimum filling of the node, which reduces the height of the tree. The symmetry of the fractal tree helps to execute the query quickly and, as a result, reduce the number of requests to the disk subsystem. Also, due to the self-similarity property, the most frequently used indexes will be loaded into the DBMS RAM much faster after selection. This will speed up the process of finding the information you need for the request. Loading data indexes into RAM based on statistics on the frequency of use of indices and index size weights will reduce the number of indexes that are loaded into RAM, in contrast to the classic loading where the loading of indexes occurs during their use and after filling the memory, it is deleted. Another big advantage is that indexes that are almost never used will not be loaded into RAM. The proposed approach with fractal trees also has an important scaling property, as fractal trees are divided into a large number of smaller trees, which is especially true in the era of multicore modern computer systems. To study the effectiveness of the use of indexes based on a modified fractal search tree in the database and select the best system for hosting the database server, we measured the speed of information retrieval in tables for the Windows 10 operating system. During the experiments it was shown that the search speed on the modified trees in comparison with the modified fractal search tree is reduced by 12%. Keywords 1 database, data search, B + -trees, modified trees, fractal trees, indexes 1. Introduction concept that describes the characteristics of this data. In modern information systems for high- quality work with databases use DBMS database In today's world we can see a rapid increase in management systems that provide the ability to information, which complicates the process of its create, store, update and search for the necessary storage and management. Therefore, for its information. DBMSs also provide a number of organization and quick search using databases useful services: schema to control data semantics, (DB), which are organized according to the query language to organize access to part of the III International Scientific And Practical Conference “Information Security And Information Technologies”, September 13–19, 2021, Odesa, Ukraine EMAIL: 7011990@ukr.net mailto: (A. 1); andrew.barbsh@gmail.com (A. 2); salaptiev@gmail.com (A. 3); tetiana1986@ukr.net (A.4) ORCID: 0000-0001-9032-6335 (A. 1); 0000-0001-8433-2827 (A. 2); 0000-0002-7291-1829 (A. 3); 0000-0002-5223-9078 (A. 4) ©️ 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) database, data granulation, data integrity The aim of the article is to improve the process management, compression to reduce database of processing indexes in databases using fractal size, indexing to speed up query processing. trees and speed up query execution. However, the integration of different databases into the production process at enterprises and 2. Modified search tree other institutions has a number of shortcomings associated with the organization of their management and monitoring of events in The existing mechanisms of data modification databases [1-5]. in the tables have a certain feature - the change of In modern databases, an important element is keys in the corresponding nodes of the tree is the search for data in tables that contain many performed with the subsequent restructuring of rows and columns and are not always ordered. the index. This significantly affects the speed of Therefore, to implement a quick search, indexes writing information to the database and, are created that are formed from the values of one accordingly, is an important factor in increasing or more columns and pointers to the the number of queries to the database. You also corresponding rows. Indexes allow you to avoid need to store only the most frequently used sequential or step-by-step browsing of the file in indexes, then, accordingly, the access time to the search of the desired data. They are ordered, each data storage location will be reduced. Therefore, element of the index contains the name of the the existing methods need to be improved, which searched object and a pointer-identifier of its will allow to find the necessary information faster location. The more indexes, the better the [21-22]. performance of database queries, but a very large In the + tree we will improve as follows: number of indexes does not guarantee high • increase the minimum filling of the node, performance [5-10]. which will reduce the height of the tree; Many databases use different trees and their • change the rule of separating the nodes of modifications to build such indexes. However, if the tree - splitting the node with its two the tree has an insufficient number of nodes and neighbors into four new nodes; their fullness, the data search time increases [11- • change the rule of connecting tree nodes - 13]. The disadvantages may also be the use of connecting four nodes into three new identical indexes for different tables and sending nodes; to the RAM of indexes that are rarely used [14- • in the tree leaf we will store records of 15]. links to the same fields in different tables, The base trees in index construction and data which will increase the time of receipt of retrieval are B-trees, namely their type B + trees. links to data in the tree and speed up the These trees easily implement the independence of search. the program from the structure of the information We describe the search for data using indexes. record, have the ability to sequential access and Indexes are loaded into the RAM of the database all key data are contained only in the sheets. The after receiving the request. Next, a list of data is main disadvantages of such trees are the formed, which contains the necessary compactness of filling and the number of levels of information, and the found data is sent to RAM. trees [16-18]. However, in the classical algorithm for loading You can also select K-trees, which contain all indexes in the RAM are indexes that are almost the characteristics of the B + tree, but have a better not used, and, accordingly, take place until they strategy of splitting and merging nodes. Also, are replaced by other indexes. Therefore, it is these trees have more elements at the root of the necessary to improve the procedure for processing node and the fullness of the node is ¾. All this indexes in the RAM of the database (Picture 1) by: saves hard disk space and increases the speed of • reducing the specific storage in the RAM access to information [19-20]. of indexes that are little used; However, the index structures used in modern • processing little-used indexes by reading databases have some limitations due to the long them from disk. process of restructuring the index structure in the case of adding or removing new data. Accordingly, this leads to a slow process of searching for information in a database with large data sets. after filling the memory, it is deleted. Another big advantage is that indexes that are almost never used will not be loaded into RAM. Collection of Loading Change of statistics on indexes in indices for 3. A modified method of searching the use of DBMS the new indices RAM period for queries using fractal trees Recently, fractals are increasingly being used in various areas of our lives. Fractals can be used to model and describe various phenomena in the Figure 1: Algorithm for loading indexes in the fields of radio engineering and electronics, digital DBMS RAM by hashing information processing, and computer graphics [23]. Here is an algorithm for loading indexes into The concept of «fractal» was proposed by the memory based on the index hashing method: French-American mathematician Benoit • DBMS loads indexes into RAM according Mandelbrot. In 1977, he published Fractal to the classical algorithm and collects Geometry of Nature, describing repetitive statistics on the number of used indices drawings from everyday life. According to him, during ∆t; many geometric shapes consist of smaller shapes, • after collecting statistics, the DBMS loads which when enlarged accurately repeat a large into RAM only those data that were used shape. After research, he also found that fractals most often during the time period ∆t; have chaotic behavior, fractional infinite • if there is no data in the RAM during the dimension and can be described mathematically query, the search is performed by reading using simple algorithms. nodes from the disk index space of the Fractal in a more general sense means an database; irregular, self-similar structure, set, subsets and elements of which are similar to the set itself. • if the time ∆t has expired, then in RAM are Fractals can be deterministic or stochastic. They loaded those indexes that are used most can also be classified according to self-similarity. often and have not been loaded before. There are three types of self-similarity in fractals: This algorithm is implemented in two stages: exact self-similarity (looks the same at different 1. statistics are collected on the number of magnifications); almost self-similarity (fractal used indices for the corresponding period looks approximately (but not exactly) self-similar ∆t; at different magnifications); statistical self- 2. the indices that were most often used in the similarity (fractal has numerical or statistical previous time interval ∆t are loaded into measures that persist with magnification). RAM. Examples of fractals are the Cantor set, the We have a formula for calculating time: Lyapunov fractal, the Serpinsky triangle, the Serpinsky carpet, the Menger sponge, the (∑𝑛𝑖=1 𝑘𝑖 𝑊𝑖 )𝑡 (1) ∆𝑡 = , Apollonia grid, the dragon curve, and the Koch (∑𝑛𝑖=1 𝑊𝑖 )𝑘 curve. Also recently, attention is paid to fractal trees: from each branch depart smaller, similar to where 𝑘𝑖 – the number of used i-th index, 𝑘 – the it, from them - even smaller (figure 2). By a number of indexes used, 𝑊𝑖 – the weighting separate branch of mathematical methods can describe the properties of the whole tree. factor of the i-th index, 𝑡 – time of statistics collection. Loading data indexes into RAM (figure 1) based on statistics on the frequency of use of indices and index size weights leads to a decrease in the number of indexes that are loaded into RAM, in contrast to the classic loading, where the loading of indexes occurs during their use, and Figure 2: An example of constructing a fractal tree To construct the structure of the indices will be Figure 4: Pythagorean tree for 6 levels used Pythagorean fractal tree - a flat fractal, consisting of interconnected right triangles of These indexes can be easily used for large squares built on the legs and hypotenuse databases. The structure of such indexes is (figure 3). presented in the form of arrays with a length equal to powers of number 2. This structure is easily scalable for a large number of keys, and is not sensitive to the content of the entered queries. The main advantage of using fractal trees is that the resulting structure is symmetrical and internally balanced. Symmetry helps to execute the request quickly and, as a result, there will be much fewer requests to the disk subsystem. Also, due to the self-similarity property, the most frequently used indexes will be loaded into the DBMS RAM much faster after selection. This will speed up the process of finding the information you need for the request. The index uses a new multi-level approach - additional levels of the tree allow you to search in the data block that contains the information on Figure 3: Pythagorean tree request. Each request accesses the same number of levels, which provides balanced access to the The Pythagorean tree with N levels is a trunk index and disk subsystem. and two Pythagorean trees with N-1 levels depart Updating, inserting and deleting indexes from it symmetrically, so that the length of their can be done very efficiently. The update is trunks is 2 times less and the angle between them performed as a sequential deletion of the old key, is 90 degrees (figure 4). followed by the insertion of a new key value. The Pythagorean tree is divided into subtree Inserting a key into a fractal tree involves adding blocks, where each tree is a full-fledged fractal one new node or adding an edge to an existing tree. We present this subtree in the form of a new node. Inserting requires changes to only one block horizontal level, which complements the vertical at level 1. First, look for a block to update - if the structure of the original tree. If the new horizontal block is crowded, it must be divided, and this level is too large, then in order to fit into one block leads to the creation of a new node in level 2. of the disk, it is divided into two blocks and Separation of blocks is very rare and does not indexed in the third horizontal level. affect performance. To study the effectiveness of indexes based on a modified fractal search tree in the database and choose the best system for hosting the database server, we measured the speed of information retrieval in tables for Windows 10. Experiments show that the search speed of modified trees compared to modified fractal search tree is which is especially true in the era of multicore reduced by 12% (Picture 5). modern computer systems. The average error of the result for the modified Prospects for further research are seen in search tree is 0.91%, and for the modified fractal the creation of new methods for processing search tree is 0.89%. Therefore, the experiments queries in distributed databases based on index are performed correctly and provide the results hashing using fractal trees. with a given accuracy. В+-tree Modified fractal tree 5. References 500000 [1] V.A. Mashkov, O.V. Barabash, Self-Testing of Multimodule Systems Based on Optimal 450000 Check-Connection Structures. Engineering 400000 Simulation. Amsterdam: OPA, 13 (1996) 479- 492. 350000 [2] V.A. Mashkov, O.V. Barabash, Self- 300000 checking and Self-diagnosis of Module Systems on the Principle of Walking 250000 Diagnostic Kernel. Engineering Simulation. 200000 Amsterdam: OPA, 15 (1998) 43-51. [3] O. Barabash, G. Shevchenko, N. Dakhno, O. 150000 Neshcheret, A. Musienko, Information 100000 Technology of Targeting: Optimization of Decision Making Process in a Competitive 50000 Environment. International Journal of TIME, M/S 0 Intelligent Systems and Applications. Hong 0 20 40 60 Kong: MECS Publisher, 9 (12) (2017) 1-9. NUMBER OF RECORDS [4] O.V. Barabash, P.V. Open’ko, O.V. Kopiika, H.V. Shevchenko, N.B. Dakhno, Target Figure 5: Comparison of query search speed for Programming with Multicriterial modified tree and modified fractal tree Restrictions Application to the Defense Budget Optimization. Advances in Military 4. Conclusions Technology, 14(2) (2019) 213-229. [5] V. Sobchuk, О. Barabash, A. Musienko, O. Svynchuk, Adaptive accumulation and New methods of index processing in diagnostic information systems of databases for speeding up information processing enterprises in energy and industry sectors. 1st are offered. The developed modified methods Conference on Traditional and Renewable differ from the known methods of processing Energy Sources: Perspectives and Paradigms queries in databases in that they can be used for a for the 21st Century (TRESP 2021), Volume large amount of information. Loading data indices 250, 09 April 2021. into RAM based on statistics on the frequency of doi.org/10.1051/e3sconf/202125008002 use of indices and index size weights leads to a [6] H. Zhenbing, V. Mukhin, Ya. Kornaga, O. decrease in the number of indexes that are loaded Herasymenko, Y. Bazaka, The scheduler for into RAM. A modification of the data processing the grid system based on the parameters algorithm in RAM has been performed, which has monitoring of the computer components. made it possible to exclude indexes that are rarely Eastern European Journal of Enterprise used in memory. The resulting structure is Technologies, 1 (2017) 31-39. balanced and optimized for storage in the disk [7] V. Savchenko, O. Ilin, N. Hnidenko, O. subsystem, reduces the number of I / O operations Tkachenko, O. Laptiev, S. Lehominova, to a minimum. The method of constructing Detection of Slow DDoS Attacks based on indexes based on a modified fractal tree allows to User’s Behavior Forecasting. International increase the data search speed by 12% compared Journal of Emerging Trends in Engineering Research (IJETER) 8(5) (2020) 2019-2025. to the modified method of index search based on [8] O. Laptiev, O. Stefurak, I. Polovinkin, O. a classic B + tree. The proposed approach also has Barabash, V.Savchenko, O. Zelikovska. The an important property of scaling, as fractal trees method of improving the signal detection are divided into a large number of smaller trees, quality by accounting for interference. 2020 IEEE 2nd International Conference on [16] Z. Hu, V. Mukhin, Ya. Kornaga, O. Advanced Trends in Information Theory Herasymenko, Y. Mostoviy, The Analytical (IEEE ATIT 2020) Conference Proceedings Model for Distributed Computer System Kyiv, Ukraine, November 25-27, pp.172-176. Parameters Control Based on Multi-factoring [9] V. Tkachov, V.Tokariev, Y. Dukh, V. Estimations. Journal of Network and Systems Volotka, Method of Data Collection in Management, 27(2) (2019) 351-365. Wireless Sensor Networks Using Flying Ad [17] O. Barabash, O. Laptiev, V. Tkachev, O. Hoc Network. 2018 5th International Maystrov, O. Krasikov, I. Polovinkin, The Scientific-Practical Conference Problems of Indirect method of obtaining Estimates of the Infocommunications. Science and Parameters of Radio Signals of covert means Technology, October 9-12, 2018 Kharkiv, of obtaining Information. International Ukraine, pp. 197-201. Journal of Emerging Trends in Engineering [10] K. Smelyakov, S. Smelyakov, A. Chupryna Research (IJETER), 8(8) (2020) 4133-4139. Advances in Spatio-Temporal Segmentation [18] S. Yevseiev, R. Korolyov, A. Tkachov, O. of Visual Data. Chapter 1. Adaptive Edge Laptiev, I. Opirskyy, O. Soloviova, Detection Models and Algorithms. Series Modification of the algorithm (OFM) S-box, Studies in Computational Intelligence (SCI), which provides increasing crypto resistance volume 876, publisher Springer, Cham, in the post-quantum period. International 2020, pp. 1-51. Journal of Advanced Trends in Computer [11] S. Yevseiev, R. Korolyov, A. Tkachov, O. Science and Engineering (IJATCSE) 9(5) Laptiev, I. Opirskyy, O. Soloviova, (2020) 8725-8729. Modification of the algorithm (OFM) S-box, [19] М. Pratsiovytyi, O. Svynchuk, Spread of which provides increasing crypto resistance values of a Cantor-type fractal continuous in the post-quantum period. International nonmonotone function. Journal of Journal of Advanced Trends in Computer Mathematical Sciences, 240(3) (2019) 342- Science and Engineering (IJATCSE) 9(5) 357. doi.org/10.1007/s10958-019-04354-0 (2020) 8725-8729. [20] О. Laptiev, G. Shuklin, O.Stefurak, O. [12] O. Barabash, O. Laptiev, O. Kovtun, O. Svynchuk, O. Urdenko, S. Hohoniants, Leshchenko, K. Dukhnovska, A. Biehun, Method of the increasing the detection The Method dynavic TF-IDF. International system and recognition of digital Journal of Emerging Trends in Engineering radiosignals. Wschodnioeuropejskie Research (IJETER), 8(9) (2020) 5713-5718. Czasopismo Naukowe, East European [13] O. Laptiev, V.Savchenko, S. Yevseiev, H. Scientific Journal, 2 (54) (2020) 4-16. Haidur, S. Gakhov, Spartak Hohoniants, The [21] O. Svynchuk, O. Barabash, J. Nikodem, R. new method for detecting signals of means of Kochan, O. Laptiev, Image compression covert obtaining information. 2020 IEEE 2nd using fractal functions Fractal and Fractional, International Conference on Advanced 5(2), (2021) 31. Trends in Information Theory (IEEE ATIT [22] O.V. Barabash, A.P. Musienko, V.V. 2020) Conference Proceedings Kyiv, Sobchuk, N.V. Lukova-Chuiko, O.V. Ukraine, November 25-27, pp.176 –181. Svynchuk, Distribution of Values of Cantor [14] V. Sobchuk, V. Pichkur, O. Barabash, O. Type Fractal Functions with Specified Laptiev, I.Kovalchuk, A. Zidan, Algorithm of Restrictions. Chapter in Book control of functionally stable manufacturing “Contemporary Approaches and Methods in processes of enterprises. 2020 IEEE 2nd Fundamental Mathematics and Mechanics”. International Conference on Advanced Editors Victor A. Sadovnichiy, Michael Z. Trends in Information Theory (IEEE ATIT Zgurovsky. Publisher Name: Springer, 2020) Conference Proceedings Kyiv, Cham, Switzerland AG 2021, pp. 433-455. Ukraine, November 25-27, pp. 206-211. doi.org/10.1007/978-3-030-50302-4_21 [15] V. Savchenko, O. Laptiev, O. Kolos, R. [23]S. Toliupa, N. Lukova-Chuiko, O. Oksiuk. Lisnevskyi, V. Ivannikova, I. Ablazov, Choice of Reasonable Variant of Signal and Hidden Transmitter Localization Accuracy Code Constructions for Multirays Radio Model Based on Multi-Position Range Channels. Second International Scientific- Measurement. 2020 IEEE 2nd International Practical Conference Problems of Conference on Advanced Trends in Infocommunications. Science and Information Theory (IEEE ATIT 2020) Technology. IEEE PIC S&T 2015. pp. 269 – Conference Proceedings Kyiv, Ukraine, 271. November 25-27, pp.246-251.