=Paper=
{{Paper
|id=Vol-1837/paper27
|storemode=property
|title=The GPU-Oriented Tree Representation Based on the Method of Finding the Remainder
|pdfUrl=https://ceur-ws.org/Vol-1837/paper27.pdf
|volume=Vol-1837
|authors=Vladimir Voronkin,Andrey Malikov,Elmira Azarova,Aleksey Shchegolev
|dblpUrl=https://dblp.org/rec/conf/ysip/VoronkinMAS17
}}
==The GPU-Oriented Tree Representation Based on the Method of Finding the Remainder==
Vladimir Voronkin, Andrey Malikov, Elmira Azarova, Aleksey Shchegolev

North-Caucasus Federal University, Institute of Information Technologies and Telecommunications, Russian Federation

Abstract

This paper describes a new method of representing tree-like structures in databases. The developed method of finding relationships between nodes is based on the unique prime factorization theorem, which states that every natural number can be written as a product of prime numbers. The method is implemented using NVIDIA graphics processing unit (GPU) parallel computing and NVIDIA CUDA technology. The paper analyzes the process of tree construction with the developed method, presents the algorithms for GPU-based processing of trees encoded with it, and reports performance measurements.

1 Introduction

Computer systems are developing rapidly: processor architectures are changing, and main and disk memory sizes are increasing. The rapid growth of computing power and data volumes now allows for the development of high-performance data management and processing systems. A large share of the world's data is in one form or another represented as a hierarchy or mutual dependency, where some pieces of information depend on others. Moreover, data management systems are migrating from disk-oriented to memory-oriented technology, and this development has led many database management systems to integrate in-memory support [GBE13]. Processing data in main memory considerably increases the data processing rate at the expense of safety. Modern DBMSs supporting in-memory technology process data in memory while mirroring changes to the disk.
The application of memory-oriented data processing technologies, together with new methods of representing trees using GPU computing, can increase the data processing rate. This research aims to develop a new method of hierarchy representation in databases that exhibits high performance in tree manipulation operations.

Copyright © 2017 by the paper's authors. Copying permitted for private and academic purposes. In: S. Hölldobler, A. Malikov, C. Wernhard (eds.): YSIP2 – Proceedings of the Second Young Scientist's International Workshop on Trends in Information Processing, Dombai, Russian Federation, May 16–20, 2017, published at http://ceur-ws.org.

2 Related Works

This work is devoted to the description of the developed method of representing trees in databases. The development of the method was initially oriented towards maximum parallelization of tree processing and maximum independence of each node code from other node codes. The paper focuses on a radically new method of representing trees in relational and non-relational databases, and also demonstrates that GPU computation of hierarchical relationships can be more effective than CPU computation. The developed method of representing trees is oriented towards parallel processing on a GPU.

2.1 Methods of Representing Trees

All methods of representing trees in databases can be put into two categories: methods of representing trees in non-graph databases, and methods of representing trees in graph and hierarchical databases.

The following methods belong to the first category:

The adjacency list method of relational hierarchical modeling. The adjacency list method is based on the storage of direct parent-child relationships.
Each child in a treelike hierarchy relates to its parent, which is one level higher. Using this hierarchy property, a table-level relational dependence can be created.

The materialized path method. A materialized path is a string field that consists of element names separated by a separator character [Tro03]. The names of the ancestors of a node are used as the elements of its materialized path.

The Nested Sets method. The Nested Sets method implies that there are two additional fields in the node description. Tree coding using the nested sets algorithm introduced by Joe Celko [Cel12] [Cel] [Cel04] is done as follows: moving down the left side of the tree, it is necessary to traverse all the subtrees, from the leftmost to the rightmost, and assign each node an auto-increment value. When moving down the tree, the auto-increment value is assigned to the variable responsible for the interval starting point ("left bound"); when moving up, to the variable responsible for the interval end ("right bound").

The nested intervals method. The nested intervals method, introduced in [Tro05], is based on encoding the materialized path as a finite continued fraction [q1, q2, ..., qn], where q1, q2, ..., qn are the steps of the materialized path. Rational numbers a/b, where a >= b >= 1 and GCD(a, b) = 1, are used as tree element codes. As an example, consider how the code for the materialized path "3.2.2" is created using a continued fraction:

    3 + 1/(2 + 1/2) = 17/5

To transform a materialized path into a rational expression, the convergents principle can be used, while for the inverse transformation the algorithm of gradual truncation should be used, which for a rational expression coincides with Euclid's algorithm for finding the GCD [Gai99].

Modifications and combinations of the afore-named methods. There are also modifications of the given methods that fix their inherent problems [Kol09] [vor14] [MT11].
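The nested-intervals encoding described above can be sketched briefly. This is an illustrative transcription, not the paper's implementation; function names are ours. A path [q1, ..., qn] maps to the continued fraction q1 + 1/(q2 + 1/(... + 1/qn)), stored as a reduced rational, and decoding repeatedly splits off the integer part, mirroring Euclid's algorithm:

```python
from fractions import Fraction

def path_to_rational(path):
    """Encode a materialized path (list of ints) as a rational code."""
    code = Fraction(path[-1])
    for q in reversed(path[:-1]):     # fold the continued fraction inwards
        code = q + 1 / code
    return code

def rational_to_path(code):
    """Decode a rational code back into the materialized path (Euclid-like)."""
    path = []
    while True:
        q = code.numerator // code.denominator   # integer part = next step
        path.append(q)
        rest = code - q
        if rest == 0:
            return path
        code = 1 / rest

assert path_to_rational([3, 2, 2]) == Fraction(17, 5)   # the "3.2.2" example
assert rational_to_path(Fraction(17, 5)) == [3, 2, 2]
```

The round-trip reproduces the paper's example: path "3.2.2" encodes to 17/5 and back.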
For example, in [MT11] an improvement of the nested intervals method is introduced through interval coding in the residue number system (RNS). Using RNS reduces the length of the numbers used by storing a number as a set of short remainders. Representing node codes as numbers expressed in RNS enables parallel processing not only of different tree nodes but also of the remainders of one and the same node; thus the maximum degree of parallelization of data processing is increased. In [vor14] a method of storing nested intervals expressed in RNS is introduced that solves some problems of sorting numbers expressed in RNS and of index construction.

Such methods as the materialized path, nested intervals and modifications of both include the data on children and/or parents in the key while coding, and can be processed in parallel with sufficient effectiveness.

The methods of representing trees in graph and hierarchical databases are difficult to classify, as most graph databases use their own formats and structures of data representation. In some cases pointers are used for linking neighbor nodes; the pointers are bound to the data that appear on arcs. In its structure this hierarchy model is similar to adjacency lists. Other graph databases [Kup86] [KV93] use a table model for storing relationship arcs. Another format for storing graph data is XML and XML-based formats [MS11] [BELP].

Figure 1: Representation types of dependent data — a) graph representation, b) relational representation

Fig. 1b and 1a illustrate the aforesaid: Fig. 1b shows a relational representation of hierarchical data in a DB, while Fig. 1a shows the same data represented as a graph. The illustrated data exemplify the storage of hierarchical data in databases.

One of the distinctive features of graph databases is the basis chosen for representing database objects, relations between the data, complex objects and their attributes. All graph databases are based on a mathematical description of a graph that determines the graph manipulation operations. A graph model depends on this basis, for example: a directed or undirected graph, marked or unmarked graph nodes, weighted or unweighted arcs. Some models, such as GOOD, GMOD, G-Log and Gram, represent both the scheme and the entity as a named digraph. LDM is an exception: its schemes are digraphs in which leaves represent data and internal nodes represent structured data. LDM consists of a two-column table, each row of which associates entities with data types (primitive, tuple or set). A more detailed description of graph models is given in [AG08].

2.2 GPU Data Processing

The graphics processing unit is an advanced high-performance platform for computing. The basic distinctive feature enabling the GPU's high performance is its architecture, which contains a great number of ALUs. The GPU and CPU architectures are compared in [PT11]. A number of works [BS10] [HKOO11] point to the advantage of GPU data processing over CPU processing: on average, queries in relational databases were processed 5–30 times faster. Some works [PSHL10] [BMCN09] show an increased rate of sorting data using the GPU. As far as hierarchical and graph structures are concerned, the works [HN07] [HKOO11] [WO10] are worth mentioning, in which GPU graph computations are made and which describe the speedup of processing graph structures on the GPU compared to the CPU. Hereafter, GPU computing means the use of NVIDIA CUDA technology.
CUDA (Compute Unified Device Architecture) is an NVIDIA technology designed for the development of applications for massively parallel computing units. Owing to its great number of ALUs, the GPU implements a stream model of computation: there are input and output data consisting of identical elements which can be processed independently. The processing of the elements is performed by a kernel. A kernel is a function executed on the GPU and called from the CPU. A kernel is composed of a grid of multiple threads grouped into a hierarchy. The highest level, the grid, corresponds to the kernel and groups all the kernel's threads. A grid is a one- or two-dimensional array of blocks. Each block is a one-, two- or three-dimensional array of threads. Furthermore, each block is an entirely independent set of interacting threads.

3 Principles of Tree Construction

By "tree" we will understand a connected acyclic graph composed of a set of nodes n = (k, d), where k is the node code and d is the data associated with the node:

    T = {n} = {(k, d)}

Let us define for a node n the operations key(n) = k (access to the node code) and data(n) = d (access to the data associated with the node).

Denote by the node identifier a prime number unique within the node identifiers. Let us introduce a set of primes I, listing the primes in increasing order of their values:

    I = s1, s2, ..., sm, ...

Let us call the product of the node's parent identifiers and the node's own identifier the node code. The code of the tree root is the root identifier. Let us introduce the operation of getting the next prime number from the set of primes I:

    next() = si, i := i + 1

Primes are needed for tree construction because the decomposition of a code must be unique: by the unique prime factorization theorem, a node code factors into exactly the identifiers on its path from the root. The use of one and the same identifier for more than one node is impossible. A tree is formed as follows:

1. A prime is chosen as the root node code.
2. Each node added to the tree has a code equal to the product of the parent code and the identifier of the added node.

Each tree node thus encodes in its code the entire path starting from the root.
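The two construction steps above can be sketched as follows. This is a minimal illustrative model (names and the dict-based tree are ours, not the paper's C++/CUDA implementation): each node draws a fresh prime from I, and its code is the parent code times that prime.

```python
def primes():
    """Yield the primes in increasing order (the set I). Trial division
    is enough for an illustration."""
    n = 2
    while True:
        if all(n % p for p in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

gen = primes()

def add(tree, data, parent_code=None):
    """Add a node: code = parent code * fresh prime identifier.
    For the root, the code is the prime identifier itself."""
    sid = next(gen)
    code = sid if parent_code is None else parent_code * sid
    tree[code] = data
    return code

tree = {}                      # tree as {code: data}
root = add(tree, "root")       # code 2
a = add(tree, "a", root)       # code 2*3 = 6
b = add(tree, "b", root)       # code 2*5 = 10
c = add(tree, "c", a)          # code 6*7 = 42: encodes the path root->a->c
```

Note how c's code, 42 = 2·3·7, factors into exactly the identifiers along its path from the root, which is what the unique factorization theorem guarantees.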
This method of node code creation is similar to the materialized path method.

4 Tree Processing Operations

Let us define the admissible tree operations. Let us introduce the function finding the remainder on dividing a by b:

    mod(a, b) = a − b · ⌊a/b⌋

The following tree operations are admissible:

1. add(T, d, nparent), where nparent is the parent node and d is the data added to the tree — the operation of adding a node to the tree.
   Precondition: exists(T, nparent)
   Realization:
       T := T ∪ {n(key(nparent) · next(), d)}    (1)
   The predicate exists(T, n) determines whether the node n exists in the tree T:
       exists(T, n) := ∃n ∈ T

2. The operation of searching for a node with the code k in the tree, node(T, k):
       node(T, k) := {x ∈ T | key(x) = k}    (2)

3. The operation of searching for the parents of a node in the tree, parents(T, k):
       parents(T, k) := {x ∈ T | mod(k, key(x)) = 0 ∧ k ≠ key(x)}    (3)

4. The operation of searching for the direct parent, parent(T, k):
       parent(T, k) := MAX(parents(T, k))    (4)

5. The operation of searching for a subtree, subtree(T, k):
       subtree(T, k) := {x ∈ T | mod(key(x), k) = 0}    (5)

6. The operation of searching for direct children, children(T, k):
       children(T, k) := {x ∈ T | k = key(parent(T, key(x)))}    (6)

7. The operation of subtree removal, remove(T, k).
   Realization:
       T := {x ∈ T | mod(key(x), k) ≠ 0}    (7)

8. The operation of transferring a subtree rooted at nold to a new parent nnew, move(T, nold, nnew).
   Precondition: exists(T, nold) ∧ exists(T, nnew) ∧ mod(key(nnew), key(nold)) ≠ 0
   Realization:
       T := T ∪ {n(recalcKey(T, key(x), key(nold), key(nnew)), data(x)) | x ∈ subtree(T, key(nold))}    (8)
   where recalcKey(T, k, kold, knew) is the operation of node code recalculation:
       recalcKey(T, k, kold, knew) := (k / kold) · knew

Considering the possibility of executing parallel operations on a tree as a set, the following features should be noted:

1. The possibility of parallel tree processing using the GPU. Tree operations (2, 3, 4, 5, 6) that do not modify the tree have no need for interaction with other nodes.
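Operations (2)–(7) above can be transcribed almost literally, with the tree held as a dict {code: data}. This is an illustrative CPU sketch, not the paper's GPU code: divisibility of codes encodes ancestry, so each membership test touches a single node, which is exactly what makes one-thread-per-node parallelization possible.

```python
def parents(tree, k):
    """Operation (3): all ancestors of the node with code k are
    exactly the codes that divide k."""
    return {x for x in tree if k % x == 0 and x != k}

def parent(tree, k):
    """Operation (4): the direct parent is the ancestor with the
    largest code."""
    return max(parents(tree, k))

def subtree(tree, k):
    """Operation (5): the node with code k and all its descendants are
    the codes divisible by k."""
    return {x for x in tree if x % k == 0}

def children(tree, k):
    """Operation (6): direct children are the nodes whose direct
    parent has code k."""
    return {x for x in subtree(tree, k) if x != k and parent(tree, x) == k}

def remove(tree, k):
    """Operation (7): removing a subtree keeps only codes not
    divisible by k."""
    return {x: d for x, d in tree.items() if x % k != 0}

# A small tree with prime-product codes: 6 = 2*3, 10 = 2*5, 42 = 6*7.
tree = {2: "root", 6: "a", 10: "b", 42: "c"}
assert parents(tree, 42) == {2, 6}
assert parent(tree, 42) == 6
assert subtree(tree, 6) == {6, 42}
assert children(tree, 2) == {6, 10}
```

Each set comprehension is embarrassingly parallel over the records, which is the property the following feature list relies on.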
All such operations can be processed in parallel and simultaneously; parallelization of tree manipulation operations down to one thread per node is possible.

2. The order of records in the tree representation in memory influences only the order of records in the result set and does not influence the number of records in it.

3. The use of primes in a tree in sequential order is optional.

5 GPU Computation of Trees

Each node is represented by a pair {node code : pointer to node properties}. Let us call this pair a record. Since the node properties can be a sequence of dynamically-sized free fields, each node code is associated with a pointer to the properties' location in the file rather than with the properties themselves. In the file, data is stored as a sequence of records. When loading, records are divided as follows: keys are written into GRAM (GPU memory), while pointers to the properties are written into RAM. The loaded data (keys and pointers) is stored as a sequence. For this form of representation it is important that the order of records in RAM is strictly the same as in GRAM: when swapping nodes in GRAM, it is necessary to swap the corresponding nodes in RAM in the same way.

When executing a query, say a lookup query, all the records in the tree are scanned. The query result is computed on the fly, while the query executes; the search space is not narrowed. To save time transferring the processing result to RAM, the GPU builds a bit map of the query result, in which a set bit denotes a node matching the query and a cleared bit denotes a non-matching node (Fig. 2). The bit map is formed in the kernels (a kernel is a function executed on the graphics card) executing the query, so that one bit of the map corresponds to each node. For example, with a 4-byte key the size of the data transmitted through the bus (between GPU and CPU) is decreased by 32 times. To get the actual nodes of the resulting selection, it is necessary to find the set bits in the bit map.
The ordinal number of a set bit is the number of the storage cell in RAM (and GRAM) in which the data address of the found node (and the found node's code) is stored.

Let us consider the process of executing some tree operations. Each operation is executed for each node in the set; the whole tree processing is realized on the GPU. The result of complex query processing, for example a common-parent search, is formed by combining the bit maps of each query result, thus avoiding slow per-node search operations. For example, commonParents(A, k, r), the operation of searching for the common parents of two nodes with the codes k and r, looks like this:

    commonParents(A, k, r) := parents(A, k) ∩ parents(A, r)

For finding common parents, the most rational method (with regard to the time of finding a solution) is to execute two parent-search queries, one for each node, and combine the bit maps of the query results with the bitwise AND operator (Fig. 2). The combination of the bit maps, as well as the calculations, takes place on the GPU. The GPU execution of the bitwise AND operator for a 4-byte integer requires four multiprocessor cycles, without considering memory access delay (4 cycles for shared memory).

Figure 2: Bitwise AND scheme

6 The Experiment Results

To check the developed method of representing trees, a DBMS prototype was designed supporting the basic data manipulation operations: insertion, deletion, transfer, search. Search and transfer queries are executed on the GPU; insertion and deletion are executed on the CPU. The developed DBMS prototype does not cache data: reading is performed directly from the disk, and queries are not cached either. The result of a search query is a bit map returned to the client in the form of a cursor. At the next cursor record request, the search for the next set bit in the map takes place, followed by the reading of the document and key corresponding to the bit's location in the map. The performance test of the developed method of representing trees was carried out in comparison with the DBMS MongoDB.
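The bit-map scheme just described can be modeled on the CPU as follows. This is an illustrative sketch (names are ours): one "thread" per record sets a bit when the record matches a parent search, the common parents of two codes are the bitwise AND of two such bit maps, and only the compact bit map crosses the GPU-CPU bus.

```python
def parents_bitmap(codes, k):
    """One bit per record, set when the record's code is an ancestor of k.
    On the GPU each loop iteration would be a separate kernel thread."""
    bits = 0
    for i, code in enumerate(codes):
        if k % code == 0 and code != k:
            bits |= 1 << i
    return bits

def select(codes, bits):
    """Expand a bit map back into the matching codes: the ordinal number
    of each set bit indexes the record sequence (done on the CPU)."""
    return [c for i, c in enumerate(codes) if bits >> i & 1]

# Codes with unique prime identifiers: 6 = 2*3, 10 = 2*5, 42 = 6*7, 110 = 10*11.
codes = [2, 6, 10, 42, 110]

# commonParents(A, 42, 110) = parents(A, 42) AND parents(A, 110):
common = parents_bitmap(codes, 42) & parents_bitmap(codes, 110)
assert select(codes, common) == [2]   # the root is the only common parent
```

One bit per record instead of a 4-byte key is what yields the 32-fold reduction in bus traffic mentioned above.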
MongoDB was chosen as a popular and high-performance DBMS. As its index, MongoDB uses a B+tree. For hierarchy representation in MongoDB, the materialized path was chosen as the hierarchy record type.

6.1 Test Set

A tree-like database is used as the test set. As the DB base, the All-Russian Classifier of Addresses (KLADR) is used [kla]. The data is represented as a tree with 2.5 million nodes. Each node has a different number of children; the tree is not balanced. The height of the tree is 5 levels. For running the experiments the KLADR reference was extended; the original reference guide has unused fields and occupies approximately 400 MB of disk space. Each tree node has the fields represented in Table 1. It is necessary to discriminate between a geographical division and the tree node to which the geographical division is attached.

The fields represented in Table 1 are common fields for both DBMSs in the test. MongoDB uses the materialized path (field "matPath") to store the hierarchy data; in the field "matPath" the DBMS prototype stores the node code, which is the product of the parents' identifiers and the node's identifier. The data sets are the same for both DBMSs tested. To check the performance of the method of hierarchy representation in comparison with the analogues, a series of tests was conducted.

Table 1: Tree node fields

    Field name   Data type       Description
    name         char[32]        node name
    matPath      varchar[128]    node materialized path
    objName      char[16]        name of object associated with node
    objData      varchar[10000]  data attached to object
    gpsX         float           object coordinate X in GPS coordinate system
    gpsY         float           object coordinate Y in GPS coordinate system
    level        int             object nesting level in accordance with KLADR
    objType     char[8]          object type in accordance with KLADR
    objCode     char[16]         object code in accordance with KLADR

A cold startup, with all the tests being run, was performed for each DBMS as a "warm-up"; after that, all the tests were carried out. Before testing each DBMS the computer was rebooted. Each test was repeated three times.
The test results represent the average of the received values. The power supply plan was set to "peak performance" and all power-saving functions were turned off.

For each DBMS the following tests were performed: record addition, subtree selection, selection of all the node's parents, tree traversal, and node code search. Testing of node addition time was performed as follows: the time for adding all the nodes was measured, and the result was divided by the total number of nodes. The selection of all the node's children assumes obtaining a cursor for the node's children and reading them from the disk; 3500 queries were executed during this test. The selection of all the parents also assumes obtaining a cursor for the result set and reading the records from the disk; similarly, 3500 queries were executed during this test. Tree traversal assumes reading the whole tree. The selection of nodes with reading assumes the execution of 10,000 queries: after obtaining the cursor for the children, all the children are read from the disk. This test allows determination of the real DBMS performance, as MongoDB, when a cursor is obtained, returns only the first resulting records; the rest of the records are found in the index only when all the previously found ones have been read.

The total size of the database tested is 24.4 GB, not counting the normative documents. Two different experiments were carried out using the DB tested: the first one includes the whole database, the second one includes only 100,000 nodes (the number of queries and the data sets are left unchanged). The need for several experiments with one data set is caused by the necessity of determining the degree of impact of the number of records in the tree on its processing performance. In addition to the database tested, an experiment was set up in which 100,000 tree nodes were used as a test set, with a materialized path attached to each node as its properties.
The performance in this experiment makes it possible to understand the degree of impact of the record size on the performance of the tree representation method.

6.2 Hardware and Software

A computer with an Intel Core i5-4590 processor running the Windows 8.1 operating system was used as the computing platform. The processor is a 3.30 GHz 64-bit quad-core, with a maximum memory throughput of 32 GB/s. The machine has 8 gigabytes of memory. The graphics card used is an NVIDIA GeForce GTX 760 with 1152 CUDA cores and 4 GB of global memory, supporting a maximum throughput of 192.2 GB/s. The CUDA 6.5 driver is used on the computer. As for the software, MongoDB 2.6 is used; for testing, executable DBMS files downloaded from the official site were used. The developed DBMS was built using MS Visual Studio 2013 with the optimization flag O3.

6.3 Test Results

Table 2 represents the test results. All the values are average results for a set of queries. The results in Table 2 confirm the possibility of the justified application of the developed method of hierarchy representation in databases. Such a result is observed with large data volumes and a great number of records.
If a record is small, this method's performance becomes inferior to a common tree index (B+tree for MongoDB) (Table 3). The difference is caused by the hardware restriction on the maximum number of queries per second: the minimum time for configuring and starting an empty kernel (for the card tested) in the synchronous mode is approximately 0.1 msec; consequently, the graphics card cannot execute more than 10-15 thousand kernels per second.

Table 2: Query execution average time (sec/test)

    Query                       Presented method   MongoDB
    Node addition               768.44             14304.02
    Subtree search              162.01             283.25
    Parent search               126.08             163.00
    Reading the whole tree      490.74             8436.37
    Obtaining arbitrary nodes   159.90             8880.27

Table 3: Performance comparison of tests with different data sets (queries/sec)

    Database / test         Node addition   Subtree search   Parent search   Node search
    2.5 million records, 10 KB/record:
      MongoDB               174.78          12.36            21.47           1.13
      Developed method      3253.34         21.60            27.76           62.54
    100,000 records, 10 KB/record:
      MongoDB               162.70          23.33            93.97           5.77
      Developed method      3245.34         21.09            30.02           57.22
    100,000 records, 20 B/record:
      MongoDB               8673.03         6603.77          350000          2554.74
      Developed method      62500           30.04            108.97          1988.64

Figure 3 shows the distribution of the query processing time. For the children search: disk 70%, CPU 15%, GPU 15%, data transfer 0.0004%; for the parent search: disk 69%, CPU 20%, GPU 9.9996%, data transfer 1%.

Figure 3: Time distribution when executing queries — a) children search, b) parent search

Reading data from the disk takes most of the time for both queries. The execution of the kernels requires little time: on average, a kernel execution takes 4.4 msec, which allows a maximum of 227 queries per second without considering CPU operation time (for 10 KB/record). When testing the DB with a small record size (20 bytes/record), the kernel execution time is almost the same as for large records.
This suggests that the presented method of building hierarchies based on finding the remainder is best suited to large amounts of data, for which CPU processing is a long-running operation.

7 Future Improvements

7.1 Threaded Execution

So far a multithreaded version of the software has not been developed (meaning CPU multithreading, not GPU multithreading). Multithreaded software is an important aspect of future development. If a query is processed using multithreading, the maximum speedup will be achieved when executing data modification queries (record addition, transfer). The application of NoSQL-style non-locking record functions will enable performing database modification operations in asynchronous mode, thus improving the performance.

7.2 Multiple GPU Computing Support

Currently one graphics card performs all the calculations. Adding several graphics cards will allow increasing either the terminal capacity of the data bank or the query execution speed, by processing chunks on each graphics card while storing the same data on different graphics cards. When using several graphics cards, the low-speed CPU-GPU bus transfer appears to be a restriction. Despite the fact that the PCIe 3.0 x16 bus has a theoretical two-sided exchange rate of 128 GB/s at 8 GT/s (gigatransfers per second), it is restricted by the core memory performance; in actual practice the bus transfer speed is approximately 39 GB/s (on the tested PC). What is more, many motherboards, especially in the moderate price range, have only one PCIe slot with 16 data transmission lanes, whereas the second and subsequent slots generally have x8 and x4. Even though using fewer lanes decreases multi-card configuration performance by 5-15% on average, and multi-card configurations in most cases cannot reach their full potential due to the restricted RAM speed, the benefit from using such configurations exceeds the expense.

7.3 Hardware Restrictions

Currently an important GPU restriction is the lack of thread synchronization facilities across the whole graphics card.
Thread synchronization is possible only inside blocks; synchronization between blocks on the GPU is impossible, and user synchronization methods are based on flags. Hardware synchronization of blocks would be the more important as it increases performance compared to software synchronization.

Another GPU nuance is the SIMT architecture. The SIMT architecture applies one and the same command to different data. The drawback of the architecture is that all the threads go through each branch of the code when branching is used, even if a thread does not execute any instructions: in other words, some threads are in operation while the others are idle.

It is also important to note the small number of registers available to each thread. This problem becomes acute with a large number of threads and complex code (large data types). With 100,000 threads, a 16-byte data type and 512 threads per block, each block uses 60 registers out of the 63 theoretically possible (data taken from the CUDA profiler). This results in fewer threads being executed in parallel. For example, when the number of registers used in a thread dropped to 13, the processing speed of the same amount of data increased almost 10 times. In the experiment, reducing the number of threads per block did not result in the expected performance improvement.

7.4 Key Compression

The tree processing method described in this article assumes repeated multiplication for defining key values. The keys obtained by multiplying primes grow rapidly, resulting in large numbers: with 100 thousand records in a tree it is necessary to use keys whose size is more than 128 bits. The most frequent arithmetic operations performed on keys in a tree are division, multiplication and taking the division remainder. Hardware support for such wide operations is absent due to CPU and GPU architecture characteristics.
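The remainder-based compression proposed below, in the spirit of the RNS encoding of [MT11], can be sketched as follows. This is an illustrative model (the moduli and function names are ours): a large key is stored as its remainders modulo pairwise coprime moduli, multiplication is performed on each remainder independently (one GPU thread per remainder), and the full key is recovered only when needed, via the Chinese Remainder Theorem.

```python
from math import prod

MODULI = (251, 241, 239, 233)    # pairwise coprime primes, each fits in a byte

def to_rns(x):
    """Store a large key as a tuple of small remainders."""
    return tuple(x % m for m in MODULI)

def mul_rns(a, b):
    """Modular multiplication: each remainder is processed independently,
    so the remainders of one key can be handled by parallel threads."""
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, MODULI))

def from_rns(r):
    """Chinese Remainder Theorem reconstruction (a non-modular operation)."""
    M = prod(MODULI)
    x = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)   # pow(Mi, -1, mi): modular inverse
    return x % M

key = 2 * 3 * 7 * 11 * 13               # a prime-product node code (6006)
assert from_rns(mul_rns(to_rns(key), to_rns(17))) == key * 17
```

Adding a child (multiplying the code by a new prime identifier) thus touches each stored remainder independently, which is the parallelism opportunity discussed in the next paragraph; reconstruction and comparison remain the expensive non-modular steps.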
To perform arithmetic operations on such keys, it is necessary to use software algorithms that are less efficient than a hardware realization of these operations. Using non-positional notations will allow reducing the size of the numbers used by storing a number as small remainders. When performing non-modular operations, the lack of need to consider the dependencies between the remainders opens up an opportunity for efficient parallelism. This, together with the advanced features of the Kepler architecture (bit control within a warp), will allow an efficient realization of bit mapping and of detecting a zero remainder in non-positional notations.

8 Conclusion

The paper presents a method of storing tree-like structures in databases oriented towards multithreaded processing with CUDA parallelization, and demonstrates the use of the GPU for hierarchical data processing. The results of the article are as follows: a new method of representing trees in databases is developed, and a hierarchical database prototype with GPU data processing is realized. On average, the implemented software runs 19 times faster than materialized path processing in the DBMS MongoDB on the CPU. Despite the variation across queries, the minimum speedup was 1.29 times.

References

[AG08] Renzo Angles and Claudio Gutierrez. Survey of graph database models. ACM Comput. Surv., 40(1):1:1-1:39, 2008.

[BELP] U. Brandes, M. Eiglsperger, J. Lerner, and C. Pich. Graph markup language (GraphML).

[BMCN09] Ranieri Baraglia, Gabriele Capannini, and Franco Maria Nardini. Sorting using bitonic network with CUDA. LSDS-IR Workshop, July 2009.

[BS10] P. Bakkum and K. Skadron. Accelerating SQL database operations on a GPU with CUDA: Extended results. Technical report, University of Virginia Department of Computer Science, 2010.

[Cel] Joe Celko. Trees in SQL.

[Cel04] Joe Celko. Hierarchical SQL, 2004.

[Cel12] J. Celko. Joe Celko's Trees and Hierarchies in SQL for Smarties.
Elsevier/Morgan Kaufmann, 2012.

[Gai99] A. T. Gainov. Number Theory. Part I. Resource book. Novosibirsk State University, Faculty of Mechanics and Mathematics, 1999.

[GBE13] Marcel Grandpierre, Georg Buss, and Ralf Esser. In-memory computing technology - the holy grail of analytics? Deloitte & Touche GmbH Wirtschaftsprüfungsgesellschaft, (07), 2013.

[HKOO11] Sungpack Hong, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun. Accelerating CUDA graph algorithms at maximum warp. Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming - PPoPP '11, page 267, 2011.

[HN07] Pawan Harish and P. J. Narayanan. Accelerating large graph algorithms on the GPU using CUDA. In Proceedings of the 14th International Conference on High Performance Computing, HiPC'07, pages 197-208, Berlin, Heidelberg, 2007. Springer-Verlag.

[kla] KLADR - the All-Russian Classifier of Addresses.

[Kol09] M.A. Kolosovskiy. Data structure for representing a graph: combination of linked list and hash table. CoRR, 2009.

[Kup86] Gabriel Mark Kuper. The Logical Data Model: A New Approach to Database Logic. PhD thesis, Stanford, CA, USA, 1986. UMI order no. GAX86-08173.

[KV93] G. M. Kuper and M. Y. Vardi. The logical data model. ACM Transactions on Database Systems, 18(3):379-413, 1993.

[MS11] Anders Møller and Michael Schwartzbach. XML graphs in program analysis. Sci. Comput. Program., 76(6):492-515, 2011.

[MT11] A. Malikov and A. Turyev. Nested intervals tree encoding with system of residual classes. IJCA Special Issue on Electronics, Information and Communication Engineering, ICEICE(2):19-21, 2011.

[PSHL10] Hagen Peters, Ole Schulz-Hildebrandt, and Norbert Luttenberger. Fast in-place sorting with CUDA based on bitonic sort. In Parallel Processing and Applied Mathematics, pages 403-410. 2010.

[PT11] Jonathan Palacios and Josh Triska. A Comparison of Modern GPU and CPU Architectures: And the Common Convergence of Both.
Pages 1-20, 2011.

[Tro03] Vadim Tropashko. Trees in SQL: Nested Sets and Materialized Path. 2003.

[Tro05] Vadim Tropashko. Nested intervals tree encoding in SQL. SIGMOD Record, 34(2), June 2005.

[vor14] DBMS Index for Hierarchical Data Using Nested Intervals and Residue Classes. Young Scientists International Workshop on Trends in Information Processing, 2014.

[WO10] Yangzihao Wang and John Owens. Large-scale graph processing algorithms on the GPU. Technical Report, UC Davis, 2010.