Benchmarking a B-tree compression method*

Filip Křižka, Michal Krátký, and Radim Bača

Department of Computer Science, Technical University of Ostrava, Czech Republic
{filip.krizka,michal.kratky,radim.baca}@vsb.cz

* Work is partially supported by Grants of GACR No. 201/09/0990 and IGA, FEECS, Technical University of Ostrava, No. BI 4569951, Czech Republic.

Abstract. The B-tree and its variants have been widely applied in many data management fields. When a compression of these data structures is considered, we follow two objectives. The first objective is a smaller index file, the second one is a reduction of the query processing time. In this paper, we apply a compression scheme to fit these objectives. The utilized compression scheme handles compressed nodes in a secondary storage. If a page must be retrieved, then this page is decompressed into the tree cache. Since this compression scheme is transparent from the tree operation's point of view, we can apply various compression algorithms to the pages of a tree. Obviously, there are compression algorithms suitable for various data collections, and so this issue is very important. In our paper, we compare the B-tree and the compressed B-tree where the Fast Fibonacci and Invariable Coding compression methods are applied.

Key words: B-tree and its variants, B-tree compression, compression scheme, fast decompression algorithm

1 Introduction

The B-tree represents an efficient structure for searching an ordered set [6]. The B-tree has often been used as the backbone data structure for the physical implementation of RDBMSs or file systems. Its most important characteristic is that keys in a node have very small differences to each other. We utilize this feature in the B-tree compression. In this case, nodes are compressed in the secondary storage and they are decompressed during their reading into the cache. Due to the fact that random access in the secondary storage is a rather expensive operation, we save time when reading the nodes.

In work [11], the authors summarize some methods for organizing B-trees. The prefix B-tree, introduced in [7], provides head and tail compression. In the case of head compression, one chooses a common prefix for all keys that the page can store, not just the current keys. Tail compression selects a short index term for the nodes above the data pages. This index needs merely to separate the keys of one data node from those of its sibling and is chosen during a node split. Tail compression produces variable length index entries, and [7] describes a binary search that copes with variable length entries.

Work [9] describes a split technique for data. Rows are assigned tag values in the order in which they are added to the table. Note that tag values identify rows in the table, not records in an individual partition or in an individual index. Each tag value appears only once in each index. All vertical partitions are stored in the B-tree with the tag value as the key. The novel aspect is that the storage of the leading key is reduced to a minimal value.

Unlike these works, in our work we suppose a B-tree compression without changes of the B-tree structure. We mainly utilize a fast decompression algorithm. In the case of the previously depicted papers, B-tree compression is possible using a modification of the B-tree structure. In work [7], the B-tree is represented by a B*-index and a B*-file. The keys stored in the B*-index are only used for searching and determining in which subtree of a given branch node a key and its associated record will be found. The B*-index itself is a conventional B-tree including prefixes of the keys in the B*-file. This prefix B-tree combines some of the advantages of B-trees, digital search trees, and key compression without sacrificing the basic simplicity of B-trees and the associated algorithms, and without inheriting some of the disadvantages of digital search trees and key compression techniques. Work [9] describes an efficient columnar storage in B-trees. Column-oriented storage formats have been proposed for query processing in relational data warehouses, specifically for fast scans over non-indexed columns. This data compression method reuses traditional on-disk B-tree structures with only minor changes, yet achieves storage density and scan performance comparable to specialized columnar designs. In work [1], B-tree compression is used for minimizing the amount of space used by certain types of B-tree indexes. When a B-tree is compressed, duplicate occurrences of the indexed column values are eliminated; the index is compressed by clustering the same keys and their unindexed attributes.

This paper is organized as follows. In Section 2, we briefly summarize basic knowledge about the B-tree. Section 3 shows the compression scheme used [3]. Section 4 describes two compression methods. Section 5 shows the results of the compression methods; the compressed B-tree is compared with a proper B-tree. In Section 6, we summarize the paper content and outline possibilities of our future work.
2 B-tree and its variants

The B-tree is a tree structure published by Rudolf Bayer and Edward M. McCreight in 1972 [6]. The B-tree keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time. It is optimized for systems that read and write large blocks of data. A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent and results in all leaf nodes being one more node further away from the root.

B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes. This usually occurs when most nodes are in secondary storage such as hard drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases, balancing occurs less often, and efficiency increases. Usually this value is set such that each node takes up a full disk block or an analogous size in secondary storage.

A B-tree of order m (the maximum number of children for each node) is a tree which satisfies the following properties:

– Every node has at most m children.
– Every node (except the root and the leaves) has at least m/2 children.
– The root has at least two children if it is not a leaf node.
– All leaf nodes are at the same level.
– Every inner node with k children contains k − 1 keys.

Each internal node's elements act as separation values which divide its subtrees. For example, if an internal node has three child nodes (or subtrees), then it must have two separation values or elements a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2.

Internal nodes in a B-tree – nodes which are not leaf nodes – are usually represented as an ordered set of elements and child pointers. Every internal node contains a maximum of U children and – other than the root – a minimum of L children. For all internal nodes other than the root, the number of elements is one less than the number of child pointers; the number of elements is between L − 1 and U − 1. The number U must be either 2·L or 2·L − 1; thus each internal node is at least half full. This relationship between U and L implies that two half-full nodes can be joined to make a legal node, and one full node can be split into two legal nodes (if there is an empty space to push one element up into the parent). These properties make it possible to delete and insert new values into a B-tree and adjust the tree to preserve the B-tree properties.

Leaf nodes have the same restriction on the number of elements, but have no children and no child pointers. The root node still has the upper limit on the number of children, but has no lower limit. For example, when there are fewer than L − 1 elements in the entire tree, the root will be the only node in the tree, and it will have no children at all.

A B-tree of depth n + 1 can hold about U times as many items as a B-tree of depth n, but the cost of search, insert, and delete operations grows with the depth of the tree. As with any balanced tree, the cost increases much more slowly than the number of elements.

Some balanced trees store values only at the leaf nodes, and so have different kinds of nodes for leaf nodes and internal nodes. B-trees keep values in every node in the tree, and may use the same structure for all nodes. However, since leaf nodes never have children, a specialized structure for leaf nodes in B-trees will improve performance. The best-case height of a B-tree is log_M n and the worst-case height is log_{M/2} n, where M is the maximum number of children a node can have.

There exist many variants of the B-tree: the B*-tree [13], the B**-tree [15], and the B+-tree [17]. In the case of the B+-tree, data is only stored in leaf nodes and inner nodes include only keys. Leaf nodes hold links to the previous and next leaf nodes. Moreover, many paged data structures like the UB-tree [5, 12], the BUB-tree [8], and the R-tree [10] are based on the B-tree.
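To make this layout concrete, the following C++ sketch shows one possible in-memory representation of B+-tree nodes as just described: keys and child pointers in inner nodes; keys, unindexed attribute values, and sibling links in leaf nodes. It is only an illustration under our own assumptions; the structure and all names are ours, not identifiers from the paper's implementation.

```cpp
#include <cstdint>
#include <vector>

// Illustrative B+-tree node layout (hypothetical names).
// Inner nodes hold keys and child pointers; leaf nodes hold keys,
// unindexed attribute values, and links to the previous/next leaf.
struct BPlusTreeNode {
    bool isLeaf = false;
    std::vector<std::uint32_t> keys;        // at most m - 1 keys per node

    // Inner node only: child pointers (keys.size() + 1 children).
    std::vector<BPlusTreeNode*> children;

    // Leaf node only: unindexed attribute values paired with the keys.
    std::vector<std::uint32_t> values;

    // Links that are kept uncompressed by the methods in Section 4.
    BPlusTreeNode* parent   = nullptr;
    BPlusTreeNode* prevLeaf = nullptr;      // leaf only
    BPlusTreeNode* nextLeaf = nullptr;      // leaf only
};
```

One such node corresponds to one page of the secondary storage; the next section describes how these pages are compressed on disk.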
3 A compression scheme for tree-like data structures

In this section, we describe a basic scheme which can be utilized for most paged tree data structures [3]. Pages are stored in a secondary storage and retrieved when the tree requires a page. This basic strategy is widely used by many indexing data structures such as B-trees, R-trees, and many others. They utilize a cache for fast access to pages as well, since access to the secondary storage can be more than 20 times slower compared to access to the main memory. We try to decrease the amount of disc access cost (DAC) to the secondary storage while significantly decreasing the size of the tree file in the secondary storage.

Let us consider the common cache schema of persistent data structures in Figure 1(a). When a tree requires a node, the node is read from the main memory cache. If the node is not in the cache, the node page is retrieved from the secondary storage.

An important issue of the compression scheme is that tree pages are only compressed in the secondary storage. In Figure 1(b), we can observe the basic idea of the scheme. If a tree data structure wants to retrieve a page, the compressed page is transferred from the secondary storage to the tree's cache and it is decompressed there. The function TreeNode::Decompress() performs the decompression. Afterwards, the decompressed page is stored in the cache. Therefore, the tree's algorithms only work with decompressed pages. The TreeNode::Compress() function is called when the page must be stored in the secondary storage. Obviously, the tree is preserved as a dynamic data structure, and in our experiments we show that the page decompression does not significantly affect query performance because we save time with the lower DAC.

Fig. 1. (a) Transfer of the tree's pages between the secondary storage and the tree's cache. (b) Transfer of compressed pages between the secondary storage and the tree's cache.
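The following C++ fragment is a minimal sketch of this read/write path under our own assumptions: an in-memory stand-in replaces the secondary storage, and the Page methods are identity placeholders standing in for the paper's TreeNode::Decompress() and TreeNode::Compress(), where a real coder (FF or IC, Section 4) would plug in. All other names are hypothetical.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using PageId = std::uint32_t;
using Bytes  = std::vector<std::uint8_t>;

// In-memory stand-in for the secondary storage holding compressed pages.
struct SecondaryStorage {
    std::unordered_map<PageId, Bytes> compressedPages;
};

// A decompressed page as seen by the tree's algorithms.  The two methods
// play the role of TreeNode::Decompress() and TreeNode::Compress(); here
// they are identity placeholders where a real coder would go.
struct Page {
    Bytes items;
    static Page Decompress(const Bytes& stored) { return Page{stored}; }
    Bytes Compress() const { return items; }
};

// Pages are kept decompressed in the cache; (de)compression happens only
// on the boundary between the cache and the secondary storage.
struct PageCache {
    SecondaryStorage& storage;
    std::unordered_map<PageId, Page> pages;

    Page& Get(PageId id) {
        auto it = pages.find(id);
        if (it == pages.end()) {                       // cache miss
            const Bytes& raw = storage.compressedPages.at(id);
            it = pages.emplace(id, Page::Decompress(raw)).first;
        }
        return it->second;                             // always a decompressed page
    }

    void WriteBack(PageId id) {                        // called on eviction/flush
        storage.compressedPages[id] = pages.at(id).Compress();
    }
};
```

Because the cache always holds decompressed pages, search and insert code remains unaware of the compression, which is exactly the transparency the scheme requires.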
3.1 How the compression scheme affects tree algorithms

When the compression scheme is taken into consideration, the tree insert algorithm only needs to be slightly modified. When we insert or modify a record in a page, we have to perform the function TreeNode::CompressTest(), which tests whether the node fits into the page. If not, the node needs to be split. Also, during the split, we have to make sure that the resulting nodes fit into the page. This means that the maximum capacity of a page can vary depending on the redundancy of the data. The maximum capacity of each tree page must be determined by a heuristic:

Cc = Cu / CRA,

where CRA is the assumed maximum compression ratio and Cu is the capacity of the uncompressed page. For example, if the size of the page is 2,048 B, Cu = 100, and CRA = 1/5, then Cc = 500. The page size corresponding to this capacity is 10,240 B. This means that all pages in the tree's cache have Cc = 500, although their size in the secondary storage is less than or equal to 2,048 B. Let us note that the compression ratio is defined as

CR = compressed size / original size.

Every page in the tree has its minimal page utilization Cc/2. Let Sl denote the byte size of a compressed page. After deleting one item in the page, the byte size of the page is denoted by Sc. Without loss of generality, we can assume that Sc ≤ Sl. If items are deleted from the page, we must check whether the capacity is less than or equal to Cc/2. If so, the page's items are spread into other pages according to the tree deleting algorithm.

Query algorithms are not affected at all because page decompression is processed only between the cache and the secondary storage, and the tree can utilize decompressed pages for searching without knowing that they have been previously compressed.

This basic idea of the compression scheme can be applied to any paged tree data structure. It is not dependent upon the applied split algorithm, nor on the key type stored in the tree's pages. We test this scheme on the B+-tree data structure because this data structure has remained very popular in recent years and it is suitable for further page compression.

4 B-tree compression methods

In this article, we have applied two compression methods: Fast Fibonacci (FF) and Invariable Coding (IC). Since keys in a node are close to one another, we use the well-known difference compression method [14]. Similar algorithms were used in the case of the R-tree compression [3, 16].

4.1 Fast Fibonacci compression

In this method, we apply the Fibonacci coding [2], which uses the Fibonacci numbers 1, 2, 3, 5, 8, 13, .... A value is coded as the sum of the Fibonacci numbers that are represented by the 1-bits in a binary buffer. A special 1-bit is added as the lowest bit in this binary buffer after the coding is finished. For example, the value 20 is coded as 0101011 (13 + 5 + 2). Since we need the decompression to be as fast as possible, we use the Fast Fibonacci decompression introduced in [4].
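As an illustration of this coding, here is a small C++ sketch that produces the Fibonacci codeword of a positive integer. It is our own minimal example; the paper's FibonacciCodder and the fast decoder of [4] are not reproduced here.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Minimal Fibonacci (Zeckendorf) coder sketch; returns the codeword as a
// string of bits, lowest Fibonacci term first, with the closing 1-bit
// appended, e.g. 20 = 13 + 5 + 2 -> "0101011".  Assumes n >= 1.
std::string FibonacciCode(std::uint64_t n) {
    // Fibonacci numbers 1, 2, 3, 5, 8, 13, ... not exceeding n.
    std::vector<std::uint64_t> fib = {1, 2};
    while (fib.back() <= n)
        fib.push_back(fib[fib.size() - 1] + fib[fib.size() - 2]);
    fib.pop_back();                       // drop the first term that exceeds n

    std::string code(fib.size(), '0');
    // Greedy selection from the largest term yields the Zeckendorf form
    // (no two consecutive Fibonacci numbers are used).
    for (auto i = fib.size(); i-- > 0; ) {
        if (fib[i] <= n) {
            code[i] = '1';
            n -= fib[i];
        }
    }
    return code + '1';                    // terminating 1-bit added at the end
}

int main() {
    std::cout << FibonacciCode(20) << "\n";   // prints 0101011
}
```

In the tree pages it is not the keys themselves but the differences between neighboring keys (and, in leaf nodes, the unindexed attribute values) that are coded this way, which keeps the coded numbers, and therefore the codewords, small.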
The algorithm of the Fast Fibonacci compression is shown in Algorithm 1. We explain the algorithm in the following paragraphs.

Algorithm 1: Fast Fibonacci Compression Algorithm

function CompressionLeafNode(node)
 1  Write item1
 2  for i in 2 .. node.count do
 3      num ← FibonacciCodder(node.key(i) − node.key(i−1))
 4      Write num
 5      num ← FibonacciCodder(node.nonatrr(i))
 6      Write num
 7  end
 8  Write links

function CompressionNode(node)
 9  Write item1
10  for i in 2 .. node.count do
11      num ← FibonacciCodder(node.key(i) − node.key(i−1))
12      Write num
13  end
14  Write links

function Compression(node)
15  if node.isLeaf then
16      CompressionLeafNode(node)
17  end
18  else
19      CompressionNode(node)
20  end

function CompressionTest(node)
21  tmp ← Compression(node)
22  if tmp.size > page.size then
23      return false
24  end
25  else
26      return true
27  end

Compression of inner nodes:

– Keys are compressed by the Fibonacci coding. The first value, obviously the minimal one, is stored as it is. We compute the difference of each pair of neighboring values and the difference is coded by the Fibonacci coding. (see Lines 10–13)
– Child and parent links are not compressed. (Line 14)

Compression of leaf nodes:

– Keys are compressed in the same way as the keys in an inner node. (Lines 3–4)
– Unindexed attribute values are compressed by the Fibonacci coding. (Lines 5–6)
– Parent, previous, and next node links are not compressed. (Line 8)

4.2 Invariable coding compression

As in the previous method, we work with the difference of each pair of neighboring values. Since we use invariable (fixed-length) coding, we must first compute the difference between the last, maximal value and the first value. The result of this computation is the number of bits necessary for the storage of this maximal difference and, obviously, of each value of the node. For example, if the difference of the maximal and minimal value is 20, all values are stored in 5 bits.
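To illustrate this fixed-width coding, the following C++ sketch encodes a sorted key sequence as just described: the first key verbatim, then each neighboring difference in a bit width derived from the largest difference. It is only a sketch under our own assumptions; the bit-buffer representation, the 6-bit width field, and all identifiers are ours and do not correspond to the paper's ICCodder or maxBits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to store v (at least 1), e.g. BitsFor(20) == 5.
unsigned BitsFor(std::uint32_t v) {
    unsigned bits = 1;
    while (v >>= 1) ++bits;
    return bits;
}

// Append the `width` lowest bits of v to a bit buffer, most significant first.
void AppendBits(std::vector<bool>& out, std::uint32_t v, unsigned width) {
    for (unsigned i = 0; i < width; ++i)
        out.push_back(((v >> (width - 1 - i)) & 1u) != 0);
}

// Encode a sorted key sequence: the first (minimal) key verbatim, the
// remaining keys as differences to their predecessor, every difference
// stored in the same number of bits derived from (last key - first key).
std::vector<bool> EncodeKeysIC(const std::vector<std::uint32_t>& keys) {
    std::vector<bool> out;
    if (keys.empty()) return out;
    const unsigned width = BitsFor(keys.back() - keys.front());
    AppendBits(out, width, 6);           // store the width itself (6 bits suffice for 32-bit keys)
    AppendBits(out, keys.front(), 32);   // first key stored verbatim
    for (std::size_t i = 1; i < keys.size(); ++i)
        AppendBits(out, keys[i] - keys[i - 1], width);
    return out;
}
```

For the unindexed attribute values of a leaf, Algorithm 2 below proceeds analogously, except that those values are not sorted, so the minimal value and the required width have to be found by a sequential scan first.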
The algorithm of the IC compression is shown in Algorithm 2. We explain the algorithm in the following paragraphs.

Algorithm 2: Invariable Coding Compression Method

function CompressionLeafNode(node)
 1  Write maxBits(node.key(1), node.key(n))
 2  Write node.key(1)
 3  for i in 2 .. node.count do
 4      num ← ICCodder(node.key(i) − node.key(i−1))
 5      Write num
 6  end
 7  Write maxBits(node.nonatrr(1), node.nonatrr(n))
 8  Write node.nonatrr(min)
 9  for i in 2 .. node.count do
10      num ← ICCodder(node.nonatrr(i) − node.nonatrr(min))
11      Write num
12  end
13  Write links

function CompressionNode(node)
14  Write maxBits(node.key(1), node.key(n))
15  Write node.key(1)
16  for i in 2 .. node.count do
17      num ← ICCodder(node.key(i) − node.key(i−1))
18      Write num
19  end
20  Write links

function Compression(node)
21  if node.isLeaf then
22      CompressionLeafNode(node)
23  end
24  else
25      CompressionNode(node)
26  end

function CompressionTest(node)
27  tmp ← Compression(node)
28  if tmp.size > page.size then
29      return false
30  end
31  else
32      return true
33  end

Compression of inner nodes:

– Keys are compressed by the above proposed method. We store the first value and the number of bits necessary for the storage of the maximal difference; after that, all difference values are stored. (Lines 14–19)
– Child and parent links are not compressed. (Line 20)

Compression of leaf nodes:

– Keys are compressed in the same way as the keys in an inner node. (Lines 1–6)
– Unindexed attribute values are compressed similarly to the keys; however, the maximal value is not the last value – it must be found by a sequential scan. (Lines 7–12)
– Parent, previous, and next node links are not compressed. (Line 13)

5 Experimental results

In our experiments¹, we test the previously described compression methods. These approaches are implemented in C++. We use four synthetic collections which differ in the values included. Collection RND 8 includes values in ⟨0; 255⟩ and RND 16 includes values in ⟨0; 65,535⟩; in the same way, we create the collections RND 24 and RND 32. Each collection contains 1,000,000 items.

For each collection, we test index building and querying by processing time and DAC. In all tests, the page size is 2,048 B and the cache size is 1,024 nodes. The cache of the OS was turned off.

¹ The experiments were executed on an AMD Opteron 865, 1.8 GHz, 2.0 MB L2 cache; 2 GB of DDR333; Windows 2003 Server.

                      |            RND 8                     |            RND 16
                      | B+-tree    | FF         | IC         | B+-tree   | FF        | IC
Height                | 2          | 2          | 2          | 2         | 2         | 2
Domain                | 8 b        |            |            | 16 b      |           |
DAC Read              | 20,858,473 | 12,115,647 | 10,010,790 | 5,996,536 | 5,872,078 | 5,739,912
Creating time [s]     | 15,288     | 65,618     | 44,201     | 6,360     | 11,897    | 10,605
Cache time [s]        | 7,357      | 5,398      | 1,565      | 6,020     | 1,495     | 1,181
Compress time [s]     | 0          | 867        | 652        | 0         | 883       | 636
Decompress time [s]   | 0          | 53,524     | 35,908     | 0         | 9,276     | 8,567
# Inner nodes         | 35         | 17         | 9          | 33        | 17        | 9
# Leaf nodes          | 7,746      | 3,350      | 2,431      | 5,695     | 2,596     | 2,114
Avg. node items       | 222.3      | 198        | 271        | 172.58    | 153.65    | 235.8
Avg. leaf node items  | 129.1      | 298.5      | 411.4      | 176.58    | 385.21    | 473
Index size [kB]       | 15,564     | 6,736      | 4,882      | 11,394    | 5,228     | 4,248
Compression ratio     | 1          | 0.56       | 0.69       | 1         | 0.54      | 0.63

Table 1. Building the B+-tree index: results for RND 8 and RND 16.

                      |            RND 24                    |            RND 32
                      | B+-tree    | FF         | IC         | B+-tree   | FF        | IC
Height                | 2          | 2          | 2          | 2         | 2         | 2
Max item value        | 24 b       |            |            | 32 b      |           |
DAC Read              | 5,996,536  | 5,907,459  | 5,830,822  | 5,996,536 | 5,931,627 | 5,889,079
Creating time [s]     | 7,377      | 12,098     | 12,435     | 7,629     | 13,686    | 13,154
Cache time [s]        | 7,001      | 1,556      | 1,690      | 7,203     | 2,935     | 2,267
Compress time [s]     | 0          | 882        | 664        | 0         | 885       | 597
Decompress time [s]   | 0          | 9,419      | 9,828      | 0         | 9,595     | 10,003
# Inner nodes         | 33         | 17         | 17         | 33        | 26        | 17
# Leaf nodes          | 5,663      | 2,717      | 3,099      | 5,663     | 2,800     | 3,756
Avg. node items       | 172.58     | 160.76     | 183.24     | 172.58    | 108.65    | 221.88
Avg. leaf node items  | 176.58     | 368.05     | 322.68     | 176.58    | 357.14    | 266.24
Index size [kB]       | 11,394     | 5,470      | 6,234      | 11,394    | 5,654     | 5,272
Compression ratio     | 1          | 0.52       | 0.45       | 1         | 0.50      | 0.54

Table 2. Building the B+-tree index: results for RND 24 and RND 32.

                      |        RND 8             |        RND 16            |        RND 24            |        RND 32
                      | B+-tree | FF     | IC     | B+-tree | FF     | IC     | B+-tree | FF     | IC     | B+-tree | FF     | IC
Query time [s]        | 182.4   | 37.6   | 33.5   | 38.2    | 34.2   | 31.7   | 38.1    | 34.2   | 35.2   | 38.2    | 45.3   | 37.2
Decompress time [s]   | 0       | 6.8    | 5.7    | 0       | 6.7    | 5.0    | 0       | 6.6    | 6.1    | 0       | 6.8    | 6.8
Cache time [s]        | 149.1   | 3.8    | 1.8    | 5.0     | 2.0    | 1.3    | 5.6     | 2.4    | 2.5    | 7.2     | 12.7   | 3.0
DAC Read              | 64,813  | 28,123 | 20,516 | 47,068  | 21,897 | 17,610 | 47,068  | 22,469 | 25,700 | 47,068  | 23,200 | 31,296

Table 3. B-tree querying results for RND 8, RND 16, RND 24 and RND 32.

Fig. 2. Experiment results: (a) compression ratio, (b) query processing time, (c) DAC.
Results of index building are depicted in Tables 1 and 2. We see that the compression ratio decreases with an increasing size of the domain. The FF compression is more efficient for lower values; on the other hand, the IC compression is more efficient for higher values. Obviously, due to the features of the compression scheme used, we obtain a high compression time. Consequently, the time of creating the B+-tree with the FF compression is 1.6–4.3× higher than the creation time of the B+-tree. In the case of the IC compression, the creation time is 1.7–2.9× higher. The compression ratio is also shown in Figure 2(a).

In our experiments, we run 50 random queries and the results are then averaged. The results are shown in Table 3. The number of DAC is 2.1–3.5× lower for the FF compression when compared to the B+-tree, and 1.5–3.6× lower for the IC compression. This result influences the query processing time. The query processing time is 0.84–4.85× more efficient in the case of the FF compression when compared to the B+-tree, and 1.03–5.4× more efficient for the IC compression. Obviously, if the compression ratio is over a threshold, then the B+-tree overcomes the compressed indices. In Figures 2(b) and (c), we see the query processing time and the DAC.

6 Conclusion

In this article, we propose two methods for B-tree compression. If the compression ratio is below a threshold, then the query processing performance of the compressed index overcomes that of the B-tree. However, there are still some open issues. The first one is the high creation time. In this case, we must develop a more efficient method or we must use and test bulkloading (see [3, 16]). Additionally, we must test our method on a real data collection. Finally, we should test different compression and coding methods (e.g. the Elias-delta code, Elias-gamma code, and Golomb code [14]).
References

1. C. Antognini: Troubleshooting Oracle Performance. Apress, 2008.
2. A. Apostolico and A. Fraenkel: Robust transmission of unbounded strings using Fibonacci representations. IEEE Transactions on Information Theory, 33, 2, 1987, 238–245.
3. R. Bača, M. Krátký, and V. Snášel: A compression scheme for the R-tree data structure. Submitted to Information Systems, 2009.
4. R. Bača, V. Snášel, J. Platoš, M. Krátký, and E. El-Qawasmeh: The fast Fibonacci decompression algorithm. arXiv:0712.0811v2, http://arxiv.org/abs/0712.0811, 2007.
5. R. Bayer: The universal B-tree for multidimensional indexing: general concepts. In Proceedings of World-Wide Computing and Its Applications (WWCA 1997), Tsukuba, Japan, Lecture Notes in Computer Science, Springer-Verlag, 1997, 198–209.
6. R. Bayer and E. M. McCreight: Organization and maintenance of large ordered indices. Acta Informatica, 1972, 173–189.
7. R. Bayer and K. Unterauer: Prefix B-trees. ACM Transactions on Database Systems, 2, 1, 1977, 11–26.
8. R. Fenk: The BUB-tree. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB 2002), Hong Kong, China, Morgan Kaufmann, 2002.
9. G. Graefe: Efficient columnar storage in B-trees. In Proceedings of the SIGMOD Conference, 2007.
10. A. Guttman: R-trees: a dynamic index structure for spatial searching. In Proceedings of the ACM International Conference on Management of Data (SIGMOD 1984), ACM Press, June 1984, 47–57.
11. D. Lomet: The evolution of effective B-tree page organization and techniques: a personal account. In Proceedings of the SIGMOD Conference, Sep. 2001.
12. V. Markl: Mistral: Processing relational queries using a multidimensional access technique. Ph.D. thesis, Technical University München, Germany, 1999, http://mistral.in.tum.de/results/publications/Mar99.pdf.
13. Y. Sagiv: Concurrent operations on B*-trees with overtaking. Journal of Computer and System Sciences, 1986.
14. D. Salomon: Data Compression: The Complete Reference. Third Edition, Springer-Verlag, New York, 2004.
15. A. A. Toptsis: B**-tree: a data organization method for high storage utilization. In Computing and Information, 1993.
16. J. Walder, M. Krátký, and R. Bača: Benchmarking coding algorithms for the R-tree compression. In Proceedings of DATESO 2009, Czech Republic, 2009.
17. N. Wirth: Algorithms and Data Structures. Prentice Hall, 1984.