-

Benchmarking a B-tree compression method?

Filip Kˇriˇzka

filip.krizka@vsb.cz 0

Michal Kr´atky´

michal.kratky@vsb.cz 0

Radim Baˇca

radim.baca@vsb.cz 0 0 Department of Computer Science, Technical University of Ostrava , Czech Republic

37 43

The B-tree and its variants have been widely entries, and [7] describes a binary search that copes applied in many data management fields. When a com- with variable length entries. pression of these data structures is considered, we follow Work [9] describes a split technique for data. Rows two objectives. The first objective is a smaller index file, are assigned tag values in the order in which they are the second one is a reduction of the query processing time. added to the table. Note that tag values identify rows In this paper, we apply a compression scheme to fit these in the table, not records in an individual partition or objectives. The utilized compression scheme handles compressed nodes in a secondary storage. If a page must be re- in an individual index. Each tag value appears only trieved then this page is decompressed into the tree cache. once in each index. All vertical partitions are stored Since this compression scheme is transparent from the tree in the B-tree with the tag value as the key. The novel operation's point of view, we can apply various compression aspect is that the storage of the leading key is reduced algorithms to pages of a tree. Obviously, there are compres- to a minimal value. sion algorithms suitable for various data collections, and Unlike these works, in our work we suppose the so, this issue is very important. In our paper, we compare B-tree compression without changes of the B-tree the B-tree and compressed B-tree where the Fast Fibonacci structure. We mainly utilize the fast decompression aland invariable coding compression methods are applied. gorithm. In the case of the previously depicted papers,

Section 6, we summarize the paper content and out- must be either 2 · L or 2 · L − 1; thus each internal node line possibilities of our future work. is at least half full. This relationship between U and L implies that two half-full nodes can be joined to make a legal node, and one full node can be split into two 2 B-tree and its variants legal nodes (if there is an empty space to push one element up into the parent). These properties make it The B-tree is a tree structure published by Rudolf possible to delete and insert new values into a B-tree Bayer and Edward M. McCreight in 1972 [ 6 ]. The and adjust the tree to preserve the B-tree properties. B-tree keeps data sorted and allows searches, inser- Leaf nodes have the same restriction on the numtions, and deletions in logarithmic amortized time. It is ber of elements, but have no children, and no child optimized for systems that read and write large blocks pointers. The root node still has the upper limit on of data. A B-tree is kept balanced by requiring that all the number of children, but has no lower limit. For leaf nodes are at the same depth. This depth will in- example, when there are fewer than L-1 elements in crease slowly as elements are added to the tree, but an the entire tree, the root will be the only node in the increase in the overall depth is infrequent, and results tree, and it will have no children at all. in all leaf nodes being one more node further away A B-tree of depth n+1 can hold about U times as from the root. many items as a B-tree of depth n, but the cost of

B-trees have substantial advantages over alterna- search, insert, and delete operations grows with the tive implementations when node access times far ex- depth of the tree. As with any balanced tree, the cost ceed access times within nodes. This usually occurs increases much more slowly than the number of elewhen most nodes are in secondary storage such as ments. hard drives. By maximizing the number of child nodes Some balanced trees store values only at the leaf within each internal node, the height of the tree de- nodes, and so have different kinds of nodes for leaf creases, balancing occurs less often, and efficiency in- nodes and internal nodes. B-trees keep values in every creases. Usually this value is set such that each node node in the tree, and may use the same structure for all takes up a full disk block or an analogous size in sec- nodes. However, since leaf nodes never have children, ondary storage. a specialized structure for leaf nodes in B-trees will

A B-tree of order m (the maximum number of chil- improve performance. The best case height of a B-tree dren for each node) is a tree which satisfies the follow- is: logM n. The worst case height of a B-tree is: log M n. ing properties: Where M is the maximum number of children a n2ode can have.

There exists many variants of the B-tree: B∗-tree [13], B∗∗-tree [15], B+-tree [17]. In the case of the B+-tree, data is only stored in leaf nodes and inner nodes include keys. Leaf nodes hold links to the previous and next nodes. Moreover, many paged data structures like UB-tree [ 5, 12 ], BUB-tree [8], and

R-tree [10] are based on the B-tree. – Every node has at most m children. – Every node (except root and leaves) has at least

m2 children. – The root has at least two children if it is not a leaf

node. – All leave nodes are in the same level. – All inner nodes with k children contain k–1 links

to children.

Each internal node’s elements act as separation values which divide its subtrees. For example, if an internal 3 A compression scheme for tree-like node has three child nodes (or subtrees) then it must data structures have two separation values or elements a1 and a2. All values in the leftmost subtree will be less than a1, all In this section, we describe a basic scheme which can values in the middle subtree will be between a1 and a2, be utilized for most paged tree data structures [ 3 ]. and all values in the rightmost subtree will be greater Pages are stored in a secondary storage and retrieved than a2. when the tree requires a page. This basic strategy is

Internal nodes in a B-tree – nodes which are not widely used by many indexing data structures such as leaf nodes – are usually represented as an ordered set B-trees, R-trees, and many others. They utilize cache of elements and child pointers. Every internal node for fast access to pages as well, since access to the contains a maximum of U children and – other than secondary storage can be more than 20 times slower the root – a minimum of L children. For all internal compared to access to the main memory. We try to denodes other than the root, the number of elements is crease the amount of disc access cost (DAC) to a secone less than the number of child pointers; the number ondary storage while significantly decreasing the size of elements is between L − 1 and U − 1. The number U of a tree file in the secondary storage.

Let us consider a common cache schema of persistent data structures in Figure 1(a). When a tree Cc = Cu , requires a node, the node is read from the main mem- CRA ory cache. If the node is not in the cache, the node where CRA is the assumed maximum compression rapage is retrieved from the secondary storage. tio, Cu is the capacity of the uncompressed page. For

An important issue of the compression schema is example, the size of the page is 2,048 B, Cu = 100, that tree pages are only compressed in the secondary CRA = 1/5, then Cc = 500. The size of the page for storage. In Figure 1(b), we can observe the basic idea the capacity is 10,240 B. This means that all pages in of the scheme. If a tree data structure wants to retrieve the tree’s cache have Cc = 500, although their S size in a page, the compressed page is transfered from the the secondary storage is less than or equal to 2,048 B. secondary storage to the tree’s cache and it is decom- Let us note that pressed there. Function TreeNode:: Decompress() performs the decompression. Afterwards, the decom- CR = compressed size . pressed page is stored in the cache. Therefore, the original size tree’s algorithms only work with decompressed pages. The TreeNode::Compress() function is called when Obviously, the tree is preserved as a dynamic data the page must be stored in the secondary storage. structure and in our experiments we show the page Every page in the tree has its minimal page utilizadecompression does not significantly affect query per- tion Cc/2. Let Sl denote the byte size of a compressed formance because we save time with the lower DAC. page. After deleting one item in the page, the byte size of the page is denoted by Sc. Without loss of general

Data structure's ity, we can assume that Sc ≤ Sl. If items are deleted Data structure Node in macinacmheemory Page Sesctoornadgaery ftrhoamn othreeqpuaagle,towCecm/u2.stIfchsoe,ckthwehpeatgheeriscsatpraectcithyedis ilnetsos other pages according to the tree deleting algorithm.

Query algorithms are not affected at all because page decompression is processed only between cache and secondary storage and the tree can utilize decom(a) pressed pages for searching without knowing that they Data structure Node iDnamtaacsinatrcmuhceetumroer'sy Comppargeeseed Sesctoornadgaery ahpavpTeliehbdiesetnboapasnriceyvipidoaeugaselodyf tctrohemee pcdroaemtsaspesrdte.rsusicotnursec.hIetmise ncoatndbeependent upon an applied split algorithm, nor on the key type stored in the tree’s pages. We test this scheme on B+-tree data structure because this data structure (b) has remained very popular in recent years and it is suitable for further page compressing. In this article, we have applied two compression methods: Fast Fibonacci (FF) and Invariable Coding (IC). 3.1 How the compression scheme affects tree Since keys in a node are close to each another, we use algorithms the well-known difference compression method [14]. When the compression scheme is taken into considera- Similar algorithms were used in the case of the R-tree tion, the tree insert algorithm only needs to be slightly compression [ 3, 16 ]. modified. When we insert or modify a record in a page, we have to perform the function TreeNode::Compress 4.1 Fast Fibonacci compression Test() which tests whether the node fits into the page.

If not, the node needs to be split. Also, during the In this method, we apply the Fibonacci coding [ 2 ] split, we have to make sure that the final nodes fit which uses the Fibonacci numbers; 1, 2, 3, 5, 8, 13, . . . . into the page. This means that the maximum capac- A value is coded as the sum of the Fibonacci numbers ity of a page can vary depending on the redundancy of that are represented by the 1-bit in a binary buffer. the data. The maximum capacity of each tree’s page Special 1-bit is added as the lowest bit in this binary must be determined by a heuristic: buffer after the coding is finished. For example, the 4 5 6 7 end 8 Write links

Algorithm 1: Fast Fibonacci Compression Algo

rithm

function : CompressionLeafNode(node) 1 Write item1 2 for i in 2 .. node.count do 3 num ←

FibonacciCodder(node.key(i)-node.key(i-1)) Write num num ← FibonacciCodder(node.nonatrr(i))

Write num 12 13 end 14 Write links

function : CompressionNode(node) 9 Write item1 10 for i in 2 .. node.count do 11 num ←

FibonacciCodder(node.key(i)-node.key(i-1))

Write num function : Compression(node) 15 if node.isLeaf is leaf then 16 CompressionNode(node) 17 end 18 else 19 CompressionLeafNode(node) 20 end

function : CompressionTest(node) 21 tmp ← Compression(node) 22 if tmp.size > page.size then 23 return false 24 end 25 else 26 return true 27 end value 20 is coded as follows: 0101011 (13 + 5 + 2). Due to the fact that we need the compression algorithm as fast as possible, we use the Fast Fibonacci decompression introduced in [ 4 ].

Algorithm of the Fast Fibonacci compression is shown in Algorithm 1. We explain the algorithm in the following paragraphs.

Compression of inner nodes: – Keys are compressed by the Fibonacci coding. The first value, obviously minimal, is stored. We compute the difference of each neighboring value and the difference is coded by Fibonacci coding. (see Lines 10-13) – Child and parent links are not compressed.

(Line 14) Compression of leaf nodes: – Keys are compressed in the same way as the keys in an inner node. (Lines 3-4) – Unindexed attribute values are compressed by Fibonacci coding. (Lines 5-6) – Parent, previous, and next nodes links are not compressed. (Line 8) 4.2

Invariable coding compression As in the previous method, we work with the difference of each neighboring value. Since, we use invariable coding, we must first compute the difference of the last, maximal value, and the first value. The result of this computation is the number of bits necessary for a storage of the maximal value and, obviously, each value of

2: Invariable Coding Compression

Algorithm Method

function : CompressionLeafNode(node) 1 Write maxBits(node.key(1),node.key(n)) 2 Write node.key(1) 3 for i in 2 .. node.count do 4 num ← ICCodder(node.key(i)-node.key(i-1)) 5 Write num 6 end 7 Write maxBits(node.nonatrr(1),node.nonatrr(n)) 8 Write node.nonatrr(min) 9 for i in 2 .. node.count do 10 num ←

ICCodder(node.nonatrr(i)-node.nonatrr(min))

Write num 11 12 end 13 Write links

function : CompressionNode(node) 14 Write maxBits(node.key(1),node.key(n)) 15 Write node.key(1) 16 for i in 2 .. node.count do 17 num ← ICCodder(node.key(i)-node.key(i-1)) 18 Write num 19 end 20 Write links

function : Compression(node) 21 if node.isLeaf is leaf then 22 CompressionNode(node) 23 end 24 else 25 CompressionLeafNode(node) 26 end

function : CompressionTest(node) 27 tmp ← Compression(node) 28 if tmp.size > page.size then 29 return false 30 end 31 else 32 return true 33 end Height Domain DAC Read Creating time [s] Cache Time [s] Compress time [s] Decompress time [s] # Inner nodes # Leaf nodes # Avg. node items # Avg. leaf node items Index size [kB] Compression ratio

RND 8 RND 16 B+-tree FF IC B+-tree FF IC 2 2 2 2 2 2

8b 16b 20,858,473 12,115,647 10,010,790 5,996,536 5,872,078 5,739,912 15,288 65,618 44,201 6,360 11,897 10,605 7,357 5,398 1,565 6,020 1,495 1,181 0 867 652 0 883 636 0 53,524 35,908 0 9,276 8,567 35 17 9 33 17 9 7,746 3,350 2,431 5,695 2,596 2,114 222.3 198 271 172.58 153.65 235.8 129.1 298.5 411.4 176.58 385.21 473 15,564 6,736 4,882 11,394 5,228 4,248

1 0.56 0.69 1 0.54 0.63

1 0,9 0,8 0,7 0,6 Compress 0,5 ratio 0,4 0,3 0,2 0,1 0 200 180 160 140 prtoiQmcueeesr[ssyi]ng 1186200000 40 20 0 B+tree FF Compress

IC Compress B+tree FF Compress IC Compress

B+tree FF Compress IC Compress RND_8 RND_16 RND_24 RND_32

RND_8 RND_16 RND_24 RND_32 (a)

(b) DAC – Keys are compressed in the same way as the keys

in an inner node. (Lines 1-6) – Unindexed attribute values are similarly compressed as keys, however the maximal value is not the last value – it must be found by a sequence scan. – Parent, previous, and next nodes links are not

compressed. (Line 13) the node. For example, if the difference of the maximal 5 Experimental results and minimal value is 20, all values are stored in 5bits.

Algorithm of the IC compression is shown in Al- In our experiments1, we test previously described comgorithm 1. We explain the algorithm in the following pression methods. These approaches are implemented paragraphs. in C++. We use four synthetic collections which differ Compression of inner nodes: in values included. Collection RND 8 includes values in h0; 255i, RND 16 includes values in h0; 65, 535i. In – Keys are compressed by the above proposed this way, we create collections RND 24 and RND 32 method. We store the first value and the num- as well. Each collection contains 1,000,000 items. ber of bits necessary for a storage of the maxi- For each collection, we test index building and mal value. After, all difference values are stored. querying by processing time and DAC. In all tests, the (Lines 14-19) page size is 2,048B and cache size is 1,024 nodes. The – Child and parent links are not compressed. cache of the OS was turned off.

(Line 20) Results of index building are depicted in Table 1 Compression of leaf nodes: and 2. We see that the compression ratio decreases for increasing size of domains. FF compression is more efficient for lower values; on the other hand, the IC compression is more efficient for higher values. Obviously, due to features of the compressed scheme used, we obtain the high compress time. Consequently, the 1 The experiments were executed on an AMD Opteron 865 1.8Ghz, 2.0 MB L2 cache; 2GB of DDR333; Windows 2003 Server. time of creating of B+-tree with the FF compression 7. R. Bayer and K. Unterauer: Prefix B-trees. ACM is 1.6 − 4.3× higher then the time of creating for the Trans. on Database Systems, 2, 1, 1977, 11–26. B+-tree. In the case of the IC compression, the creat- 8. R. Fenk: The BUB-tree. In Proceedings of 28rd VLDB ing time is 1.7 − 2.9× higher. The compression ratio International Conference on Very Large Data Bases is shown is Figure 4.2(a) as well. (VLDB’02), Hongkong, China, Morgan Kaufmann, In our experiments, we test 50 random queries and 2002.

9. G. Goetz: Efficient columnar storage in B-trees. In the results are then averaged. The results are shown Proceedings of SIGMOD Conference, 2007. in Table 3. The number of DAC is 2.1 − 3.5× lower 10. A. Guttman: R-Trees: a dynamic index structure for for the FF compression when compared to the B+- spatial searching. In Proceedings of ACM International tree and 1.5 − 3.6× for the IC compression. This re- Conference on Management of Data (SIGMOD 1984), sult influences the query processing time. The query ACM Press, June 1984, 47–57. processing times is 0.84 − 4.85× more efficient in the 11. D. Lomet: The evolution of effective B-tree page orgacase of the FF compression when compared to the B+- nization and techniques: a personal account. In Protree and the time is 1.03 − 5.4× more efficient for the ceedings of SIGMOD Conference, Sep. 2001. IC compression. Obviously, if the compression ratio is 12. V. Markl: Mistral: Processing relational queries over a threshold then the B+-tree overcomes the com- using a multidimensional access technique. Ph.D. pressed indices. In Figure 4.2(b) and (c), we see the thesis, Technical University Mu¨nchen, Germany, 1999, http://mistral.in.tum.de/results/publications/ query processing time and DAC. Mar99.pdf. 13. Y. Sagiv: Concurrent operations on B*-trees with overtaking. In Journal of Computer and System Sciences, 6 Conclusion 1986. 14. D. Salomon: Data Compression The Complete Refer

ence. Third Edition, Springer–Verlag, New York, 2004. 15. A.A. Toptsis: B**-tree: a data organization method for high storage utilization. In Computing and Information, 1993. 16. J. Walder, M. Kr´atky´, and R. Baˇca: Benchmarking coding algorithms for the R-tree compression. In Proceedings of DATESO 2009, Czech Republic, 2009. 17. N. Wirth: Algorithms and Data Structures. Prentice

Hall, 1984.

In this article, we propose two methods for B-tree compression. If the compression ratio is below a threshold then the query processing performance of the compressed index overcomes the B-tree. However, there are still some open issues. The first one is the high creating time. In this case, we must develop a more efficient method or we must use and test the bulkloading (see [ 3, 16 ]). Additionally, we must test our method for a real data collection. Finally, we should test different compression and coding methods (i.e. Elias-delta code, Elias-gamma code, Golomb code [14]).

1. C. Antognini: Troubleshooting Oracle Performance . Apress, 2008 .

Apostolico and

Fraenkel : Robust transmission of unbounded strings using Fibonacci representations . IEEE Trans. Inform ., 33 , 2 , 1987 , 238 - 245 .

3. R. Baˇca, M. Kr´atky´, and

Sn ´aˇsel: A compression scheme for the R-tree data structure . In Submitted in Information Systems , 2009 .

4. R. Baˇca , V. Sn´aˇsel, J. Platoˇs, M. Kr´atky´, and E. El-Qawasmeh : The fast Fibonacci decompression algorithm . In arXiv: 0712 .0811v2, http://arxiv.org/abs/0712.0811, 2007 .

Bayer : The universal B-tree for multidimensional indexing: general concepts . In Proceedings of WorldWide Computing and Its Applications (WWCA 1997 ), Tsukuba, Japan, Lecture Notes in Computer Science, Springer-Verlag, 1997 , 198 - 209 .

Bayer and E. M. McCreight: Organization and maintenance of large ordered indices . Acta Informatica , 1972 , 173 - 189 .