<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Benchmarking a B-tree compression method?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Filip Kˇriˇzka</string-name>
          <email>filip.krizka@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michal Kr´atky´</string-name>
          <email>michal.kratky@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Radim Baˇca</string-name>
          <email>radim.baca@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Technical University of Ostrava</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>37</fpage>
      <lpage>43</lpage>
      <abstract>
        <p>The B-tree and its variants have been widely entries, and [7] describes a binary search that copes applied in many data management fields. When a com- with variable length entries. pression of these data structures is considered, we follow Work [9] describes a split technique for data. Rows two objectives. The first objective is a smaller index file, are assigned tag values in the order in which they are the second one is a reduction of the query processing time. added to the table. Note that tag values identify rows In this paper, we apply a compression scheme to fit these in the table, not records in an individual partition or objectives. The utilized compression scheme handles compressed nodes in a secondary storage. If a page must be re- in an individual index. Each tag value appears only trieved then this page is decompressed into the tree cache. once in each index. All vertical partitions are stored Since this compression scheme is transparent from the tree in the B-tree with the tag value as the key. The novel operation's point of view, we can apply various compression aspect is that the storage of the leading key is reduced algorithms to pages of a tree. Obviously, there are compres- to a minimal value. sion algorithms suitable for various data collections, and Unlike these works, in our work we suppose the so, this issue is very important. In our paper, we compare B-tree compression without changes of the B-tree the B-tree and compressed B-tree where the Fast Fibonacci structure. We mainly utilize the fast decompression aland invariable coding compression methods are applied. gorithm. In the case of the previously depicted papers,</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Section 6, we summarize the paper content and out- must be either 2 · L or 2 · L − 1; thus each internal node
line possibilities of our future work. is at least half full. This relationship between U and L
implies that two half-full nodes can be joined to make
a legal node, and one full node can be split into two
2 B-tree and its variants legal nodes (if there is an empty space to push one
element up into the parent). These properties make it
The B-tree is a tree structure published by Rudolf possible to delete and insert new values into a B-tree
Bayer and Edward M. McCreight in 1972 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The and adjust the tree to preserve the B-tree properties.
B-tree keeps data sorted and allows searches, inser- Leaf nodes have the same restriction on the
numtions, and deletions in logarithmic amortized time. It is ber of elements, but have no children, and no child
optimized for systems that read and write large blocks pointers. The root node still has the upper limit on
of data. A B-tree is kept balanced by requiring that all the number of children, but has no lower limit. For
leaf nodes are at the same depth. This depth will in- example, when there are fewer than L-1 elements in
crease slowly as elements are added to the tree, but an the entire tree, the root will be the only node in the
increase in the overall depth is infrequent, and results tree, and it will have no children at all.
in all leaf nodes being one more node further away A B-tree of depth n+1 can hold about U times as
from the root. many items as a B-tree of depth n, but the cost of
      </p>
      <p>B-trees have substantial advantages over alterna- search, insert, and delete operations grows with the
tive implementations when node access times far ex- depth of the tree. As with any balanced tree, the cost
ceed access times within nodes. This usually occurs increases much more slowly than the number of
elewhen most nodes are in secondary storage such as ments.
hard drives. By maximizing the number of child nodes Some balanced trees store values only at the leaf
within each internal node, the height of the tree de- nodes, and so have different kinds of nodes for leaf
creases, balancing occurs less often, and efficiency in- nodes and internal nodes. B-trees keep values in every
creases. Usually this value is set such that each node node in the tree, and may use the same structure for all
takes up a full disk block or an analogous size in sec- nodes. However, since leaf nodes never have children,
ondary storage. a specialized structure for leaf nodes in B-trees will</p>
      <p>A B-tree of order m (the maximum number of chil- improve performance. The best case height of a B-tree
dren for each node) is a tree which satisfies the follow- is: logM n. The worst case height of a B-tree is: log M n.
ing properties: Where M is the maximum number of children a n2ode
can have.</p>
      <p>
        There exists many variants of the B-tree:
B∗-tree [13], B∗∗-tree [15], B+-tree [17]. In the case
of the B+-tree, data is only stored in leaf nodes and
inner nodes include keys. Leaf nodes hold links to the
previous and next nodes. Moreover, many paged data
structures like UB-tree [
        <xref ref-type="bibr" rid="ref5">5, 12</xref>
        ], BUB-tree [8], and
      </p>
      <p>R-tree [10] are based on the B-tree.
– Every node has at most m children.
– Every node (except root and leaves) has at least</p>
      <p>m2 children.
– The root has at least two children if it is not a leaf</p>
      <p>node.
– All leave nodes are in the same level.
– All inner nodes with k children contain k–1 links</p>
      <p>to children.</p>
      <p>
        Each internal node’s elements act as separation values
which divide its subtrees. For example, if an internal 3 A compression scheme for tree-like
node has three child nodes (or subtrees) then it must data structures
have two separation values or elements a1 and a2. All
values in the leftmost subtree will be less than a1, all In this section, we describe a basic scheme which can
values in the middle subtree will be between a1 and a2, be utilized for most paged tree data structures [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
and all values in the rightmost subtree will be greater Pages are stored in a secondary storage and retrieved
than a2. when the tree requires a page. This basic strategy is
      </p>
      <p>Internal nodes in a B-tree – nodes which are not widely used by many indexing data structures such as
leaf nodes – are usually represented as an ordered set B-trees, R-trees, and many others. They utilize cache
of elements and child pointers. Every internal node for fast access to pages as well, since access to the
contains a maximum of U children and – other than secondary storage can be more than 20 times slower
the root – a minimum of L children. For all internal compared to access to the main memory. We try to
denodes other than the root, the number of elements is crease the amount of disc access cost (DAC) to a
secone less than the number of child pointers; the number ondary storage while significantly decreasing the size
of elements is between L − 1 and U − 1. The number U of a tree file in the secondary storage.</p>
      <p>Let us consider a common cache schema of
persistent data structures in Figure 1(a). When a tree Cc = Cu ,
requires a node, the node is read from the main mem- CRA
ory cache. If the node is not in the cache, the node where CRA is the assumed maximum compression
rapage is retrieved from the secondary storage. tio, Cu is the capacity of the uncompressed page. For</p>
      <p>An important issue of the compression schema is example, the size of the page is 2,048 B, Cu = 100,
that tree pages are only compressed in the secondary CRA = 1/5, then Cc = 500. The size of the page for
storage. In Figure 1(b), we can observe the basic idea the capacity is 10,240 B. This means that all pages in
of the scheme. If a tree data structure wants to retrieve the tree’s cache have Cc = 500, although their S size in
a page, the compressed page is transfered from the the secondary storage is less than or equal to 2,048 B.
secondary storage to the tree’s cache and it is decom- Let us note that
pressed there. Function TreeNode:: Decompress()
performs the decompression. Afterwards, the decom- CR = compressed size .
pressed page is stored in the cache. Therefore, the original size
tree’s algorithms only work with decompressed pages. The TreeNode::Compress() function is called when
Obviously, the tree is preserved as a dynamic data the page must be stored in the secondary storage.
structure and in our experiments we show the page Every page in the tree has its minimal page
utilizadecompression does not significantly affect query per- tion Cc/2. Let Sl denote the byte size of a compressed
formance because we save time with the lower DAC. page. After deleting one item in the page, the byte size
of the page is denoted by Sc. Without loss of
general</p>
      <p>Data structure's ity, we can assume that Sc ≤ Sl. If items are deleted
Data structure Node in macinacmheemory Page Sesctoornadgaery ftrhoamn othreeqpuaagle,towCecm/u2.stIfchsoe,ckthwehpeatgheeriscsatpraectcithyedis ilnetsos
other pages according to the tree deleting algorithm.</p>
      <p>
        Query algorithms are not affected at all because
page decompression is processed only between cache
and secondary storage and the tree can utilize
decom(a) pressed pages for searching without knowing that they
Data structure Node iDnamtaacsinatrcmuhceetumroer'sy Comppargeeseed Sesctoornadgaery ahpavpTeliehbdiesetnboapasnriceyvipidoaeugaselodyf tctrohemee pcdroaemtsaspesrdte.rsusicotnursec.hIetmise
ncoatndbeependent upon an applied split algorithm, nor on the
key type stored in the tree’s pages. We test this scheme
on B+-tree data structure because this data structure
(b) has remained very popular in recent years and it is
suitable for further page compressing.
In this article, we have applied two compression
methods: Fast Fibonacci (FF) and Invariable Coding (IC).
3.1 How the compression scheme affects tree Since keys in a node are close to each another, we use
algorithms the well-known difference compression method [14].
When the compression scheme is taken into considera- Similar algorithms were used in the case of the R-tree
tion, the tree insert algorithm only needs to be slightly compression [
        <xref ref-type="bibr" rid="ref3">3, 16</xref>
        ].
modified. When we insert or modify a record in a page,
we have to perform the function TreeNode::Compress 4.1 Fast Fibonacci compression
Test() which tests whether the node fits into the page.
      </p>
      <p>
        If not, the node needs to be split. Also, during the In this method, we apply the Fibonacci coding [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
split, we have to make sure that the final nodes fit which uses the Fibonacci numbers; 1, 2, 3, 5, 8, 13, . . . .
into the page. This means that the maximum capac- A value is coded as the sum of the Fibonacci numbers
ity of a page can vary depending on the redundancy of that are represented by the 1-bit in a binary buffer.
the data. The maximum capacity of each tree’s page Special 1-bit is added as the lowest bit in this binary
must be determined by a heuristic: buffer after the coding is finished. For example, the
4
5
6
7 end
8 Write links
      </p>
    </sec>
    <sec id="sec-2">
      <title>Algorithm 1: Fast Fibonacci Compression Algo</title>
      <p>rithm</p>
      <p>function : CompressionLeafNode(node)
1 Write item1
2 for i in 2 .. node.count do
3 num ←</p>
      <p>FibonacciCodder(node.key(i)-node.key(i-1))
Write num
num ← FibonacciCodder(node.nonatrr(i))</p>
      <p>Write num
12
13 end
14 Write links</p>
      <p>function : CompressionNode(node)
9 Write item1
10 for i in 2 .. node.count do
11 num ←</p>
      <p>FibonacciCodder(node.key(i)-node.key(i-1))</p>
      <p>Write num
function : Compression(node)
15 if node.isLeaf is leaf then
16 CompressionNode(node)
17 end
18 else
19 CompressionLeafNode(node)
20 end</p>
      <p>
        function : CompressionTest(node)
21 tmp ← Compression(node)
22 if tmp.size &gt; page.size then
23 return false
24 end
25 else
26 return true
27 end
value 20 is coded as follows: 0101011 (13 + 5 + 2).
Due to the fact that we need the compression
algorithm as fast as possible, we use the Fast Fibonacci
decompression introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Algorithm of the Fast Fibonacci compression is
shown in Algorithm 1. We explain the algorithm in
the following paragraphs.</p>
      <p>Compression of inner nodes:
– Keys are compressed by the Fibonacci coding. The
first value, obviously minimal, is stored. We
compute the difference of each neighboring value and
the difference is coded by Fibonacci coding. (see
Lines 10-13)
– Child and parent links are not compressed.</p>
      <p>(Line 14)
Compression of leaf nodes:
– Keys are compressed in the same way as the keys
in an inner node. (Lines 3-4)
– Unindexed attribute values are compressed by
Fibonacci coding. (Lines 5-6)
– Parent, previous, and next nodes links are not
compressed. (Line 8)
4.2</p>
      <p>Invariable coding compression
As in the previous method, we work with the difference
of each neighboring value. Since, we use invariable
coding, we must first compute the difference of the last,
maximal value, and the first value. The result of this
computation is the number of bits necessary for a
storage of the maximal value and, obviously, each value of</p>
    </sec>
    <sec id="sec-3">
      <title>2: Invariable Coding Compression</title>
      <p>Algorithm
Method</p>
      <p>function : CompressionLeafNode(node)
1 Write maxBits(node.key(1),node.key(n))
2 Write node.key(1)
3 for i in 2 .. node.count do
4 num ← ICCodder(node.key(i)-node.key(i-1))
5 Write num
6 end
7 Write maxBits(node.nonatrr(1),node.nonatrr(n))
8 Write node.nonatrr(min)
9 for i in 2 .. node.count do
10 num ←</p>
      <p>ICCodder(node.nonatrr(i)-node.nonatrr(min))</p>
      <p>Write num
11
12 end
13 Write links</p>
      <p>function : CompressionNode(node)
14 Write maxBits(node.key(1),node.key(n))
15 Write node.key(1)
16 for i in 2 .. node.count do
17 num ← ICCodder(node.key(i)-node.key(i-1))
18 Write num
19 end
20 Write links</p>
      <p>function : Compression(node)
21 if node.isLeaf is leaf then
22 CompressionNode(node)
23 end
24 else
25 CompressionLeafNode(node)
26 end</p>
      <p>function : CompressionTest(node)
27 tmp ← Compression(node)
28 if tmp.size &gt; page.size then
29 return false
30 end
31 else
32 return true
33 end
Height
Domain
DAC Read
Creating time [s]
Cache Time [s]
Compress time [s]
Decompress time [s]
# Inner nodes
# Leaf nodes
# Avg. node items
# Avg. leaf node items
Index size [kB]
Compression ratio</p>
      <p>RND 8 RND 16
B+-tree FF IC B+-tree FF IC
2 2 2 2 2 2</p>
      <p>8b 16b
20,858,473 12,115,647 10,010,790 5,996,536 5,872,078 5,739,912
15,288 65,618 44,201 6,360 11,897 10,605
7,357 5,398 1,565 6,020 1,495 1,181
0 867 652 0 883 636
0 53,524 35,908 0 9,276 8,567
35 17 9 33 17 9
7,746 3,350 2,431 5,695 2,596 2,114
222.3 198 271 172.58 153.65 235.8
129.1 298.5 411.4 176.58 385.21 473
15,564 6,736 4,882 11,394 5,228 4,248</p>
      <p>1 0.56 0.69 1 0.54 0.63</p>
      <p>1
0,9
0,8
0,7
0,6
Compress 0,5
ratio 0,4
0,3
0,2
0,1
0
200
180
160
140
prtoiQmcueeesr[ssyi]ng 1186200000
40
20
0
B+tree
FF Compress</p>
      <p>IC Compress
B+tree
FF Compress
IC Compress</p>
      <p>B+tree
FF Compress
IC Compress
RND_8 RND_16 RND_24 RND_32</p>
      <p>RND_8 RND_16 RND_24 RND_32
(a)</p>
      <p>(b)
DAC
– Keys are compressed in the same way as the keys</p>
      <p>in an inner node. (Lines 1-6)
– Unindexed attribute values are similarly
compressed as keys, however the maximal value is not
the last value – it must be found by a sequence
scan.
– Parent, previous, and next nodes links are not</p>
      <p>compressed. (Line 13)
the node. For example, if the difference of the maximal 5 Experimental results
and minimal value is 20, all values are stored in 5bits.</p>
      <p>Algorithm of the IC compression is shown in Al- In our experiments1, we test previously described
comgorithm 1. We explain the algorithm in the following pression methods. These approaches are implemented
paragraphs. in C++. We use four synthetic collections which differ
Compression of inner nodes: in values included. Collection RND 8 includes values
in h0; 255i, RND 16 includes values in h0; 65, 535i. In
– Keys are compressed by the above proposed this way, we create collections RND 24 and RND 32
method. We store the first value and the num- as well. Each collection contains 1,000,000 items.
ber of bits necessary for a storage of the maxi- For each collection, we test index building and
mal value. After, all difference values are stored. querying by processing time and DAC. In all tests, the
(Lines 14-19) page size is 2,048B and cache size is 1,024 nodes. The
– Child and parent links are not compressed. cache of the OS was turned off.</p>
      <p>(Line 20) Results of index building are depicted in Table 1
Compression of leaf nodes: and 2. We see that the compression ratio decreases for
increasing size of domains. FF compression is more
efficient for lower values; on the other hand, the IC
compression is more efficient for higher values.
Obviously, due to features of the compressed scheme used,
we obtain the high compress time. Consequently, the
1 The experiments were executed on an AMD Opteron 865
1.8Ghz, 2.0 MB L2 cache; 2GB of DDR333; Windows
2003 Server.
time of creating of B+-tree with the FF compression 7. R. Bayer and K. Unterauer: Prefix B-trees. ACM
is 1.6 − 4.3× higher then the time of creating for the Trans. on Database Systems, 2, 1, 1977, 11–26.
B+-tree. In the case of the IC compression, the creat- 8. R. Fenk: The BUB-tree. In Proceedings of 28rd VLDB
ing time is 1.7 − 2.9× higher. The compression ratio International Conference on Very Large Data Bases
is shown is Figure 4.2(a) as well. (VLDB’02), Hongkong, China, Morgan Kaufmann,
In our experiments, we test 50 random queries and 2002.</p>
      <p>9. G. Goetz: Efficient columnar storage in B-trees. In
the results are then averaged. The results are shown Proceedings of SIGMOD Conference, 2007.
in Table 3. The number of DAC is 2.1 − 3.5× lower 10. A. Guttman: R-Trees: a dynamic index structure for
for the FF compression when compared to the B+- spatial searching. In Proceedings of ACM International
tree and 1.5 − 3.6× for the IC compression. This re- Conference on Management of Data (SIGMOD 1984),
sult influences the query processing time. The query ACM Press, June 1984, 47–57.
processing times is 0.84 − 4.85× more efficient in the 11. D. Lomet: The evolution of effective B-tree page
orgacase of the FF compression when compared to the B+- nization and techniques: a personal account. In
Protree and the time is 1.03 − 5.4× more efficient for the ceedings of SIGMOD Conference, Sep. 2001.
IC compression. Obviously, if the compression ratio is 12. V. Markl: Mistral: Processing relational queries
over a threshold then the B+-tree overcomes the com- using a multidimensional access technique. Ph.D.
pressed indices. In Figure 4.2(b) and (c), we see the thesis, Technical University Mu¨nchen, Germany, 1999,
http://mistral.in.tum.de/results/publications/
query processing time and DAC. Mar99.pdf.
13. Y. Sagiv: Concurrent operations on B*-trees with
overtaking. In Journal of Computer and System Sciences,
6 Conclusion 1986.
14. D. Salomon: Data Compression The Complete
Refer</p>
      <p>ence. Third Edition, Springer–Verlag, New York, 2004.
15. A.A. Toptsis: B**-tree: a data organization method for
high storage utilization. In Computing and
Information, 1993.
16. J. Walder, M. Kr´atky´, and R. Baˇca: Benchmarking
coding algorithms for the R-tree compression. In
Proceedings of DATESO 2009, Czech Republic, 2009.
17. N. Wirth: Algorithms and Data Structures. Prentice</p>
      <p>Hall, 1984.</p>
      <p>
        In this article, we propose two methods for B-tree
compression. If the compression ratio is below a threshold
then the query processing performance of the
compressed index overcomes the B-tree. However, there
are still some open issues. The first one is the high
creating time. In this case, we must develop a more
efficient method or we must use and test the
bulkloading (see [
        <xref ref-type="bibr" rid="ref3">3, 16</xref>
        ]). Additionally, we must test our method
for a real data collection. Finally, we should test
different compression and coding methods (i.e. Elias-delta
code, Elias-gamma code, Golomb code [14]).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Antognini: Troubleshooting Oracle Performance</article-title>
          . Apress,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Apostolico</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Fraenkel</surname>
          </string-name>
          <article-title>: Robust transmission of unbounded strings using Fibonacci representations</article-title>
          .
          <source>IEEE Trans. Inform</source>
          .,
          <volume>33</volume>
          ,
          <issue>2</issue>
          ,
          <year>1987</year>
          ,
          <fpage>238</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. R. Baˇca, M. Kr´atky´, and
          <string-name>
            <given-names>V.</given-names>
            <surname>Sn</surname>
          </string-name>
          <article-title>´aˇsel: A compression scheme for the R-tree data structure</article-title>
          .
          <source>In Submitted in Information Systems</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>R. Baˇca</surname>
            , V. Sn´aˇsel, J. Platoˇs, M. Kr´atky´, and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>El-Qawasmeh</surname>
          </string-name>
          :
          <article-title>The fast Fibonacci decompression algorithm</article-title>
          . In arXiv:
          <volume>0712</volume>
          .0811v2, http://arxiv.org/abs/0712.0811,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Bayer</surname>
          </string-name>
          :
          <article-title>The universal B-tree for multidimensional indexing: general concepts</article-title>
          .
          <source>In Proceedings of WorldWide Computing and Its Applications (WWCA</source>
          <year>1997</year>
          ), Tsukuba, Japan, Lecture Notes in Computer Science, Springer-Verlag,
          <year>1997</year>
          ,
          <fpage>198</fpage>
          -
          <lpage>209</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Bayer</surname>
          </string-name>
          and
          <string-name>
            <surname>E. M.</surname>
          </string-name>
          <article-title>McCreight: Organization and maintenance of large ordered indices</article-title>
          .
          <source>Acta Informatica</source>
          ,
          <year>1972</year>
          ,
          <fpage>173</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>