<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Benchmarking Coding Algorithms for the R-tree ⋆ Compression</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiˇr´ı Walder</string-name>
          <email>jiri.walder@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michal Kr´atky´</string-name>
          <email>michal.kratky@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Radim Baˇca</string-name>
          <email>radim.baca@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science Technical University of Ostrava</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <fpage>32</fpage>
      <lpage>43</lpage>
      <abstract>
        <p>Multi-dimensional data structures have been widely applied in many data management fields. Spatial data indexing is their natural application, however there are many applications in different domain fields. When a compression of these data structures is considered, we follow two objectives. The first objective is a smaller index file, the second one is a reduction of the query processing time. In this paper, we apply a compression scheme to fit these objectives. This compression scheme handles compressed nodes in a secondary storage. If a page must be retrieved then this page is decompressed into the tree cache. Since this compression scheme is transparent from the tree operations point of view, we can apply various compression algorithms to pages of a tree. Obviously, there are compression algorithms suitable for various data collections, therefore, this issue is very important. In our paper, we compare the performance of Golomb, Elias-delta and Elias-gamma coding with the previously introduced Fast Fibonacci algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>multi-dimensional data structures</kwd>
        <kwd>R-tree</kwd>
        <kwd>compression scheme</kwd>
        <kwd>Golomb</kwd>
        <kwd>Elias-delta</kwd>
        <kwd>and Elias-gamma coding</kwd>
        <kwd>Fast Fibonacci algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Multidimensional data structures [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] have been widely applied in many data
management fields. Spatial data indexing is their natural application, however
there are many applications in different domain fields. In the case of spatial
data, structures often store two- and three-dimensional objects. In the case of
multimedia data, spaces with dimensionality up to 100,000 appear.
      </p>
      <p>
        Many multidimensional data structures have been developed in the past, e.g.
the quadtree family [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], LSD-tree [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], R-tree [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], R+-tree [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], R∗-tree [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and
Hilbert R-tree [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In the case of R-tree, tuples are clustered in a tree’s page
using MBBs (Minimal Bounding Boxes). If we consider a multidimensional tuple
collection, redundancy appears. Consequently, a compression may be used for
the nodes efficient storage and retrieval.
      </p>
      <p>
        Although some works applying a compression inside a DBMS have been
developed, a real-time compression of multidimensional data structures is not
often a research interest. Obviously, a smaller index file means lower DAC (Disk
Access Cost ) when a query is processed [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Consequently, lower DAC may mean
the lower processing time.
      </p>
      <p>
        There are a lot of works applying a compression inside a DBMS. In [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ],
authors depict RLE for compression of sorted columns to have few distinct
values. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], authors propose the SBC-tree (String B-tree for Compressed
sequences) for indexing and searching RLE-compressed sequences of arbitrary
length. Work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] demonstrates that the fast dictionary-based methods can be
applied to order-preserving compression. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], authors introduce the IQ-tree,
an index compression technique for high-dimensional data spaces. They present
a page scheduling strategy for nearest neighbor algorithms that, based in a cost
model, can avoid many random seeks. Work [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] introduces a tree-based structure
called PCR-tree to manage principle components. In [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], authors introduce the
xS-tree that uses lossy compression of bounding regions. Original works written
about compressions of multidimensional data structures describe the
compression of quad-tree [
        <xref ref-type="bibr" rid="ref22 ref8">8, 22</xref>
        ]. Work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] suggested an algorithm to save at least 66%
of the computer storage required by regular quadtrees. The first work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which
concerns compressing R-tree pages, uses the relative representation of MBB to
increase the fanout of the R-tree page. A bulk-loading algorithm, which is a
variation of STR [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], and a lossy compression based on the coordinate
quantization are presented there. Other works in this field are focused on improving
the effectiveness of the main memory indexes. Those cache-conscious indexes
suppose that they can store most of the index in the main memory. Such a work
is CR-tree [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which uses a type of MBB representation similar to [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Let the
irrelevant page be the page whose MBB does not intersect a query box. These
works apply the lossy compression, therefore an improved compression ratio is
achieved when a filtration of irrelevant pages must be processed during a query
processing.
      </p>
      <p>
        In this paper, we utilize a compression scheme for R-tree introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Pages of a tree are stored in a secondary storage and decompressed in a tree’s
cache. We achieved a lower DAC and the pages are not always decompressed
when an operation is required. In this paper, we compare the Fast Fibonacci
coding [
        <xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>
        ] with three other coding algorithms: Golomb, Elias-Gamma, and
Elias-Delta codings [
        <xref ref-type="bibr" rid="ref17 ref20">20, 17</xref>
        ].
      </p>
      <p>In Section 2, we briefly describe the R-tree and its variants. In Section 3,
the above depicted compression scheme is presented. In Section 4, we describe
various coding techniques: Fast Fibonacci, Golomb, Elias-gamma, and
Eliasdelta. Experimental results are shown in Section 5. Finally, we conclude with a
summary of results and discussion about future work.</p>
      <p>
        R-tree
R-trees [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] support point and range queries, and some forms of spatial joins.
Another interesting query, supported to some extent by R-trees, is the k nearest
neighbors (k-NN query. R-tree can be thought of as an extension of B-trees in a
multi-dimensional space. It corresponds to a hierarchy of nested n-dimensional
MBBs (see [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for detail). R-tree performance is usually measured with respect
to the retrieval cost (in terms of DAC) of queries.
      </p>
      <p>
        Variants of R-trees differ in the way they perform the split algorithm. The
well-known R-tree variants include R∗-trees and R+-trees. In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], we can find a
more detailed description as well as depiction of other R-tree variants.
      </p>
      <p>
        It is not usually efficient to insert a large amount of data into an R-tree
using the standard insert operation [
        <xref ref-type="bibr" rid="ref10 ref4">10, 4</xref>
        ]. The split algorithm is rather an
expensive operation, therefore, the insertion of many items may take quite a long
time. Moreover, this algorithm is executed many times during the insertion. The
query performance is greatly influenced by utilization of the R-tree. A common
utilization rate of an R-tree created with a standard insert algorithm is around
55%. On the other hand, the utilization rate of the R-tree created with the
bulk-loading method, rises up to 95% [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Several bulk-loading methods [
        <xref ref-type="bibr" rid="ref12 ref15 ref16">12, 15, 16</xref>
        ] have been developed. All bulk-loading
methods first order input items. Method [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] utilizes one-dimensional
spacefilling curve criterion for such ordering. This method is very simple and allows
to order input items very fast. The result R-tree preserves suitable features for
the most common data.
3
      </p>
      <p>
        A Compression Scheme for Tree-like Data Structures
In this section, we describe a basic compression scheme which can be utilized
for most paged tree data structures [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Pages are stored in a secondary storage
and retrieved when the tree requires a page. This basic strategy is widely used
by many indexing data structures such as B-trees, R-trees, and many others.
They utilize cache for fast access to pages as well, since the access to the
secondary storage can be more than 20 times slower compared to access to the main
memory. We try to decrease the amount of DAC to a secondary storage while
significantly decreasing the size of a tree file in the secondary storage.
      </p>
      <p>In Figure 1, we can observe the basic idea of compression scheme used in this
paper. If a tree data structure wants to retrieve a page, the compressed page is
transfered from the secondary storage to the tree’s cache and it is decompressed
there. An important issue of our compression scheme is that the pages are only
compressed in the secondary storage.</p>
      <p>When the compression scheme is taken into consideration, the tree insert
algorithm only needs to be slightly modified. Query algorithms are not affected at
all because page decompression is processed only between cache and secondary
storage and the tree can utilize decompressed pages for searching without
knowing that they have been previously compressed.</p>
      <p>
        The goal of R-tree and its variants is to cluster the most similar tuples into
a single page. The term ‘similar tuples’ means that the tuples are close to each
other in a multi-dimensional space according to L2 metric. This feature can be
utilized to compress R-tree pages by a fitting compression algorithm. An
important issue of this scheme is that we can apply various compression algorithms to
a single R-tree. In Section 4, we show an algorithm for the R-tree compression,
other compression algorithms can be found in [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
      <p>Using this compression scheme we reduce the R-tree index size as well as
DAC during a query processing. We require a decompression algorithm to be as
fast as possible, otherwise the decompression time would not exceed the time
saved for a lower DAC.
4</p>
      <p>Compression Algorithm
Since tuples of a tree’s page are closely located to one another in a
multidimensional space, we can suppose that coordinates of these tuples are ’similar’. This
means that the coordinates in each dimension are the same or their differences
are rather of small values. Consequently, this feature provides increased potential
for a compression.</p>
      <p>
        We implemented different bit-length number coding techniques: Golomb,
Elias-gamma and Elias-delta. These coding algorithms are compared with the
previously published Fast Fibonacci coding [
        <xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>
        ]. We utilize these coding
techniques in a compression algorithm based on coding of differences between similar
tuple coordinates.
4.1
      </p>
      <p>
        Golomb, Elias-gamma and Elias-delta and Fast Fibonacci
Coding
Small values are possible to code with various coding techniques. We have
implemented three simple techniques for the coding of values. These techniques are
as follows: Golomb, Elias-gamma, and Elias-delta [
        <xref ref-type="bibr" rid="ref17 ref20">20, 17</xref>
        ]. The algorithms used
for coding are shown in Algorithms 1, 2, and 3. All codes for numbers 1-12 are
depicted in Table 1.
      </p>
      <p>
        In Algorithms 1, 2, and 3 the compressed values are read bit by bit. Retrieving
the bit from the compressed memory is a time consuming operation. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Fast
Fibonacci decompression was introduced. This algorithm processed data without
retrieving every single bit from a compressed memory. The proposed Fibonacci
decompression method is based on a precomputed mapping table. This table
enables converting of compressed memory segments directly into decompressed
values.
Difference-based compression algorithm for the R-tree was introduced in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
however difference-based compression algorithms are well known [
        <xref ref-type="bibr" rid="ref20 ref27">27, 20</xref>
        ]. This
algorithm is shown in Algorithm 4. This algorithm simply computes XOR
differences between coordinates of the first tuple and values of other tuples. After
that we add all difference numbers into the mCodingBuffer buffer, all numbers
are coded by the Encode function. In this paper, we compare Fast Fibonacci,
Golomb, Elias-Gamma, and Elias-Delta for coding of numbers. In Figure 2, we
can see some encoded values for these coding techniques. Differences for the page
P are output in the page PXOR. The difference numbers in the page PXOR are
then coded by the Encode function.
5
      </p>
      <p>Experimental Results
In our test1, we have used the compression scheme depicted in Section 3 and
coding algorithms described in Section 4. In this section, we compare the query
1 The experiments were executed on a PC with 1.8 Ghz AMD Opteron 865, 2 MB L2
cache; 2 GB of DDR333; Windows 2008 Server.
input : Golomb code bit stream and Golomb code parameter parameterM
output: Decoded number n
1 Bits ←Log(parameterM)/ Log(2);
2 TreshNumber ←Pow(2,Bits +1 )–parameterM ;
3 PowerTwo ←Floor(Bits)==Bits;
4 qpart ← 0;
5 rpart ← 0;
6 bit ←stream.GetNextBit();
7 while bit do
8 qpart ++;
9 bit ←stream.GetNextBit();
10 end
11 if PowerTwo then
12 for x ← 0 to Bits do
13 bit ←stream.GetNextBit();
14 rpart ←rpart &lt;&lt;1|bit ;
15 end
16 else
17
18
19
20
21
22
23
24
25
for x ← 0 to Bits do
bit ←stream.GetNextBit();
rpart ←rpart &lt;&lt;1|bit ;
end
if rpart &gt;=TreshNumber then
bit ←stream.GetNextBit();
rpart ←rpart &lt;&lt;1|bit ;
rpart ←rpart-TreshNumber ;
end
26 end
27 n ← qpart ∗parameterM +rpart ;</p>
      <p>Algorithm 1: Golomb decoding algorithm
performance of a compressed as well as uncompressed data structures. We test
both real and synthetic data sets. In all experiments, we turn off the OS’s disk
read cache to prevent the OS from file caching and the cache of data structures
was 1,000 inner and leaf nodes. The page size of all data structures is 2,048B.
To compare the performance of the compressed and uncompressed R-tree we
observe the following features:
input : Elias-gamma code bit stream
output: Decoded number n
1 Bits ←1;
2 n ←0;
3 bit ←stream.GetNextBit();
4 while not bit do
5 Bits ++;
6 bit ←stream.GetNextBit();
7 end
8 repeat
9 Bits −−;
10 n ←n |bit &lt;&lt;Bits ;
11 if Bits &gt;0 then
12 bit ←stream.GetNextBit();
13 end
14 until Bits ==0 ;</p>
      <p>Algorithm 2: Elias-gamma decoding algorithm
– the query processing time and DAC, see Section 5.1
– R-tree index size, see Section 5.2
– an influence of various space dimensionalities, see Section 5.3
– an influence of various query selectivities, see Section 5.4</p>
      <p>
        We perform experiments on synthetic as well as real data sets. In the case of
synthetic data sets, we generate collections of 500,000 points for dimensionalities:
2, 4, and 6 in an integer domain of the h0, 2 × 106i range with the uniform
distribution of values. In the case of real data sets, we test TIGER 2D spatial
data collections of 500,000 (TIG05) and 2 million (TIG20) points [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. These
data collections only include unique tuples. In this way, the compression scheme
performance is not influenced by identical tuples. We process series of query
experiments where one experiment consists of 50 randomly generated queries.
Consequently, each presented result is the summary result of all these queries.
Query boxes covering 0.1%, 0.2%, 0.3%, 0.4%, and 0.5% of the data space were
randomly generated. In other words, the query selectivity is changed in this way.
5.1
      </p>
      <p>Processing Query Time and DAC
In Tables 2 and 3, query processing performance is presented for both real and
random data collections. In this experiment, selectivity is 0.2%. In the case of
random data, the best query time was achieved by the Fast Fibonacci algorithm.
In the case of other coding algorithms, the query time is little worse than in the
case of the uncompressed R-tree. The Elias-delta decoding algorithm is 30%
slower than Fast Fibonacci. The Golomb and Elias-gamma algorithms are 60%
slower than Fast Fibonacci. In the case of real collections, Elias-delta coding
outperforms the Fast Fibonacci coding. In the case of TIG05, Elias-delta saves
input : Elias-delta code bit stream
output: Decoded number n
6% of the query processing time in a comparison to the Fast Fibonacci algorithm
and 19% of the query processing time in a comparison to the uncompressed
Rtree.</p>
      <p>In Table 4, we propose the query processing time in more detail for both
Elias-delta and Fast Fibonacci. These results are related to the TIG20
collection. Obviously, time spent on reading of pages in the secondary storage is
lower in the case of Elias-delta, however the decompression time is lower for
the Fast Fibonacci algorithm. Overall query processing time is better for Fast
Fibonacci. Elias-delta reads values in the bit-by-bit way, on the other hand Fast
Fibonacci works with bytes. In the future, we can focus on a development of
similar byte-based reading for other coding algorithms, especially for the
Eliasdelta algorithm. Elias-delta achieves the lowest DAC for both real and random
data collections.
5.2</p>
      <p>Index Sizes
An important issue of the compression is a reduction of the R-tree index size. In
Figure 3(f), we can see the index sizes for the real collection. The best
compression ratio was achieved by the Elias-delta encoding. In this case, we save more
than 60% of the index size.
Input : stream, an instance of cStream
output: Compressed R-tree node
1 mCodingBuffer.Clear ();
2 stream.Write (mCount);
3 for i ← 0 to mDimension do
4 value ← mTuples [i].GetInt (0);
5 stream.Write (value);
6
7
8
9
10
11
12
13 end
for j ← 1 to mCount do
int tmpValue ← mTuples [j].GetInt (i);
int diff ← value XOR tmpValue ;
mCodingBuffer.Add (diff);
end
Encode (mCodingBuffer);
stream.Write (mCodingBuffer);
Algorithm 4: Difference-based compression of an R-tree leaf page
Processing Time [s] 60.9
DAC All Nodes [MB] 21,131
DAC Leaf Nodes [MB] 10,691</p>
      <p>Normal Golomb Golomb Golomb Elias Elias Fast</p>
      <p>M=4 M=8 M=16 gamma delta Fibonacci
87.3 87.2 89.9 86.6 68.9 52.1
8,477 8,634 8,797 7,806 7,176 7,175
4,333 4,413 4,491 3,988 3,674 3,673
We compare DAC for randomly generated data collections with dimensionalities
2, 4, and 6 (see Figure 3e). We save more than 60% of DAC in the case of
Eliasdelta and dimension 2. The compression ratio weakens with increasing space
dimension. The space is bigger with increasing dimension, tuples are further
from one another, therefore, less redundancy appears in tuples. The dimension
modification has no impact on the performance of various coding techniques.
DAC for Random data
DAC for Real Data
l
a b
m
r
o
loom 4=M loobm 8=M loobm 16=M il-saE aamm li-saE lteda iccano</p>
      <p>G G G g ib
N</p>
      <p>N
l
a b
m
r
o
loom 4=M loobm 8=M loobm 16=M il-saE aamm il-saE lteda iccano</p>
      <p>G G G g ib
F</p>
      <p>F
Influence of Dimension Change
Real Data Index Size
la b b b - a -s ta ic
roNm looGm 4=M looGm 8=M looGm 16=M ilsaE agmm liaE led icanob
l
a
m
r
o
N
b b b 6 - a -s ta
lom 4= lom 8= lom 1= ilsa mm liaE led
o M o M o M E a
G G G g
i
c
c
a
n
o
b
i
F
Query processing time for bulk-loaded (c) random and (d) real data collections
(e) Selectivity influence comparison (f) Index size comparison
5.4 Influence of the Query Selectivity
In this experiment, we choose the following selectivities: 0.1%, 0.2%, 0.3%, 0.4%,
and 0.5%. DAC and query processing times are put forward in Figures 3(a)-(d).
The results are presented for both random and real data collections. Obviously,
Elias-delta outperform other coding algorithms. The other codings produce
approximately the same DAC. The selectivity modification has no impact on the
performance of various coding techniques.
6</p>
      <p>Conclusion
In this paper, we test a lossless compression scheme for the R∗-tree data
structure. We compare the following coding techniques: Golomb, Elias-Gamma, and
Elias-Delta, with the previously published Fast Fibonacci coding. All coding
algorithms improve DAC compare to the regular R-tree. In the case of real data
collections, Elias-delta and Fast Fibonacci techniques achieve the best results.
The Elias-delta algorithm saves 5% DAC of Fast Fibonacci. All other algorithms
are less efficient that the Fast Fibonacci algorithm. When real data sets are
concerned, the compression methods save at least 60% of the index size required by
a regular R-tree.</p>
      <p>The best compression ratio was achieved by the Elias-delta codding. On the
other hand, decompression time for Fast Fibonacci is lower than in the case of
Elias-delta. Elias-delta reads values in the bit-by-bit way, however Fast Fibonacci
works with bytes. In our future work, we want to focus on a development of
similar byte-based reading for other coding algorithms, especially for the
Eliasdelta algorithm.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>G.</given-names>
            <surname>Antoshenkov</surname>
          </string-name>
          .
          <article-title>Dictionary-based Order-preserving String Compression</article-title>
          .
          <source>VLDB Journal - The International Journal on Very Large Data Bases</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <fpage>26</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. R. Baˇca, M. Kr´atky´, and
          <string-name>
            <given-names>V.</given-names>
            <surname>Sn</surname>
          </string-name>
          <article-title>´aˇsel. A compression scheme for multi-dimensional data structures</article-title>
          .
          <source>Submitted to Information Systems</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>R. Baˇca</surname>
            , V. Sn´aˇsel, J. Platoˇs, M. Kr´atky´, and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>El-Qawasmeh</surname>
          </string-name>
          .
          <article-title>The fast fibonacci decompression algorithm</article-title>
          . In arXiv:
          <volume>0712</volume>
          .0811v2, http://arxiv.org/abs/0712.0811,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>N.</given-names>
            <surname>Beckmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-P.</given-names>
            <surname>Kriegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Seeger. The R</surname>
          </string-name>
          ∗
          <article-title>-tree: An efficient and robust access method for points and rectangles</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Management of Data (SIGMOD</source>
          <year>1990</year>
          ), pages
          <fpage>322</fpage>
          -
          <lpage>331</lpage>
          . ACM Press,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>S.</given-names>
            <surname>Berchtold</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>B¨ohm</article-title>
          , H.-P. Kriegel,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sander</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          . Independent Quantization:
          <article-title>An Index Compression Technique for High-Dimensional Data Spaces</article-title>
          .
          <source>In Proceedings of the 16th International Conference on Data Engineering (ICDE</source>
          <year>2000</year>
          ), pages
          <fpage>577</fpage>
          -
          <lpage>588</lpage>
          . IEEE Computer Society,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao. PCR-Tree</surname>
          </string-name>
          :
          <article-title>A Compression-Based Index Structure for Similarity Searching in High-Dimensional Image Databases</article-title>
          .
          <source>In Proceedings of the Fourth International conference on Fuzzy Systems and Knowledge Discovery (FSKD</source>
          <year>2007</year>
          ), pages
          <fpage>395</fpage>
          -
          <lpage>400</lpage>
          . IEEE Computer Society,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Eltabakh</surname>
          </string-name>
          , W.-K. Hon,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. G.</given-names>
            <surname>Aref</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Vitter</surname>
          </string-name>
          .
          <article-title>The SBC-Tree: An Index for Run-Length Compressed Sequences</article-title>
          .
          <source>In Proceedings of the 11th International Conference on Extending Database Technology (EDBT</source>
          <year>2008</year>
          ). ACM Press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>I. Gargantini.</surname>
          </string-name>
          <article-title>An Effective Way to Represent Quadtrees</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>25</volume>
          :
          <fpage>905</fpage>
          -
          <lpage>910</lpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Shaft</surname>
          </string-name>
          .
          <article-title>Compressing Relations and Indexes</article-title>
          .
          <source>In Proceedings of IEEE Conference on Data Engineering (ICDE</source>
          <year>1998</year>
          ), pages
          <fpage>370</fpage>
          -
          <lpage>379</lpage>
          , Los Alamitos, USA,
          <year>1998</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>A.</given-names>
            <surname>Guttman. R-Trees</surname>
          </string-name>
          :
          <article-title>A Dynamic Index Structure for Spatial Searching</article-title>
          .
          <source>In Proceedings of ACM International Conference on Management of Data (SIGMOD</source>
          <year>1984</year>
          ), pages
          <fpage>47</fpage>
          -
          <lpage>57</lpage>
          . ACM Press,
          <year>June 1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Henrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Six</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Widmayer</surname>
          </string-name>
          .
          <article-title>The lsd tree: spatial access to multidimensional and non-point objects</article-title>
          .
          <source>In VLDB '89: Proceedings of the 15th international conference on Very large data bases</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>53</lpage>
          , San Francisco, CA, USA,
          <year>1989</year>
          . Morgan Kaufmann Publishers Inc.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. I. Kamel and
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          .
          <article-title>On packing R-trees</article-title>
          .
          <source>In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM</source>
          <year>1993</year>
          ), pages
          <fpage>490</fpage>
          -
          <lpage>499</lpage>
          . ACM Press,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. I. Kamel and
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos. Hilbert</surname>
          </string-name>
          r-tree:
          <article-title>An improved r-tree using fractals</article-title>
          .
          <source>In In Proceedings of VLDB 1984</source>
          , pages
          <fpage>500</fpage>
          -
          <lpage>509</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Cha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kwon</surname>
          </string-name>
          .
          <article-title>Optimizing Multidimensional Index Trees for Main Memory Access</article-title>
          .
          <source>In Proceedings of ACM International Conference on Management of Data (SIGMOD</source>
          <year>2001</year>
          ), pages
          <fpage>139</fpage>
          -
          <lpage>150</lpage>
          , New York, USA,
          <year>2001</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. L.
          <string-name>
            <surname>Arge</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          <string-name>
            <surname>Hinrichs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Vahrenhold</surname>
            , and
            <given-names>J.S.</given-names>
          </string-name>
          <string-name>
            <surname>Vitter. Efficient Bulk Operations on Dynamic R-Trees</surname>
          </string-name>
          .
          <source>Algorithmica</source>
          , pages
          <fpage>104</fpage>
          -
          <lpage>128</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>S.</given-names>
            <surname>Leutenegger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lopez</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Edgington.</surname>
          </string-name>
          <article-title>STR: A Simple and Efficient Algorithm for R-Tree Packing</article-title>
          .
          <source>In Proceedings of 13th International Conference on Data Engineering (ICDE</source>
          <year>1997</year>
          ), pages
          <fpage>497</fpage>
          -
          <lpage>506</lpage>
          . IEEE CS Press,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>D. J. C.</surname>
          </string-name>
          <article-title>Mackay</article-title>
          . Information Theory,
          <string-name>
            <given-names>Inference &amp; Learning</given-names>
            <surname>Algorithms</surname>
          </string-name>
          . Cambridge University Press,
          <year>June 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Manolopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nanopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          .
          <source>RTrees: Theory and Applications</source>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Manolopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Tsotras</surname>
          </string-name>
          .
          <source>Advanced Database Indexing</source>
          . Kluwer Academic Publisher,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>D.</given-names>
            <surname>Salomon</surname>
          </string-name>
          .
          <source>Data Compression The Complete Reference. Third Edition</source>
          , SpringerVerlag, New York,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>H.</given-names>
            <surname>Samet</surname>
          </string-name>
          .
          <article-title>Foundations of Multidimensional and Metric Data Structures</article-title>
          . Morgan Kaufmann,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>H.</given-names>
            <surname>Samet</surname>
          </string-name>
          .
          <article-title>Data Structures for Quadtree Approximation and Compression</article-title>
          .
          <source>Communications of the ACM archive</source>
          ,
          <volume>28</volume>
          (
          <issue>9</issue>
          ):
          <fpage>973</fpage>
          -
          <lpage>993</lpage>
          ,
          <year>September 1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23. T. Sellis,
          <string-name>
            <given-names>N.</given-names>
            <surname>Roussopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          . The r+
          <article-title>-tree: A dynamic index for multidimensional objects</article-title>
          .
          <source>In In Proceedings of VLDB 1987</source>
          , pages
          <fpage>507</fpage>
          -
          <lpage>518</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>M. Stonebraker</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Batkin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Cherniack</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ferreira</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lin</surname>
            , and
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Madden</surname>
          </string-name>
          .
          <article-title>C-store: A Column Oriented DBMS</article-title>
          .
          <source>In Proceedings of the International Conference on Very Large Data Bases</source>
          ,
          <string-name>
            <surname>VLDB</surname>
          </string-name>
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25. U.S. Department of Commerce, U.S. Census Bureau, Geography Division. TIGER/Line Files,
          <source>2006 Second Edition</source>
          , Alabama, Autauga County,
          <year>2006</year>
          , http://www.census.gov/geo/www/tiger/.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>X. S.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Indexing Very High-dimensional Sparse and Quasisparse Vectors for Similarity Searches</article-title>
          .
          <source>VLDB Journal - The International Journal on Very Large Data Bases</source>
          ,
          <volume>9</volume>
          (
          <issue>4</issue>
          ):
          <fpage>344</fpage>
          -
          <lpage>361</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moffat</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Bell</surname>
          </string-name>
          . Managing Gigabytes,
          <article-title>Compressing and Indexing Documents and Images, 2nd edition</article-title>
          . Morgan Kaufmann,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>