Implementing block-stored prefix trees in XML-DBMS

                                      c Oleg Borisenko                   Ilya Taranov
                            Institute for System Programming, Russian Academy of Sciences
                                          borisenko@ispras.ru, taranov@ispras.ru


                         Abstract                               B-trees are widely used in modern databases for index-
                                                                ing [6]. Storing the keys of arbitrary length in B-trees
      The problem of search efficiency through large            is possible however causes implementation problems. It
      amount of text data is well-known problem in              is difficult to implement the efficient storage of variable
      computer science. We would like to introduce a            length keys. In practice usually only a limited part of the
      BST data structure that allows searches through           key is stored in the tree node, and the remainder is stored
      a set of string values, and is optimized for read-        in separate overflow pages[2, 16]. This approach is quite
      ing and writing large blocks of data. This paper          effective if the keys are short or differ only in first char-
      describes the algorithms for insertion, deletion          acters.
      and search of variable-length strings in disk-                The other problem arises if one key corresponds to a
      resident trie structures. This data structure is          set of different value nodes; it is difficult to implement an
      used for value indexes on XML data. We also               efficient search by key/value pair. Most of B-tree imple-
      compare our implementation with existing B+               mentations do not provide an ability to store such mul-
      tree implementation and show that our structure           timaps. At the same time our indexes may have duplicate
      occupies several times less space with the same           key/value pairs. To be able to store identical logical1 keys
      search efficiency.                                        the concatenation of key/value pairs is usually used as a
                                                                physical key.
1     Introduction                                                  This article describes an index implementation for
                                                                disk-resident usage that fits our requirements. It can be
The problem of developing data structures that provide          shown as an extension of a data structure known as a
basic dictionary operations (insert, deletion and lookup)       trie[5] which is also called prefix tree.
and is optimized for disk storage has been investigated in          The idea of using prefix trees as a replacement for B-
computer science for a long time but remains very impor-        trees has already been discussed in other works. A mod-
tant. This work describes a new disk-resident data struc-       ified B-tree, called S(b)-tree [7], stores a binary Patri-
ture BST that stores a set of string keys and algorithms        cia[17] tree in its nodes. The feature of S(b)-tree is that
for insertion, deletion and search for this structure. Also     nodes do not store a key itself, but the number of bits
we compare the existing implementation of B+ tree with          passed during the comparison. During the search for the
our novel approach within Sedna XML DBMS [8, 18]                string you may have to compare this string with the string
and show that the latter has an opportunity for significant     stored in external memory. However, such a comparison
economy of disk space providing the same search time            is not a big problem in itself. The problem is to store all
as B+ tree.                                                     string keys in a separate location. S(b)-tree is presented
   Our structure has been developed for use as a value in-      as a data structure to support full-text index and it is good
dex backend in Sedna XML DBMS. Sedna XML DMBS                   enough for this task[12].
has native support for value indices on nodes and it has            The B-trie[1] is very similar to our approach. The ba-
the following requirements to the structures used for in-       sis of this work was the implementation of a prefix tree
dex management:                                                 that provides efficient usage of CPU cache [9]. Both
                                                                structures propose effective partition of the prefix tree
    1. Structure must provide the ability to store keys of
                                                                into blocks (buckets). But in our work we do not use
       unlimited length. For example, URI’s can be very
                                                                stored supplementary data structures and provide more
       large in practice.
                                                                effective strategy to keep the blocks filled.
    2. Structure must implement the concept of multimap             In addition, there are quite simple implementations of
       abstract data type. An index for an XML document         similar data structures, i.e. Index Fabric[14], which is a
       in Sedna may have duplicate (key, value) pairs.          B-tree with keys stored in a prefix tree in internal nodes.

2     Review of existing solutions                              3    Building the structure
                                                                In this paper we introduce a prefix tree based data struc-
Most common data structures for implementaing
                                                                ture which is optimized for reading and writing large
database index are B-tree[3] and its variants[4, 19, 13].
                                                                    1 Here the physical key is the key that is actually used in comparison
Proceedings of the Spring Researcher’s Colloquium on Database   and search. Logical key refers to a key that is passed to index system
and Information Systems, Moscow, Russia, 2012                   interface.
blocks of data. We call our structure the Block String           If this condition holds, the set K uniquely defines a tree
Trie or BST. Let the set of stored strings be denoted by         T.
K = {s1 , s2 , . . . , sn }. The basic operations for our
structure are defined as follows:                                Theorem. Any non-empty set of strings K defines one
                                                                 tree T and the one tree only for which the conditions 1–4
  1. Insert string s to the set K.                               holds.
  2. Find all the strings with the prefix s in the set K.        Proof. Let us assume that this statement does not hold
                                                                 and the set of strings K does not define the single tree T .
  3. Remove string s from the set K.                             This may happen in two cases: the set of strings K does
   Such structure implements a set abstract data type but        not define any tree at all, or the set K defines more than
not a map. Now we specify how exactly we store the               one tree. We are not going to consider the first case, as
(key,value) pairs in our data structure. Each pair can be        any non-empty set K defines at least one tree construc-
represented as s = k + c + v (we call it physical key),          tion procedure.
where k is a key (or logical key), v is string representa-          Let us assume that the set of strings K defines more
tion of a value, and c is a symbol that is absent in the         than one tree T that satisfies the conditions 1–4. Con-
alphabet that k is built of 2 . To find all (key,value) pairs,   sider two trees which corresponds to a set K. That means
that correspond to a given key k we look for all strings         that there is a string k ∈ K that has paths S(x11 , x1n ) and
prefixed with k + c. Thus the dictionary problem can be          S(x21 , x2m ) in trees T and T 0 respectively, such as:
solved using the introduced data structure.
                                                                       k = pref ix(x11 ) + c(e11 ) + pref ix(x12 ) +
3.1   Prefix tree                                                      +c(e12 ) + · · · + c(e1n−1 ) + pref ix(x1n ) =
Our data structure is a rooted tree T which stores a set K               = pref ix(x21 ) + c(e21 ) + pref ix(x22 ) +
of strings. The structure is as follows:                                 +c(e22 ) + · · · + c(e2m−1 ) + pref ix(x2m )
  1. Each vertex x of tree T has the following properties:          The fact that trees differ means either of the following:
       (a) A prefix pref ix(x) is a string which may have          1. In trees T and T 0 , there are vertices x1i and x2i , such
           zero length.                                               that pref ix(x1i ) 6= pref ix(x2i ).
      (b) An array E(x) of n(x) exiting edges ordered
          in lexicographic order of characters c(e) they           2. In trees T and T 0 , there are edges e1i and e2i such
          are labeled with.                                           that c(e1i ) 6= c(e2i ).
       (c) Auxiliary flags (will be described as they ap-          3. In trees T and T 0 , there are vertices x1i and x2i , such
           pear further in text).                                                                              1|2
                                                                      that one vertex has the flag f inal(xi ), while the
  2. Each edge e = (x, Li (x)). Li (x) is a node, pointed             other does not.
     by i-th edge from the array E(x). Each edge is la-             First two cases contradict the definition of our tree and
     beled with character c(e). In this case we consider         the third case is in conflict with the condition 4. That
     a character the string of the length of one.                means that trees T and T 0 are equal, hence there is only
  3. Any path S(x1 , xn ) = x1 e1 x2 e2 . . . en−1 xn in the     one tree, corresponding to the set K.
     tree is string s, obtained from the path S(x1 , xn )           We consider only minimal trees in further sections un-
     by concatenating its substrings s = pref ix(x1 ) +          less stated otherwise. However the condition (4) is not
     c(e1 ) + pref ix(x2 ) + c(e2 ) + · · · + c(en−1 ) +         required in the implementation of the structure and may
     pref ix(xn ). We also introduce the flag f inal(x),         be not held if lazy removal technique is implemented.
     that shows whether a node has a corresponding
     key from the set K. Thus the string s defined by            3.2    Block separation
     S(x1 , xn ) belongs to the set of strings K if and only
     if the vertex xn is labeled with flag f inal(xn ).          The structure described in the previous section is de-
                                                                 signed primarily for storage and retrieval of strings in
   The last property implies that a set K may have more          memory. But we require the structure, that can be stored
than one corresponding tree T . To build a tree T 0 that         in external memory and our algorithms should effectively
represents the same set as tree T we can add vertex x0           operate with reading and writing blocks of the fixed size.
with an arbitrary prefix and not marked as f inal(x0 ).              First of all we introduce a special type of vertex that
Both trees T and T 0 will represent the same set of strings      does not store the prefix and links to other vertices and
K. Thus it seems reasonable to introduce an additional           keeps only a reference to a node located in another block.
condition, which covers such situations:                         We call such vertices reference nodes and we mark them
                                                                 with the service flag external(x). It should be noted that
  4. Each vertex x ∈ T , is such that n(x) ≤ 1, is marked
                                                                 the nodes that are referenced by a reference vertices, can
     as f inal(x). A tree that satisfies that condition is
                                                                 not be reference nodes themselves.
     called minimal. Each vertex x that satisfies (doesn’t
                                                                     In addition we introduce the notion of twig.
     satisfy) the condition n(x) ≤ 1 ⇒ f inal(x) is
     called important (redundant).                                   A twig is a rooted tree B such that all leaf vertices are
                                                                 marked as either f inal(x) or external(x). If a tree T
  2 For text strings, we use null character.                     has no reference nodes it consists of a single twig.
                                                             3.3     Algorithms
                                                             3.3.1    Search
                                                             The search algorithm for the BST requires two input pa-
                                                             rameters: a pointer to r to the root node of the tree and
                                                             string k to be found. The algorithm BST −Search(r, k)
                                                             returns a node xn+1 which satisfies the following:

                                                              1. String s(xn ) defined by path S(r, xn ) is a prefix of
                                                                 the string k or equals it if exists.

                                                              2. The search string k is a prefix of a string s(xn+1 ) or
                                                                 equals it.

                                                             We define the relationship A ≤ B as follows: string A
                                                             is a prefix of B or equals it. Thus our requirements for
                                                             result may be expressed using the following inequality:
                                                             s(xn ) ≤ k ≤ s(xn+1 ).
         Figure 1: BST example without blocks                    The search procedure starts from the root of the tree.
                                                             The function takes a pointer x to the root node and the
                                                             search string k, an intermediate result is stored in the
   The concept of twig is a key for the problem of divid-    stack S. In addition there is also an auxiliary function
ing BST into blocks. Every tree T can be divided into        y = L(x, c) which returns node y which is pointed by
twigs by inserting a new reference node before any other     that edge of vertex x, which is marked with character c
node. Twigs                                                  (or NULL if there is no such an edge). This feature is im-
                                                             plemented using binary search algorithm and the edges
   Blocks consist of the constant amount bytes W 3 . The     are stored in sorted array in our implementation. We also
distribution of twigs into blocks is organized in such a     introduce the function Cut(p, s) which returns a string,
way that in one block one or more twigs from the initial     that is produced by removing the prefix p from a string
tree can be stored. All the twigs in one block have a        s, so that p + Cut(p, s) = s. The search algorithm can
common direct ancestor. These conditions are necessary       be written in compact form as follows:
to ensure that the locality of changes in the tree and for
block level locking. These conditions also imply that the
block containing the root node has no other twigs.              BST-Search(x, k, S)
                                                              1: Push(S, x)
                                                              2: if external(x) then
                                                              3:    Disk-Read(J(x))
                                                              4:    return BST-Search(J(x), k, S)
                                                              5: end if
                                                              6: if not Is-Prefix(pref ix(x), k)) then
                                                              7:    if Is-Prefix(k, pref ix(x)) then
                                                              8:       return x
                                                              9:    else
                                                             10:       return NIL
                                                             11:    end if
                                                             12: else
                                                             13:    s ← Cut(pref ix(x), k)
                                                             14:    if Empty(s) then
                                                             15:       return x
                                                             16:    else if L(x, s[1]) = NIL then
                                                             17:       return NIL
                                                             18:    else
                                                             19:       return BST-Search(L(x, s[1]), Cut(s[1], s), S)
                                                             20:    end if
                                                             21: end if


        Figure 2: BST block separation example                   The search function returns node x such that all the
                                                             paths from the root to all possible leaf vertices which
                                                             contain node x define the strings prefixed by a search
                                                             string. The function returns N U LL if there is no such a
                                                             node. We can find all strings prefixed by k traversing the
  3 In the Sedna DBMS the default block size is 64KB.        tree starting from node x.
3.3.2   Insertion                                               splitting procedure starts and after that the insert proce-
                                                                dure should be executed again. By mean of this approach
The insert operation begins with building the structure,
                                                                we will access not more than D+2 blocks (excluding the
that describes the modifications we are going to make in
                                                                possible block splitting).
our tree. If some block on the way of our modifications
does not enough space for modifications to be made, the
block is split (this process is discussed in the next sub-      3.3.3    Block splitting
section) and the insertion process is invoked again.            One of the distinguishing features of BST compared to
    The insert procedure starts with executing the BST −        B-trees is that the BST tree is not balanced. As we have
Search(k, r, S) (where k is the string to be inserted)          proven earlier1 T is uniquely determined by the set K
method, but we need only the path S as result. Also             of stored strings so we can not decrease the height of
we will need three additional strings derived from path         the tree. Nevertheless there are studies that show that
S and string k: common, rest and key. We construct              unbalanced tree affect the performance very little ([12]).
these strings in the following way. First of all, we need       The height of BST is the height of its twigs which must
the string s0 , that is the string that corresponds to S with   be read to find given prefix in the worst case.
the last vertex’s prefix removed. From the definition of            The procedure of new space allocation performs the
search procedure it is obvious that this string is a prefix     splitting of the block and is called only in case the block
of added string k or equals it. We also need the string k 0     doesn’t have enough space to insert a new vertex. We use
which is built from k by removing s0 from the beginning.        two different strategies for block splitting.
To build the string common common we take the great-                The first strategy makes the blocks grow in “width”
est common prefix of strings k 0 and p = pref ix(S[n]),         and doesn’t affect the height of the tree so it is more
where S[n] is the last vertex of the path S. In turn the        preferable. The algorithm handles the situation when
strings rest and key are p and k 0 with no prefix common        one block contains several twigs. In this case we divide
at the beginning respectively. The idea becomes clearer
if illustrated by the following scheme:
                                                                twigsP  contained in block P into two sets P1 and P2 such
                                                                that | w∈P1 w(B) − w∈P2 w(B)| is minimal over all
    We consider five cases in our insertion procedure:          possible partitions P1 and P2 . Thus these sets must sep-
                                                                arate twigs contained in the block to about half of their
  1. There are no nodes in the tree. In this case, we need      total size. One of these sets remains in its original block,
     to allocate new page for a new node.                       and a second set is moved to the new block. This al-
  2. Rest and key are empty strings. In this case, it           gorithm affects exactly three blocks — one block that
     is sufficient to mark the latest node of the path as       contains the parent node (this is a significant fact, which
     f inal (in case of a minimal tree it will already be       will play a role in calculating the total number of units
     marked as f inal).                                         affected by the separation.) and two blocks containing
                                                                the twigs (old and new).
  3. Key is an empty string, rest is non-empty. In this             The second strategy is used in case the block where
     case, we should split the final node of the path in        the vertex is added to has only one twig and there is still
     two parts. One of these parts contains the prefix          not enough space. In this case we are looking for the first
     common and is marked as f inal.                            node x starting from the root of the twig which has N >
                                                                1 links. This node has N child nodes that will be the
  4. Key is a non-empty string, rest is empty. In this          roots of N new twigs. Next, we move all nodes from the
     case, we need to add an extra node with the prefix         root to x to the a parent twig contained in another block4 .
     key that will be a child to the last node of the path.     Next, we split the remaining N resulting twigs using the
                                                                first split strategy. If the initial twig is the root of the tree,
  5. Key and rest are non-empty strings. In this case,          then the set of nodes is not transferred to the block that
     the node xn is split into three ones, with prefixes        contains an parent vertex node (since it does not exist)
     common, rest and key respectively, the last one is         but into the new block. This is the only case when the tree
     marked as f inal.                                          grows in height. This algorithm accesses exactly three
                                                                blocks; one of these blocks contains the ancestor twig
   The tricky part is related to splitting of a block on        and two blocks contain the newly created twigs.
steps 4 and 5. If the vertex which is being split is a leaf         So the only case left to handle is the one when the
node of a twig that has a descendant twig we create a           block contains only one twig, and the first algorithm is
new vertex with a prefix key in a new twig. We need to          applicable and at the same time in the block containing
find a child twig which lies in a block which has more          the branch of an ancestor there is not enough space to
free space left. In theory this is a very expensive op-         move the set of nodes of the twigs contained in the block
eration because it affects all the blocks that contain the      that we are trying to separate.
child twigs. The number of such blocks is not greater
                                                                    While looking for a key during the insertion process,
than the number of children twigs. But in practice such
                                                                we build a path to the vertex which is going to be split
an approach is unacceptable since in addition to the com-
                                                                during insertion. We collect some auxiliary data about
pactness of storage the number of blocks being accessed
                                                                blocks on our way: how many twigs do these blocks con-
should also be minimized. There is a compromise that al-
                                                                tain, how much space is occupied by the root nodes, how
lows to meet both requirements in practice: we limit the
                                                                much free space do these blocks have, etc. It is very im-
number of blocks accessed by some constant D (in our
implementation D = 2 ). If none of the D blocks have               4 Strictly speaking there is no guarantee that the block that contains
enough space for the new vertex to be added, the block          a parent twig has enough room to accept this set of nodes.
                            pref (x1 ) + c1 + · · · + pref (xn−1 )         cn−1            pref ix(xn )
                                                                                           z    }|    {
                       s    brownfoxjumpsoverala                             z         y             dog
                                                                                      |{z}           | {z }
                                                                                    common           rest
                                                                                      z}|{
                       k    brownfoxjumpsoverala                             z         y         gopher
                                                                                                 |    {z      }
                                                                                                     key

                                               Figure 3: Example for insert


portant to note that we do not evaluate this data every              > create index "urlbtree[1]"
time. It is stored in a block header and is updated only               on doc("dblp.xml")/dblp/*
when the block is being changed.                                       by ./url as xs:string
   With all this information collected we build the block              using "btree"&
splitting sequence starting from the block we need to in-
sert new vertex to. The number of accessed blocks to
2h + 1 in worst case where h is the height of the tree.                 For BST:
3.3.4    Deletion                                                    > create index "urlbstrie[1]"
                                                                       on doc("dblp.xml")/dblp/*
Following the example of most papers on B-trees, we in-
                                                                       by ./url as xs:string
troduce an operation of lazy removal[10]. This approach
                                                                       using "bstrie"&
is widely applied to the database systems because the
procedure for the deletion of the element from the B-
tree can be even trickier than the insert procedure. Lazy
removal allows to significantly accelerate updates on the               In our tests we measure space occupied by the same
database.                                                            indexes for both structures, the time needed for updates
    In the developed structure delete is simply removing             and search time. Indexes were created on publications
the flag f inal from the vertex, that corresponds to the             by their identifiers, URI and EE field. The total number
string we are looking for. Thus after the removal of the             of indexed publications is about 3 million. However we
vertex we have the redundant vertex and consequently a               show here only the space occupied by index, because the
non-minimal tree. This approach requires a procedure to              search time is almost the same for both types of trees.
minimize the tree (or a separate twig). We remove all                             Distinct     Key volume          BST      B-tree
redundant vertices, which in turn are of two types:                                values          Kb               Kb       Kb
 1. Vertices that are not marked as f inal and have no                 key       2 989 811       59 337           98 752   106 432
    outgoing edges. These nodes can be simply re-                      url       1 915 933       72 828           81 984   103 296
    moved.                                                              ee       1 667 275       74 975           53 440   101 248

 2. Vertices that are not marked as f inal and have ex-
    actly one outgoing edge. In this case it is enough to                  Table 1: Comparison between BST and B+ tree
    merge the nodes with the concatenation of pref ix+
                                                                       As one can see, BST can archive even significant data
    edgecharacter + pref ix.
                                                                     compression, while times of search and update are al-
   After this procedure is applied there are no redun-               most the same for both structures.
dant nodes remaining. It may also happen that some
twigs have no nodes or only a single external-node re-               3.5     Conclusion
mains. Such twigs must be removed and the child twigs                We have introduced a new trie structure that effectively
are propagaded. In some cases this may require splitting             solves the dictionary problem for strings of unlimited
of the block.                                                        length and operates with large blocks of data.
   Thus minimizing the procedure for one twig may ac-                   Structure is implemented and is ready to use within
cess 4 blocks in the worst case: 3 blocks by separation              Sedna XML DBMS[15]. The code is open source under
procedure of the original block and 1 block containing               Apache license and is available on the project site.
the descendant twig may be accessed.                                    Tests show that the structure is well suited for use with
                                                                     indexes on fields that contain URI or strings which may
3.4     Tests and results                                            contain the same prefixes. The BST structure also shows
Tests were performed within Sedna DBMS. All the                      good results in the general case.
tests were made on loaded DBLP Computer Science
Bibliography[11] using B+ tree structure and BST both
implemented in Sedna for handling indexes. Requests                  References
for the construction of the index are as follows:
                                                                      [1] Nikolas Askitis and Justin Zobel. B-tries for disk-
                                                                          based string management. VLDB J., 18(1):157–
   For B+ tree:                                                           179, 2009.
 [2] Ricardo A. Baeza-Yates. An adaptive overflow               [16] B. Srinivasan. An adaptive overflow technique to
     technique for b-trees. In François Bancilhon,                  defer splitting in b-trees. Comput. J., 34(5):397–
     Costantino Thanos, and Dennis Tsichritzis, editors,             405, 1991.
     EDBT, volume 416 of Lecture Notes in Computer
     Science, pages 16–28. Springer, 1990.                      [17] Wojciech Szpankowski. Patricia tries again revis-
                                                                     ited. J. ACM, 37(4):691–711, 1990.
 [3] Rudolf Bayer and Edward M. McCreight. Orga-
     nization and maintenance of large ordered indices.         [18] Ilya Taranov, Ivan Shcheklein, Alexander Kalinin,
     Acta Inf., 1:173–189, 1972.                                     Leonid Novak, Sergei Kuznetsov, Roman Pas-
                                                                     tukhov, Alexander Boldakov, Denis Turdakov,
 [4] Rudolf Bayer and Karl Unterauer. Prefix b-trees.                Konstantin Antipin, Andrey Fomichev, Peter Ple-
     ACM Trans. Database Syst., 2(1):11–26, 1977.                    shachkov, Pavel Velikhov, Nikolai Zavaritski,
                                                                     Maxim Grinev, Maria P. Grineva, and Dmitry Li-
 [5] Walter A. Burkhard. Hashing and trie algorithms                 zorkin. Sedna: native xml database management
     for partial match retrieval. ACM Trans. Database                system (internals overview). In Ahmed K. Elma-
     Syst., 1(2):175–187, 1976.                                      garmid and Divyakant Agrawal, editors, SIGMOD
 [6] Eugene Inseok Chong, Jagannathan Srinivasan,                    Conference, pages 1037–1046. ACM, 2010.
     Souripriya Das, Chuck Freiwald, Aravind Yala-
                                                                [19] Nikolaus Walczuch and Herbert Hoeger. Using in-
     manchi, Mahesh Jagannath, Anh-Tuan Tran,
                                                                     dividual prefixes in b+ -trees. Journal of Systems
     Ramkumar Krishnan, and Richard Jiang. A map-
                                                                     and Software, 47(1):45–51, 1999.
     ping mechanism to support bitmap index and other
     auxiliary structures on tables stored as primary b+ -
     trees. SIGMOD Record, 32(2):78–88, 2003.
 [7] Paolo Ferragina and Roberto Grossi. The string
     b-tree: A new data structure for string search in
     external memory and its applications. J. ACM,
     46(2):236–280, 1999.
 [8] Andrey Fomichev, Maxim Grinev, and Sergei D.
     Kuznetsov. Sedna: A native xml dbms. In Jirı́
     Wiedermann, Gerard Tel, Jaroslav Pokorný, Mária
     Bieliková, and Julius Stuller, editors, SOFSEM,
     volume 3831 of Lecture Notes in Computer Sci-
     ence, pages 272–281. Springer, 2006.
 [9] Steffen Heinz, Justin Zobel, and Hugh E. Williams.
     Burst tries: a fast, efficient data structure for string
     keys. ACM Trans. Inf. Syst., 20(2):192–223, 2002.
[10] Theodore Johnson and Dennis Shasha. B-trees with
     inserts and deletes: Why free-at-empty is better
     than merge-at-half. J. Comput. Syst. Sci., 47(1):45–
     76, 1993.
[11] Michael Ley. Die trierer informatik-bibliographie
     dblp. In GI Jahrestagung, pages 257–266, 1997.
[12] Joong Chae Na and Kunsoo Park. Simple im-
     plementation of string b-trees. In Alberto Apos-
     tolico and Massimo Melucci, editors, SPIRE, vol-
     ume 3246 of Lecture Notes in Computer Science,
     pages 214–215. Springer, 2004.
[13] Ratko Orlandic and Hosam M. Mahmoud. Storage
     overhead of o-trees, b-trees and prefix b-trees: A
     comparative analysis. Int. J. Found. Comput. Sci.,
     7(3):209–226, 1996.
[14] Neal Sample, Brian F. Cooper, Michael J. Franklin,
     Gı́sli R. Hjaltason, Moshe Shadmon, and Levy
     Cohe. Managing complex and varied data with the
     indexfabric(tm). In ICDE, pages 492–493. IEEE
     Computer Society, 2002.
[15] Sedna XML DBMS source code downloads, 2012.
     http://sedna.org/download.html.