<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Support of Ordering in Multidimensional Data Structures?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Filip Kˇriˇzka</string-name>
          <email>filip.krizka@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michal Kr´atky´</string-name>
          <email>michal.kratky@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Radim Baˇca</string-name>
          <email>radim.baca@vsb.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science</institution>
          ,
          <addr-line>VS</addr-line>
        </aff>
      </contrib-group>
      <fpage>165</fpage>
      <lpage>179</lpage>
      <abstract>
        <p>Multidimensional data structures are applied in many areas, e.g. in data mining, indexing multimedia data and text documents, and so on. There are some applications where the range query result must be ordered. A typical case is the result with tuples sorted according to values in one dimension defined by the ORDER BY clause of an SQL statement. If we use a multidimensional data structure, the result set is sorted after the range query is processed. Since the sort operation must often be processed on tuples stored in the secondary storage, an external sorting algorithm must be utilized. Therefore, this operation is time consuming especially for a large result set. In this paper, we introduce a new data structure, a variant of the R-tree, supporting a storage of ordered tuples.</p>
      </abstract>
      <kwd-group>
        <kwd>multidimensional data structures</kwd>
        <kwd>ordered tuples of the range query result</kwd>
        <kwd>R-tree</kwd>
        <kwd>Ordered R-tree</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Multidimensional data structures [
        <xref ref-type="bibr" rid="ref25 ref30">25, 30</xref>
        ] have been widely applied in many data
management areas. Indexing spatial data is their natural application but there
are many applications in different domain areas. In the case of spatial data,
structures often store two- and three- dimensional objects. In the case of multimedia
data, there are spaces with dimension values of up to 100,000 to be indexed.
There have been many data structures introduced in the past, e.g. R-tree [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
R∗-tree [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], UB-tree [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], X-tree [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and BUB-tree [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. These data structures have
been applied in many areas; the well-known R-tree is included in Oracle Spatial1
– an Oracle option for indexing spatial data. UB-tree [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is applied for an
implementation of more indices related to one table in the relational data model [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
In addition, multidimensional data structures are applied in a wide range of
applications, e.g. indexing XML data [
        <xref ref-type="bibr" rid="ref14 ref15 ref9">9, 15, 14</xref>
        ], terms [
        <xref ref-type="bibr" rid="ref16 ref6">16, 6</xref>
        ], etc.
      </p>
      <p>
        In the case of the R-tree, tuples are clustered in a tree’s page when MBRs
(Minimal Bounding Rectangles) are built. It supports various types of queries,
e.g. point and range queries. As mentioned previously, multidimensional data
structures may be applied as index data structures supporting the storage of
tuples in relational or object-relational DBMS [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. If B-tree or B+-tree are applied
for the support, we must create one tree for each attribute to be indexed. Another
alternative is to use the so-called compound-key which includes more attributes.
In this case, some queries are processed by an often inefficient sequential scan.
When we consider multidimensional data structures, one index is created for
all attributes to be indexed. In this case, random accesses may influence the
efficiency of query processing [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. We must observe that the efficiency of range
query processing in the B-tree is also influenced by this issue.
      </p>
      <p>
        There are some applications where ordering of tuples in the result set is
required. This order may be defined by ordering of values of one attribute. A
common example is an SQL statement with the ORDER BY clause. In this case, we
often define a restriction of attribute values, which means the range query can
be applied for the processing of this query; however, we often require an ordering
defined by the ORDER BY clause. If we consider multidimensional data structures,
there is no support for the ordering; the result tuples must be sorted after the
query is processed. Since the number of sort operations and the number of
tuples in the result may be high in the case of DBMS, we must utilize an external
sorting algorithm [
        <xref ref-type="bibr" rid="ref13 ref29">13, 29</xref>
        ]. If large result sets are sorted, this operation is time
consuming. When we consider a data structure without a support of ordering,
we can not utilize lazy query processing. In the case of lazy query processing, a
query is progressively processed and we can not store all tuples of the result in an
additional data structure. When we consider multidimensional data structures
without a support of ordering, this lazy evaluation is not possible; the query
must be processed in one and result tuples must be inserted in a persistent data
structure and sorted [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        In this article, we introduce a multidimensional data structure with a support
of ordering. A cost-based relational query optimizer [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] can simply utilize this
data structure and decide when it is appropriate to utilize this data structure
instead of common query processing and sorting of the result set. Since the
Rtree is the most popular data structure we present the support of ordering for
the R-tree.
      </p>
      <p>
        Although it can be seen that there is a relation between multidimensional
data structures and ordering, we only find data structures which enable the
indexing of multidimensional data using an order. Such data structures include
UB-tree [
        <xref ref-type="bibr" rid="ref2 ref22">2, 22</xref>
        ] and BUB-tree [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] which utilize Z-ordering for indexing of
multidimensional data. On the other hand, if we apply B-tree or B+-tree as the solution
of this issue, a range query will be processed by an often inefficient sequential
scan.
      </p>
      <p>This paper is organized as follows: In Section 2, we present R-tree and its
variants. We describe the motivation to the support of ordering in Section 3
in more detail. Section 4 includes some notices to ordering of the range query
result. In Section 5, we describe our new variant of the R-tree and show the
basic principles of this data structure. Experimental results are put forward
in Section 6. In the last section we summarize the paper content and outline
possibilities of our future work.</p>
    </sec>
    <sec id="sec-2">
      <title>R-tree and its Variants</title>
      <p>
        R-tree [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] can be thought of as an extension of B+-trees in a multi-dimensional
space. It corresponds to a hierarchy of nested n-dimensional MBRs. If N is an
inner page, it contains couples of the form (Ri, Pi), where Pi is a pointer to a
child of the page N . If R is its MBR, then the rectangles Ri corresponding to
the children Ni of N are contained in R. Rectangles at the same tree level may
overlap. In this article, we consider only tuples stored in leaf nodes due to the
fact that it corresponds to our application domain. Unless a page of the R-tree
is the root, its number of entries is between m and M and it corresponds to a
disk page. Other properties of the R-tree include the following:
– Whenever the number of a page’s children drops below m, the page is deleted
and its descendants are distributed among the sibling pages. The upper
bound M depends on the size of the disk page.
– The root page has at least two entries, unless it is a leaf.
– The R-tree is height-balanced; i.e. all leaves are at the same level. The height
of an R-tree is at most blogm N c − 1 for N index records (N &gt; 1).
      </p>
      <p>
        Given a set of M +1 entries, each entry is assigned to one of the two produced
pages, according to the criterion of minimum area, i.e. the selected page is the one
that will be enlarged the least in order to include the new entry. It significantly
affects the index performance. Three split techniques (Linear, Quadratic, and
Exponential ) proposed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] are based on a heuristic optimization.
      </p>
      <p>R-tree performance is usually measured with respect to the retrieval cost (in
terms of DAC) of queries. The majority of performance studies concerns point,
range, and k-NN queries. Considering the R-tree performance, the concepts of
page coverage and overlap between pages are important. Obviously, an efficient
R-tree search requires that both the overlap and coverage are minimized.
Minimal coverage reduces the amount of dead area covered by R-tree pages. The
minimal overlap is even more critical than the minimal coverage, searching to
objects falling in the area of k overlapping pages; up to k paths to the leaf pages,
may have to be processed in such a way.</p>
      <p>Variants of R-trees differ in the way they perform the split algorithm during
insertions, i.e. which minimization criteria are used. Literature has identified a
variety of criteria for the layout of keys on pages that affect retrieval performance.</p>
      <sec id="sec-2-1">
        <title>These criteria are: minimal page area, minimal overlap between pages, minimal</title>
        <p>page margins or maximized page utilization. It is impossible to optimize all of
these parameters simultaneously.</p>
        <p>
          Authors of [
          <xref ref-type="bibr" rid="ref21 ref25">21, 25</xref>
          ] put forward many variants of the R-tree. The main feature
of R∗-trees [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] involves the page-splitting policy. Therefore, the R∗-tree differs
from the R-trees mainly in the insertion algorithm. Although original R-tree
algorithms tried only to minimize the area covered by MBRs, the R∗-tree algorithms
also take other objectives into account. Clipping-based schemes do not allow
any overlaps between bucket regions; they have to be mutually disjointed. A
typical access method of this kind is the R+-tree [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] which allows no overlap
between regions corresponding to pages at the same tree level and an object can
be stored in more than one leaf page.
        </p>
        <p>
          Other approaches to an improvement of original R-trees release some of
their basic features. For example, the MBRs have been replaced by minimum
bounding spheres or polygons. In [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], R+-trees are extended to support k-NN
queries. In [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], authors introduce a variant for efficient processing of narrow
range queries. In our best knowledge, R-tree variant supporting the ordered
multidimensional data or multidimensional data structure supporting a general
ordering do not exist in the literature.
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Motivation</title>
      <p>Although it can be seen that multidimensional indexing and ordering are in
contradiction, there are real applications where we need to sort the range query
result. Let us consider an implementation of indices in an RDBMS.
Multidimensional data structures can be utilized for the implementation instead of more
B-tree or B+-tree indices. Then an SQL statement is processed by a range query.
Users often applied the ORDER BY clause which sorts a result set according to
values of one or more attributes. In other words, we define an order on tuples of
the result set.</p>
      <p>
        A common way to evaluate these queries is to process the range query in a
multidimensional data structure and sort the result set of the range query. Since
the number of sort operations and tuples in the result can be high, we must utilize
an external sorting algorithm [
        <xref ref-type="bibr" rid="ref13 ref29">13, 29</xref>
        ]. In the case of large result sets the sorting
is the time consuming operation. In this article, we introduce a multidimensional
data structure supporting an order on multidimensional tuples. Since the R-tree
is the most popular data structure, we present the support of ordering for the
R-tree; we introduce the Ordered R-tree.
      </p>
      <p>We assume that there are applications where we need always (or very often)
use the ordering of a result set according the same attributes. We can apply
B-tree or B+-tree indices for the support, but we must create one tree for each
attribute to be indexed. Another alternative is to use so called compound-key
which includes more attributes. In these cases some queries are processed by an
often inefficient sequential scan. For example, let us consider the compound key
{A1, A2, A3} of a table. The following query:</p>
      <sec id="sec-3-1">
        <title>SELECT * FROM TABLE_NAME</title>
      </sec>
      <sec id="sec-3-2">
        <title>WHERE A1 BETWEEN VAL_A1_LOW AND VAL_A1_HI</title>
      </sec>
      <sec id="sec-3-3">
        <title>ORDER BY A1,A2,A3;</title>
        <p>is processed by a range query and no irrelevant tuples are retrieved. However,
the following query:</p>
      </sec>
      <sec id="sec-3-4">
        <title>SELECT * FROM TABLE_NAME</title>
      </sec>
      <sec id="sec-3-5">
        <title>WHERE A3 BETWEEN VAL_A3_LOW AND VAL_A3_LOW</title>
      </sec>
      <sec id="sec-3-6">
        <title>ORDER BY A1,A2,A3;</title>
        <p>must be processed by an efficient sequential scan of the whole table. Another
example of the SQL query which retrieves the ordered result is as follows:</p>
      </sec>
      <sec id="sec-3-7">
        <title>SELECT * FROM V_DAY_N</title>
      </sec>
      <sec id="sec-3-8">
        <title>WHERE YEAR BETWEEN 2000 AND 2010 AND MONTH BETWEEN 2 and 4</title>
      </sec>
      <sec id="sec-3-9">
        <title>ORDER BY STATION, ELEMENT, YEAR, MONTH, DAY, TIME;</title>
        <p>This statement is an example of the query over the climatological database
Clidata2. In this case, meteorological stations product data of measured
elements for a period. Data has been measured in the Czech Republic from 1775.
Elements include: Temperature, Pressure, Wind speed, Wind direction,
Precipitation and so on. It means, the main attributes of this table are as
follows: STATION, ELEMENT, DAY, MONTH, YEAR, TIME, VALUE. We need always
or very often select tuples of the table in the same order: {STATION, ELEMENT,
YEAR, MONTH, DAY, TIME}. This real application represents a motivation to
introduce the Ordered R-tree.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Range Query Processing in R-tree</title>
      <p>
        A range query is processed by a depth-first search algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. If the query
rectangle intersects an MBR then the node related to the MBR is retrieved and
searched. In general, there is no order; therefore, tuples of a result set are sorted
in the same order in which these tuples were inserted into leaf nodes and new
MBRs were inserted into inner nodes. In the case of bulk-load algorithms, tuples
are often sorted under an order and MBRs are created for these tuples [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We
will show that the ordering of tuples in leaf nodes does not guarantee the ordered
result set.
      </p>
      <p>Example 1. (Range Query Processing) Let us consider two MBRs in Figure 1.
A depth-first search range query algorithm retrieves tuples of M BR10 (T1 and
T2) and then it retrieves tuples of M BR20 (T3 and T4); consequently, the result
set includes: T1, T2, T3, T4.</p>
      <p>When we consider the previous depicted motivation, the ORDER BY clause,
we want to retrieve tuples under the well-known Lexicographical order of tuples.
This order is often utilized in the case of term ordering, however the ORDER BY
clause applied on a set of ordered attributes works in the same way.</p>
      <sec id="sec-4-1">
        <title>Definition 1. Lexicographical Order</title>
        <p>Let Ω be an n-dimensional space and t1 = (a1,1, a1,2, . . . , a1,n) and t2 =
(b2,1, b2,2, . . . , b2,n) be two tuples of this space. The Lexicographical order is
defined: t1 &lt; t2 ⇐⇒ (∃m &gt; 0)(∀i &lt; m)(a1,i = b2,i) ∧ (a1,m &lt; b2,m).
Example 2. Let us consider the space in Figure 1 and the Lexicographical order
which prefers the attribute A1 before the attribute A2. The tuples are sorted
under this order as is follows: T1, T3, T2, T4.
0 1 2 3 4 5 6 7</p>
        <p>●T1
MBR11</p>
        <p>●T2</p>
        <p>
          Let us consider space filling curves like C-curve, Z-curve or Hilbert curve [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]
depicted in Figure 2. It is clear that the Lexicographical order is equivalent to
C-ordering. One class of multidimensional data structures utilizes space filling
curves. UB-tree [
          <xref ref-type="bibr" rid="ref2 ref22">2, 22</xref>
          ] and BUB-tree [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] utilize Z-curve to order tuples, regions
included in inner nodes are defined as intervals of the Z-curve. Another way is
to use a space filling curve for clustering of tuples in a multidimensional space;
however, regions are created according to a known data structure like R-tree.
Examples of this way are Hilbert R-tree [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] or some bulk-loaded algorithms [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
There are other data structures utilizing ordering to index multidimensional
data [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] or applications which utilize ordering to multidimensional data
processing, e.g. in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] authors introduced a clustering methods based on an order.
        </p>
        <p>
          There are two ways how to develop a data structure supporting an order. The
first way is to create regions as intervals of the order used. The main problem of
2 http://portal.chmi.cz
this way is that it is necessary to develop an algorithm checking if the MBR and
the regions are intersected. This algorithm is then used by the range query
algorithm. In [
          <xref ref-type="bibr" rid="ref22 ref28">22, 28</xref>
          ] authors introduced this algorithm of UB-tree and Z-ordering.
The similar algorithm must be implemented for each order used. The second way
is to use a known data structure and change the build algorithm to guarantee
the ordered result set. Since the R-tree is the most popular data structure we
present the support of ordering for the R-tree.
5
5.1
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Ordered R-tree</title>
      <p>Introduction
Let us consider a total order O on a set of all tuples in a multidimensional space
Ω = D1 × D2 × . . . × Dn, where n is the space dimension. An MBR is defined
by two tuples QL and QH . An MBR includes other MBRs in the case of inner
nodes or tuples in the case of leaf nodes.</p>
      <sec id="sec-5-1">
        <title>Definition 2. (MBR Order Condition)</title>
        <p>Let us consider a total order O on all tuples of Ω. Let M BR1 and M BR2 be
minimal bounding rectangles in the same level of the tree. MBR Order condition
is defined as follows: M BR1 ≤ M BR2 iff each tuple of M BR1 is lower or equal
to all tuples of M BR2.</p>
        <p>It is clear that if all couples of MBR’s in each level of the tree meet the MBR
Order condition, the result set of each range query is ordered under O. The first
step to guarantee the order on tuples is to order tuples in each leaf node. The
second step is to sort MBRs in each inner node. Since the R-tree is built by a
hierarchy of MBR’s which create different space regions compared to the regions
created by the order, we can not order MBR’s according to their QL. In the next
example, we show why we can not utilize QL to the guarantee of the order on
tuples in the R-tree.</p>
        <p>Example 3. (Violation of the MBR Order Condition in the R-tree)</p>
        <p>Let us suppose the M BRR in Figure 3(a) including tuples T1–T4. We use
the order which prefers values of the first dimension before values of the second
dimension. In this case, it is possible to split this node so that QLMBR1 ≤
QLMBR2 and the MBR Order condition is met. A range query containing both
MBRs returns the following correct result T1,T2, T3, T4, where Ti ≤ Ti+1, 1 ≤
i ≤ 3.</p>
        <p>After the tuple T5 is inserted, M BR1 and M BR2 must be created to
guarantee the order (see Figure 3(b)). In this case, QLMBR1 = (2, 3), QLMBR2 = (2, 0),
QLMBR1 ≥ QLMBR2 . A range query containing both MBRs, e.g. the range
query represented by this statement: SELECT * FROM Ω WHERE A1 BETWEEN (2,4)
and A2 BETWEEN(0,7) ORDER BY (A1,A2), where Ai is the attribute
corresponding to Di, 1 ≤ i ≤ 2, returns the following unordered result T3, T5, T4, T1, T2
although tuples in both MBR’s are ordered. It means, we can not utilize QL to
the guarantee of the order on tuples in the R-tree.</p>
        <p>0
1
2
3
4
5
6
7</p>
        <p>T1 T2
MBR1
(a)
7</p>
        <p>T3
MBR2 T4
MBRR
0
1
2
3
5
6
7
4 ●T5</p>
        <p>MBR1
T1 T2
(b)</p>
        <p>MBR2=MBRR
7
To guarantee the MBR Order condition we must introduce the first tuple
concept.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Definition 3. (The first tuple of an MBR)</title>
        <p>Let O be an order on tuples of Ω and MBR be a minimal bounding rectangle.
The first tuple of the MBR, F T MBR, is defined as follows:
F T MBR = the minimal tuple of the MBR if the MBR is the leaf node MBR
the minimal first tuple of all if the MBR is the inner node MBR
child MBRs</p>
      </sec>
      <sec id="sec-5-3">
        <title>Theorem 1. (Ordered Result Set of the Range Query)</title>
      </sec>
      <sec id="sec-5-4">
        <title>If MBRs in each inner node of an R-tree are sorted according to their FT and the MBR condition is met for all neighbor MBR’s in each level of the Rtree, then the result set of each range query processed by a depth-first technique is ordered.</title>
        <p>Proof. Let us suppose an arbitrary M BRi of a page N including m tuples. If we
choose an M BRj where i &lt; j ≤ m then F T MBRi &lt; F T MBRj since MBRs are
sorted. Since the MBR Order condition is met, all tuples of M BRj &gt; F T MBRi .
It means that a range query algorithm utilizing the depth-first search technique
retrieves ordered tuples.</p>
        <p>Example 4. (First Tuple)</p>
        <p>Let us suppose M BRR in Figure 3(b), F T MBR1 = T1, F T MBR2 = T3. If we
sort M BR1 and M BR2 in M BRR then we get the ordered set: {M BR1, M BR2}
since F T MBR1 &lt; F T MBR2 . A range query containing both MBRs returns the
following ordered result set: T1, T2, T3, T4, T5.</p>
        <p>Since MBR of each node is stored in its parent node, we must handle first
tuple of each node, except the root node, in an array to guarantee the tuple
order.
5.3</p>
        <p>Operations of the Ordered R-tree
The main operation influenced by the Fist Tuple concept is the insert operation.
In Algorithm 1 we see an algorithm of the insert operation including the first
tuple management. For the sake of simplicity we do not differentiate between
inner and leaf nodes as well as between inner and leaf items, therefore, we use
overloading functions Insert and Split as well as overloading items and nodes.
In Lines 1–6, the proper leaf node is found by an iteration through inner nodes,
each inner node is put on the stack. In Lines 7–23, item is inserted in the node
(Line 12) and indices are modified. The variable flagInsert is true if the Insert
and Split operations are invocated in the current level of the tree. In this way,
the changes of MBR are propagated to the root node.</p>
        <p>On the other hand, the variable flagFirstTuple controls the propagation of
FT to the root node. The procedure InsertOrUpdateFirstTuple works in this
way: if the inserted tuple is the first tuple of the node, then insert or update this
first tuple. In the procedure Insert, tuple (or MBR) is inserted in the leaf node
(inner node). Items are sort-inserted in the both cases; in the case of inner nodes
first tuples must be utilized. Obviously, the split operation utilizes the ordering
when a node is split; the second node includes tuples &gt; all tuples of the first
node.</p>
        <p>The find and range query operations are without any change. In the case of
the delete operation, we must manage the update of first tuples. Consequently,
find and delete operations work in O(log M ), where M is the number of tuples
in the tree.
5.4</p>
        <p>First Tuple Storage
To manage first tuples for all MBR’s we can use a persistent array including
all first tuples. This array contains couples (node id, first tuple), where node
id references the tree node related to the first tuple. In Figure 4, we see an
example of the proposed data structure. Since the size of the array is equal to
the number of all nodes in the R-tree, we can often manage this data structure
in the main memory; pages of this data structure are not retrieved from the
secondary storage. For example, if the R-tree includes 106 tuples, the maximum
number of items in leaf nodes is 200 and the maximum number of items in inner
nodes is 100. The array includes approximately 12,000 items. Let us consider the
space dimension=5, 4 B for one attribute value and node id, we get the total size
of the array: 288,000 B = 281,25 kB. When we consider the delete operation, we
can utilize hashing or B-tree for the implementation of the array.
Algorithm 1: Insert Operation of the Ordered R-tree</p>
        <p>procedure: Insert (tuple)
1 // go down;
2 Node node ← rootNode ;
3 while node is an inner node do
4 nodeStack.Push (node);
5 node ← node.GetProperChild();
6 end
7 // go up;
8 int flagInsert ← CONST INSERT ;
9 bool flagFirstTuple ← false;
10 while node!= null do
11 if flagInsert = CONST INSERT then
12 flagInsert ← Insert (node, item, newNode);
13 flagFirstTuple ← InsertOrUpdateFirstTuple (node.Id, item);
14
15
16
17
18 end
19
20
21
22
23 end
24 node ← nodeStack.Pop();
25 end
end
else
if flagFirstTuple = true then</p>
        <p>flagFirstTuple ← InsertOrUpdateFirstTuple (node.Id, item);
end
if flagInsert = CONST SPLIT then
flagInsert ← Split (node, item, newNode);
flagFirstTuple ← InsertOrUpdateFirstTuple (newNode.Id, item);
function : InsertOrUpdateFirstTuple (nodeId, tuple)
26 bool flagFirstTuple ← false;
27 if Does a first tuple exist in the MBB? then
28 if Is tuple the new first tuple? then
29 flagFirstTuple ← UpdateFT (nodeId, tuple);
30 end
31 end
32 else
33 flagFirstTuple ←InsertFT (nodeId, tuple);
34 end
35 return flagFirstTuple ;
Inner
nodes
Leaf
nodes</p>
        <p>Node 1</p>
        <p>MBR1 MBR5
Node 2</p>
        <p>Node 5
MBR2 MBR3 ...</p>
        <p>MBR6 MBR4 ...</p>
        <p>Node 3</p>
        <p>Node 4</p>
        <p>Node 6</p>
        <p>Node 7
P1 P9 P6</p>
        <p>P2 P11 P8</p>
        <p>P5 P4 P3</p>
        <p>P10 P7</p>
        <p>First Tuple Array
In our experiments3, we used the proposed real collection of meteorological
data including 1,391,049 records. The scheme contains 6 attributes: STATION,
ELEMENT, YEAR, MONTH, DAY, TIME. It means the space dimension is 6. The
number of stations is 10; therefore, the attribute STATION has 10 possible values. The
number of elements is 2. Other attributes include the time of the measurement
for the station and element. All data structures have been implemented in C++.
We built 2 data structures: R∗-tree and Ordered R-tree. Their parameters are
shown in Table 1. The page size 2,048 B is used for all trees.
3 The experiments were executed on an Intelr Core 2 Duo 2.4 Ghz, 512 kB L2 cache;
3 GB RAM; Windows 7.</p>
        <p>In our experiments, we used 30 range queries with various selectivities; the
result set includes 0–548,725 tuples. In Table 2, we show some queries in more
detail. Terms min and max denote the minimum and maximum values of the
domain, respectively.
Query Range query defined by Ql:Qh</p>
        <p>Q1 (4, 2, 1960, min, min, min) : (4, 2, 1980, max, max, max)
Q5 (min, 2, 2000, 2, 3, min):(max, 2, 2000, 2, 3, max)
Q8 (3, min, min, 10, min, min):(8, max, max, 10, max, max)
Q10 (6, min, 1980, 2, min, min):(8, max, 2005, 5, max, max)
Q11 (5, min, min, min, min, min):(5, max, max, max, max, max)
Q17 (5, min, min, min, 2, min):(5, max, max, max, 2, max)
Q18 (min, 1, min, min, min, 900):(max, 1, max, max, max, 1200)
Q22 (9, min, min, min, min, 900):(10, max, max, max, max, 1500)
Q23 (3, 2, 1990, min, min, min):(5, 2, 1995, max, max, max)
Q27 (min, min, 1980, 7, 5, min):(max, max, 2100, 7, 5, max)</p>
        <p>
          As usual, tests are processed with a cold cache (OS cache as well as cache
buffer of indices). For all tests we measure query processing time and Disk Access
Cost (DAC). Since in the case of R∗-tree the range query’s result set must be
sorted, we measure the sorting time and DAC of the sort operation. For the time
measurement, we repeat the tests 10× and calculate the average time. Since the
range query is processed by random disk accesses, DAC is equal to the number
of disk accesses during query processing [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. We used Merge sort [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] as the
algorithm sorting the result set.
        </p>
        <p>Although we show results of 10 queries in more detail, average results are
calculated for all 30 queries. DAC results are shown in Table 3 and Figure 5(a).
Whereas Table 3 includes DAC of the range query as well as sort operation,
Figure 5(a) includes only DAC of the range query. Let us consider DAC of range
query processing. We see that the DAC for the Ordered R-tree is 125% of DAC
for the R∗-tree. It is because the order influences the tree building. Although
DAC for the range query and sort operation are not comparable due to the fact
that it is simply possible to reduce the number of disk accesses using a cache
buffer in the case of the sort operation, we see that DAC of the sort operation
is enormous especially for queries with a large result set.</p>
        <p>DAC of range query processing</p>
        <p>Query Processing time for tested range queries
R-tree Ordered R-tree</p>
        <p>R-tree
Query processing time is shown in Table 3 and Figure 5(b). Moreover, Table 3
includes the time of the sort operation. We see that this time influences the query
processing time especially for range queries with the large result set. We see that
our Orderer R-tree saves 35% of the query processing time compared to the
R∗tree.
7</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this article, we introduced a new data structure, a variant of the R-tree,
supporting a storage of ordered tuples. In this way, the range query result is ordered
without an application of any time consuming sort operation. Our experiments
show an improvement of this data structure in comparison with the R-tree.
Orderer R-tree saves 35% of the query processing time compared to the R∗-tree.
The new data structure is particularly efficient in the case of large result sets or
in the case of lazy query processing when tuples are not retrieved immediately.
In the case of the R-tree, range query must be processed completely and then
the result set must be sorted. In our future work, we want to adopt this data
structure for indexing XML data where the lazy query processing of ordered
tuples can be applied.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Muchnik Baiyang Liu</surname>
          </string-name>
          , Casimir Kulikowski.
          <article-title>Global Ordering For MultiDimensional Data: Comparison with K-means Clustering</article-title>
          .
          <source>DIMACS Technical Report 2009-13</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Rudolf</given-names>
            <surname>Bayer</surname>
          </string-name>
          .
          <article-title>The Universal B-Tree for multidimensional indexing: General Concepts</article-title>
          .
          <source>In Proceedings of World-Wide Computing and Its Applications (WWCA</source>
          <year>1997</year>
          ), Tsukuba, Japan, LNCS, pages
          <fpage>198</fpage>
          -
          <lpage>209</lpage>
          . Springer-Verlag,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Norbert</given-names>
            <surname>Beckmann</surname>
          </string-name>
          ,
          <string-name>
            <surname>Hans-Peter Kriegel</surname>
            ,
            <given-names>Ralf</given-names>
          </string-name>
          <string-name>
            <surname>Schneider</surname>
            , and
            <given-names>Bernhard</given-names>
          </string-name>
          <string-name>
            <surname>Seeger. The R</surname>
          </string-name>
          ∗
          <article-title>-tree: An efficient and robust access method for points and rectangles</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Management of Data (SIGMOD</source>
          <year>1990</year>
          ),
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Belussi</surname>
          </string-name>
          , E. Bertino, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Cataniac</surname>
          </string-name>
          .
          <article-title>Using Spatial Data Access Structures for Filtering Nearest Neighbor Queries</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          ,
          <volume>40</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Stefan</surname>
            <given-names>Berchtold</given-names>
          </string-name>
          , Daniel A. Keim, and
          <string-name>
            <surname>Hans-Peter Kriegel</surname>
          </string-name>
          .
          <article-title>The X-tree : An Index Structure for High-Dimensional Data</article-title>
          .
          <source>In Proceedings of 22th International Conference on Very Large Data Bases (VLDB</source>
          <year>1996</year>
          ), pages
          <fpage>28</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Vlastislav</given-names>
            <surname>Dohnal</surname>
          </string-name>
          , Claudio Gennaro, and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Zezula</surname>
          </string-name>
          .
          <article-title>A Metric Index for Approximate Text Management</article-title>
          .
          <source>In Proceedings of IASTED International Conference Information Systems and Database (ISDB</source>
          <year>2002</year>
          ), pages
          <fpage>37</fpage>
          -
          <lpage>42</lpage>
          . ACTA Press,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Robert</given-names>
            <surname>Fenk</surname>
          </string-name>
          .
          <article-title>The BUB-Tree</article-title>
          .
          <source>In Proceedings of 28rd VLDB International Conference on Very Large Data Bases (VLDB</source>
          <year>2002</year>
          ), Hongkong, China. Morgan Kaufmann,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <source>Database Systems: The Complete Book. Prentice Hall</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Torsten</given-names>
            <surname>Gru</surname>
          </string-name>
          <article-title>¨st. Accelerating XPath Location Steps</article-title>
          .
          <source>In Proceedings of ACM International Conference on Management of Data (SIGMOD</source>
          <year>2002</year>
          ), pages
          <fpage>109</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Antonin</given-names>
            <surname>Guttman. R-Trees</surname>
          </string-name>
          :
          <article-title>A Dynamic Index Structure for Spatial Searching</article-title>
          .
          <source>In Proceedings of ACM International Conference on Management of Data (SIGMOD</source>
          <year>1984</year>
          ), pages
          <fpage>47</fpage>
          -
          <lpage>57</lpage>
          . ACM Press,
          <year>June 1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Ibrahim</given-names>
            <surname>Kamel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christos</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          .
          <article-title>On packing R-trees</article-title>
          .
          <source>In Proceedings of the 2nd International Conference on Information and Knowledge Management (CIKM</source>
          <year>1993</year>
          ), pages
          <fpage>490</fpage>
          -
          <lpage>499</lpage>
          . ACM,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Ibrahim</given-names>
            <surname>Kamel</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christos</given-names>
            <surname>Faloutsos. Hilbert</surname>
          </string-name>
          R-tree:
          <article-title>An Improved R-tree using Fractals</article-title>
          .
          <source>In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB</source>
          <year>1994</year>
          ), pages
          <fpage>500</fpage>
          -
          <lpage>509</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Donald</given-names>
            <surname>Knuth</surname>
          </string-name>
          .
          <source>The Art of Computer Programming</source>
          , Volume
          <volume>3</volume>
          : Sorting and Searching,
          <string-name>
            <given-names>Second</given-names>
            <surname>Edition</surname>
          </string-name>
          . Addison-Wesley,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. Michal Kr´atky´,
          <string-name>
            <surname>Radim</surname>
            <given-names>Baˇca</given-names>
          </string-name>
          , and V´aclav Sn´aˇsel.
          <source>On the Efficient Processing Regular Path Expressions of an Enormous Volume of XML Data. In Proceedings of DEXA</source>
          <year>2007</year>
          , volume
          <volume>4653</volume>
          /
          <year>2007</year>
          . Springer-Verlag,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. Michal Kr´atky´, Jaroslav Pokorny´, and
          <article-title>V´aclav Sn´aˇsel. Implementation of XPath Axes in the Multi-dimensional Approach to Indexing XML Data. In Current Trends in Database Technology</article-title>
          ,
          <string-name>
            <surname>EDBT</surname>
          </string-name>
          <year>2004</year>
          , volume
          <volume>3268</volume>
          . Springer-Verlag,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Michal Kr´atky´, Tom´aˇs Skopal,
          <article-title>and V´aclav Sn´aˇsel. Multidimensional Term Indexing for Efficient Processing of Complex Queries</article-title>
          . Kybernetika, Journal,
          <volume>40</volume>
          (
          <issue>3</issue>
          ):
          <fpage>381</fpage>
          -
          <lpage>396</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. Michal Kr´atky´, V´aclav Sn´aˇsel,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zezula</surname>
          </string-name>
          , and Jaroslav Pokorny´.
          <article-title>Efficient Processing of Narrow Range Queries in the R-Tree</article-title>
          .
          <source>In Proceedings of the 10th International Database Engineering and Applications Symposium (IDEAS</source>
          <year>2006</year>
          ), pages
          <fpage>69</fpage>
          -
          <lpage>79</lpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar. G-Tree</surname>
          </string-name>
          :
          <article-title>A New Data Structure for Organizing Multidimensional Data</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>341</fpage>
          -
          <lpage>347</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Tapio</surname>
          </string-name>
          <article-title>Lahdenm¨aki and Michael Leach. Relational Database Index Design and the Optimizers</article-title>
          . John Wiley and Sons, New Jersey,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Sam</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Lightstone</surname>
            ,
            <given-names>Toby J.</given-names>
          </string-name>
          <string-name>
            <surname>Teorey</surname>
          </string-name>
          , and Tom Nadeau.
          <article-title>Physical Database Design: the Database Professional's Guide</article-title>
          . Morgan Kaufmann,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Manolopoulos</surname>
          </string-name>
          , Alexandros Nanopoulos,
          <string-name>
            <given-names>Apostolos N.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Theodoridis.</surname>
          </string-name>
          R-Trees
          <source>: Theory and Applications</source>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Volker</given-names>
            <surname>Markl</surname>
          </string-name>
          . Mistral:
          <article-title>Processing Relational Queries using a Multidimensional Access Technique</article-title>
          .
          <source>Ph.D. thesis</source>
          , Technical University Mu¨nchen, Germany,
          <year>1999</year>
          , http://mistral.in.tum.de/results/publications/Mar99.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>F.</given-names>
            <surname>Ramsak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Markl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zirkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Elhardt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Bayer</surname>
          </string-name>
          .
          <article-title>Integrating the UB-tree into a Database System Kernel</article-title>
          .
          <source>In Proceedings of 26th International Conference on Very Large Data Bases (VLDB</source>
          <year>2000</year>
          ), pages
          <fpage>263</fpage>
          -
          <lpage>272</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>Hans</given-names>
            <surname>Sagan</surname>
          </string-name>
          .
          <source>Space Filling Curves</source>
          . Springer-Verlag,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>Hanan</given-names>
            <surname>Samet</surname>
          </string-name>
          .
          <article-title>Foundations of Multidimensional and Metric Data Structures</article-title>
          . Morgan Kaufmann,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Timos</surname>
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Sellis</surname>
            , Nick Roussopoulos, and
            <given-names>Christos</given-names>
          </string-name>
          <string-name>
            <surname>Faloutsos. The R+-Tree: A Dynamic Index</surname>
          </string-name>
          <article-title>For Multi-Dimensional Objects</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB</source>
          <year>1997</year>
          ), pages
          <fpage>507</fpage>
          -
          <lpage>518</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>Dennis</given-names>
            <surname>Shasha</surname>
          </string-name>
          and
          <string-name>
            <given-names>Philippe</given-names>
            <surname>Bonnet</surname>
          </string-name>
          . Database Tuning: Principles, Experiments, and
          <string-name>
            <given-names>Troubleshooting</given-names>
            <surname>Techniques</surname>
          </string-name>
          . The Morgan Kaufmann Series in
          <source>Data Management Systems</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28. Tom´aˇs Skopal, Michal Kr´atky´, V´aclav Sn´aˇsel, and
          <article-title>Jaroslav Pokorny´. A New Range Query Algorithm for the Universal B-trees</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>31</volume>
          (
          <issue>6</issue>
          ):
          <fpage>489</fpage>
          -
          <lpage>511</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Vitter</surname>
          </string-name>
          .
          <article-title>Algorithms and Data Structures for External Memory</article-title>
          .
          <source>Series on Foundations and Trends in Theoretical Computer Science. Now Publisher</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>Cui</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <string-name>
            <surname>High-Dimensional</surname>
            <given-names>Indexing</given-names>
          </string-name>
          , volume
          <volume>2341</volume>
          of Lecture Notes in Computer Science. Springer-Verlag,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>