Predicted Range Aggregate Processing in Spatio-temporal
                            Databases
                                        Wei Liao, Guifen Tang, Ning Jing, Zhinong Zhong
        School of Electronic Science and Engineering, National University of Defense Technology
                                            Changsha, China
                                       liaoweinudt@yahoo.com.cn

                               Abstract                                               spatio-temporal database managing the vehicles in a city,
                                                                                      results of predicted queries (e.g., finding all vehicles in
     Predicted range aggregate (PRA) query is an                                      the center) are meaningless (due to continuous object
     important researching issue in spatio-temporal                                   movements), while the aggregate information (the number
     databases. Recent studies have developed two                                     of vehicles) is usually stable and measurable (e.g., finding
     major classes of PRA query methods: (1)                                          the number of vehicles across the center during the next 5
     accurate approaches, which search the common                                     minutes) [1], [2].
     moving objects indexes to obtain an accurate                                         Specifically, given a set S of moving points in the d-
     result; and (2) estimate methods, which utilize                                  dimensional space, a predicted range aggregate (PRA)
     approximate techniques to estimate the result                                    query returns a single value that summarizes the set
     with an acceptable error.                                                         R ⊆ S of points in a d-dimensional hyper-rectangle qR and
         In this paper, we present a novel accurate                                   a future time interval (or a future timestamp) qT according
     prediction index technique, named PRA-tree, for                                  to some aggregation function (e.g., count, max, min, sum,
     range aggregation of moving objects. PRA-tree                                    average). In this paper, we consider predicted range
     takes into account both the velocity and space                                   distinct count queries on multidimensional moving points,
     distribution of moving objects. First, the velocity                              where the result is the size of R (e.g., the number of
     domain is partitioned into different velocity                                    vehicles in an area qR and a future time interval (or a
     buckets, and moving objects are classified into                                  future timestamp) qT), but the solutions can apply to any
     different velocity buckets by their velocities, thus                             other aggregation mentioned above with straightforward
     objects in one bucket have similar velocities.                                   adaptation.
     Then we use aTPR-tree, which is based on the
     basic TPR-tree structure and added with                                          1.1 Motivation
     aggregate information in intermediate nodes, to
     index objects in each bucket. PRA-tree is                                        A PRA query can be trivially processed as an ordinary
     supplemented by a hash index on IDs of moving                                    query, i.e., by first retrieving the qualifying objects and
     objects, and exploits bottom-up deletion                                         then aggregating their properties. This approach assumes
     algorithm, thus having a good dynamic                                            that the server manages the detailed information of
     performance and concurrency. Also new PRA                                        moving objects using a (typically disk-based) spatio-
     query methods with a more precise branch-and-                                    temporal access method [3], [4], [5], however, incurs
     bound searching strategy are developed for PRA-                                  significant overhead since, the objective is to retrieve only
     tree. Extensive experiments confirm that the                                     a single value (as opposed to every qualifying object).
     proposed methods are efficient and practical.                                    The most popular aggregate indexes [6], [7] aim at
                                                                                      estimating the PRA results. These approaches are
                                                                                      motivated by the fact that approximate aggregates with
1. Introduction                                                                       small error are often as useful as the exact ones in practice,
Traditional research in spatio-temporal databases often                               but can be obtained much more efficiently. In particular,
aims at predicted query, which retrieves the moving                                   such estimation methods require only a fraction of the
objects lying inside a multidimensional hyper-rectangle in                            dataset or a small amount of statistics and are significantly
future. In many scenarios (e.g., statistical analysis, traffic                        faster than exact retrieval. In this paper, we focus on
monitoring, etc.), however, users are interested only in                              accurate PRA queries processing with a novel aggregate
summarized information about such objects, instead of                                 index.
their individual properties. Consider, for example, a
                                                                                      1.2 Contributions
------------------------------------------------------------------------------
Proceedings of the 3rd Workshop on STDBM
                                                                                      In this paper, we propose a new indexing method, called
Seoul, Korea, September 11, 2006                                                      predicted range aggregate tree (PRA-tree), for efficient


                                                                                 25
PRA queries processing. The PRA-tree takes into account           sub-tree; (ii) the velocity of the lower edge is the
the distribution of moving objects both in velocity and           minimum of all velocities on this dimension. This ensures
space domain. First, the velocity domain is partitioned           that the MBR always encloses the underlying objects, but
into velocity buckets, and moving objects are classified          it is not necessarily tight all the time. The TPR-tree
into different velocity buckets by their velocities, thus         inherits the problems related to the R-tree, such as overlap
objects in one bucket have similar velocities. Then we use        and dead space. Since the index structure is dynamic, its
aTPR-tree, which is based on the basic TPR-tree structure         query performance degrades quickly with time. The
and added with aggregate information in intermediate              algorithms for insertion and deletion are also similar to R-
nodes, to index the objects in each bucket. We supplement         tree, while the TPR-tree uses time-parameterized metrics
the PRA-tree by a hash index constructed on the IDs of            for parameters such as the area, perimeter, and distance
moving objects, and develop a bottom-up deletion                  from the centroid [4]. For example, Figure 1a) shows the
algorithm to obtain good dynamic performance and                  MBRs and their velocities of TPR-tree at construction
concurrency. Finally, we present novel algorithms that use        time 0, and Figure 1b) shows MBRs of the same TPR-tree
a more precisely branch-and-bound searching strategy for          at future time 1.
PRA queries based on PRA-tree.
    The rest of the paper is organized as follows: Section        2.2 The Aggregate R-tree (aR-tree)
2 reviews previous work related to ours. Section 3                The aR-tree [8] improves the conventional R-tree by
provides the problem definition and an overview of the            adding aggregate information into intermediate nodes.
proposed methods. Section 4 presents the PRA-tree index           Figure 2a) shows an example, where for each intermediate
structure and its construction, insertion, deletion and           entry, in addition to the minimum bounding rectangle
update techniques. Section 5 elaborates algorithms for            (MBR), the tree stores the number of objects in its subtree
PRA queries. Section 6 evaluates the proposed methods             (i.e., count aggregate function). To answer a RA query q
through extensive experimental evaluation. Finally,               (the shaded rectangle), the root R is first retrieved and its
Section 7 concludes the paper.                                    entries are compared with q. For every entry, there are
                                                                  three cases: 1) The MBR of the entry (e.g., e1) does not
2. Related Work                                                   intersect q, and thus its subtree is not explored further. 2)
                                                                  The entry partially intersects q (e.g., e2) and its child
Section 2.1 discusses the most popular TPR-tree index,
                                                                  nodes are fetched to continue the search. 3) The entry is
while Section 2.2 overviews the aggregate R-tree (aR-tree)
                                                                  contained in q (e.g., e3), then we simply add the aggregate
which motivates our solution.
                                                                  number of the entry (i.e., 3 for e3) without accessing its
2.1 The TPR-tree                                                  subtree. As a result, only two node accesses (R and R2)
                                                                  are necessary, while a conventional R-tree (i.e., without
                                                                  the aggregate numbers) would also visit R3. The cost
                                                                  savings increase with the window of the query, which is
                                                                  an important fact because in practice RA queries often
                                                                  involve large rectangles.


   a) MBRs at time 0      b) MBRs of at time 1                                      Figure 2 The aR-tree
            Figure 1 MBRs of TPR-tree
The TPR-tree is an extension of the R-tree that can               3 Preliminary and Overview
answer predicted queries on dynamic objects. The index
                                                                  We start with a concrete definition of aggregate query in
structure is very similar to R-tree, and the difference is
                                                                  spatio-temporal databases before presenting the PRA-tree
that the index stores velocities of elements along with
                                                                  index. By definition, a spatiotemporal object is a unified
their MBRs in nodes. The leaf node entry contains not
                                                                  object with spatial and temporal extent [2]. A pure spatial
only the position of moving objects but also their
                                                                  object can be a point, a line, or a region in two or three-
velocities. Similarly, an intermediate node entry also
                                                                  dimensional space. The position and/or shape either
stores MBRs and velocity vector VBRs of its child nodes.
                                                                  changes continuously (e.g., the motion of vehicles) or
As in traditional R-tree, the extents of MBR are such that
                                                                  discretely (e.g., the shrink of forest) with time. In this
tightly encloses all entries in the node at construction time.
                                                                  paper we concentrate on the points (moving objects),
The velocity vector of the intermediate node MBR is
                                                                  which have continuously moving positions, in two-
determined as follows: (i) the velocity of the upper edge is
                                                                  dimensional space. In spatio-temporal databases, a
the maximum of all velocities on this dimension in the


                                                             26
moving object is represented as <Loc, Vec, A1, A2…An>,                            Figure 3 shows the moving objects in two dimensional
where Loc, Vec denote the position and velocity vector                       space with velocity vector (10, 0) and (-10, 0) for example.
respectively, and A1, A2…An denote the non-spatial                           Figure 3 a) illustrates the MBRs of TPR-tree constructed
properties of this object.                                                   merely in space domain. Obviously, the MBRs of
    An aggregate function takes a set of tuples and returns                  intermediate TPR-tree nodes extend extremely along the
a single value that summarizes the information contained                     horizontal direction, thus incurring the degradation of
in the set of tuples. Spatio-temporal aggregate query first                  query performance. Therefore, an effective aggregate
retrieves the qualifying objects that satisfy a spatio-                      query indexing technique needs to consider the
temporal query predicate and returns aggregation                             distribution of moving objects in both velocity and space
information (e.g., count, average) on the non-spatial                        domain to avoid the deterioration of query and dynamic
properties of objects.                                                       performance.
    Given a range query q(qR,qT), which obtains the                               Motivated by this, we propose a predictive range
objects lying in the query window qT and space area qR,                      aggregate tree (PRA-tree) index structure. First, the
and a moving objects dataset P, a spatio-temporal query                      velocity domain is partitioned into buckets with about the
predicate      Sel(P)      can          be       expressed       as          same objects number. Then moving objects are classified
Sel(P)={pi| ∃t ∈ qT such that p i (t ) ∈ q R }. Especially, if qT=tf,        into different velocity buckets by their velocities, and
where tf denotes a future timestamp, then the query q is a                   objects with similar velocities are clustered into one
predicted timestamp range query. Generally, the query                        bucket. Finally, an aTPR-tree structure is presented for
space area is a static rectangle. Therefore, we can define                   indexing moving objects in each bucket.
predicted range aggregate query as the following.                                 Figure 3 b) and c) shows the MBRs of PRA-tree that
    Definition 1 (predicted range aggregate query). Given                    considers the distribution of velocity domain. The moving
a dataset P, in which each tuple is represented as <Loc,                     objects are classified into two velocity buckets. The
Vec, A1, A2,…,An>, where the domain of Ai is Di, and a                       bucket1 (as shown in figure 3 b)) contains the moving
aggregate function f : D1 × D 2 × K × D n → D agg , where Dagg               objects with velocity vector (10, 0), while bucket2 (as
                                                                             shown in figure 3 c)) contains the moving objects with
denote the range of f. then an predicted range aggregate                     velocity vector (-10, 0). As seen, the MBR of each bucket
query can be defined as follows:                                             moves with time, but the shapes of MBR keep unchanged.
    f(q,P)=f(Sel(P))=f({pi| ∃t ∈ qT such that p i (t ) ∈ q R }).             So PRA-tree can hold a good query performance during
    Similar to the aR-tree, we enhance the conventional                      all the future time.
TPR-tree by keeping aggregate information in
intermediate nodes. To answer a PRA query q(qR,qT), the
root node is first retrieved and its entries are compared
with q. For every entry, there are three cases: 1) the MBR
of this entry does not intersect query area qR all through
the query time qT, then this subtree is not visited further; 2)
the entry is totally contained in the query area qR during                                 Figure 3 MBRs of PRA-tree
qT , and we simply add the aggregate information without                         In addition, existing TPR-tree update algorithms work
accessing its subtree; 3) the entry partially intersects the                 in a top-down manner. For each update, one index
area qR during qT , then its child nodes need to be fetched                  traversal to locate the item for deletion and another
and continue searching until the leaf. Using this approach,                  traversal to insert a new item are needed. The deletion
the cost of processing PRA queries can be significantly                      operation affects the disk I/O cost greatly, for the MBRs
reduced.                                                                     of TPR-tree become looser with time, thus incurring lots
    However, the TPR-tee is constructed merely in the                        of area overlaps between MBRs, and the searching
space domain, and slightly considering the velocity                          operation for deletion needs to visit all the TPR-tree nodes
distribution of moving objects. At construction time,                        at worst case. The top-down approach for visiting the
TPR-tree index clusters objects mainly according to their                    hierarchical structure is very simple, and easy for dynamic
spatial proximity into different nodes; however, the                         maintenance. Nevertheless, for TPR-tree like index
velocities of objects in the same page are always                            structures, the area between intermediate node MBRs
discrepant greatly. The minority of objects with higher                      inevitably overlaps each other (which is not occurred in
velocity make the velocity-bounding rectangle (VBR)                          other traditional indexes such B-tree), so that the top-
relatively larger. In addition, the MBR will increase                        down searching strategy is inefficient in nature. This
extremely with time, causing deterioration of the query                      manner results in the deterioration of TPR-tree, and is not
performance and dynamic maintenance. Actually, in most                       suitable in frequent update applications. Motivated by this,
applications PRA queries often visit unnecessary                             we exploit the bottom-up delete strategy to improve the
intermediate TPR-tree nodes due to the extremely large                       dynamic performance of PRA-tree.
MBRs and massive dead space.


                                                                        27
                                           Figure 4The structure of PRA-tree


4. The PRA-tree                                                  4.2 Construction
                                                                 We use spatio-temporal histograms as mentioned in [9] on
Section 4.1 discusses the structure of PRA-tree, while
                                                                 two velocity dimensions to compute the velocity buckets
Section 4.2 provides the construction method, Section 4.3
                                                                 number and their velocity range. The main idea is to
gives the insertion, deletion and update techniques,
                                                                 divide the moving objects set into different partitions with
Section 4.4 evaluates the effect factors of PRA-tree
                                                                 about the same objects number. Then the algorithm scans
performance.
                                                                 the set of moving objects in sequence, and inserts objects
4.1 The PRA-tree Structure                                       into the aTPR-tree pointed to by corresponding bucket
                                                                 according to their velocities. To avoid redundant disk I/Os
Specifically, in PRA-tree structure we keep the basic            caused by frequent insertion one by one, construction
TPR-tree index structure and add aggregate summary into          algorithm exploits the bulk loading technique [9] to
its intermediate nodes to make a new index, the aggregate        construct the aTPR-tree. However, unlike the traditional
TPR-tree (aTPR-tree). The entries in aTPR-tree are               bulk loading method, the construction algorithm must
organized as vector <MBR, VBR, Agg, ptr>, where MBR,             compute and store the aggregate information in the
VBR, Agg, ptr denote the space area, velocity range,             intermediate nodes.
aggregate information of this node and pointer to its                The foremost problem for PRA-tree is to choose the
subtree respectively. In addition, we introduce a main           number of velocity buckets. With too few velocity
memory linear queue for management of velocity buckets,          buckets the PRA-tree can not gain the optimal query and
the items of which are formed as <MBR, VBR, Agg, tpr>,           dynamic maintenance performance, while too many
where MBR, VBR, Agg, tpr denote the space area, velocity         velocity buckets may cause shifts between velocity
range, aggregate information of this bucket and pointer to       buckets when update, thus resulting in the deterioration of
its corresponding aTPR-tree respectively. To improve the         index structure. So choosing a proper number of velocity
dynamic performance of PRA-tree, we supplement a disk-           buckets can make the query and dynamic performance of
based hash index structure to access the leaf node of            PRA-tree optimal. In the experimental section, we
PRA-tree directly. Further, we modified the original TPR-        evaluate the effect of velocity buckets number on the
tree node item as vector <entry，… , entry, parentptr>,           query and dynamic maintenance performance.
where entry denotes the child nodes contained in this node,
and parentptr denotes the physical address of its parent         4.3 Insertion , Deletion and Update
node. Compared with node page size, the space                    The insertion algorithm is straightforward. When inserting
consumption by pointer parentptr is trivial, and its effect      a new moving object into the PRA-tree index, the
on the fanout of PRA-tree is ignorable.                          algorithm first scans the main-memory velocity bucket
     Figure 4 illustrates the PRA-tree structure. The top        queue to find the bucket that contains this object, and the
right corner is a velocity bucket queue, in which each item      aTPR-tree related to this bucket can be obtained by the
describes the MBR, VBR and aggregate information (i.e.,          pointer in the queue item. Then a new object entry is
count) of its corresponding aTPR-tree. The bottom right          produced and inserted into a suitable leaf node using the
corner is a hash index constructed on IDs of moving              standard TPR*-tree insertion algorithm. Finally, a new
objects, the item is defined as vector <oid, ptr>, where oid     item is produced and inserted into the hash index. If
denotes the identifier of moving objects, and ptr denotes        overflow occurs while insertion, the aTPR-tree node must
physical offset of the object entry in leaf node; the left of    be split with the method mentioned in [4]. In addition, the
Figure 4 is the aTPR-tree structures pointed to by the           algorithm must modify the aggregate information of this
velocity bucket queue. In this figure, the pointer to parent     bucket and aTPR-tree nodes along the searching path.
node is not clearly depicted for concision.                          To delete an object entry from the PRA-tree, a simple
                                                                 straight approach is to first find the velocity bucket that


                                                            28
contains this object and delete the object entry from the          been thoroughly surveyed in [1]. For example, we
aTPR-tree with standard deletion algorithm. However,               consider the following queries: i) “how many cars are
usually the searching path for deletion may be several, so         expected to appear at the center of city 10 minutes from
the algorithm must keep the searching path to perform              now?” and ii) “how many cars are expected to cross the
another operation to complete the modification of                  center of city during the next 5 minutes?” Section 5.1
aggregate information along the path, thus causing                 presents the algorithm for queries as case 1, and section
redundant disk I/Os. Motivated by this, we exploit the             5.2 details the algorithm for queries as case2.
bottom-up deletion strategy like [10] to improve the
deletion performance. The deletion algorithm first locates         5.1 Predicted Time-slice Aggregate Range Query
the leaf node that holds the object entry with hash index,         (PTRA Query)
and then deletes the entry from this leaf node directly.           PTRA query returns the aggregate information of objects
Then the algorithm ascends the branches of PRA-tree by             that will fall in a rectangle at a future timestamp based on
the pointer parentptr until the root node, and modifies the        the current motion of moving objects. Obviously,
MBR, VBR and aggregate information of intermediate                 according to the current MBR and VBR of each node in
nodes along the path meanwhile. Finally, the                       PRA-tree, we can get the MBR of this node at future
corresponding object item in the hash index must be                timestamp, and then an aggregate range search is
deleted to reflect the change and also the bucket item             implemented under the PRA-tree at this timestamp.
related to this aTPR-tree must be modified.                            Specifically, given a PTRA query q(qR,qT), the
    The update algorithm uses the standard deletion-               algorithm first scans the velocity bucket queue, and for
insertion mechanism in TPR-tree. The algorithm first               each bucket we can get whether any object in this bucket
deletes the old entry from the PRA-tree and then inserts a         lies in the query area qR at timestamp qT according to the
new entry into the PRA-tree.                                       bucket’s MBR and VBR. If qR does not intersect the space
                                                                   area covered by a velocity bucket at future timestamp qT,
4.4 Effects on Index Performance
                                                                   the aTPR-tree pointed by this bucket need not to be
PRA-tree partitions moving objects into velocity buckets           visited. Or else if qR contains the space area covered by a
that do not overlap each other in velocity domain. So the          velocity bucket at future timestamp qT, the aggregate
index can perform a good concurrency when a large                  information is returned from the bucket item directly and
amount of concurrent dynamic operations occur.                     added to the query result. Otherwise, if qR intersects the
Obviously, the concurrency improves with the number of             space area covered by a velocity bucket at qT, the
velocity buckets. However, with too large a bucket                 corresponding aTPR-tree is obtained and then from the
number the index structure is prone to instability. That is        root point, the algorithm recursively computes the
to say, because the velocity range of each bucket is too           topological relation between the MBR of this node at
small, when an object is updated, the likely object’s shift        future timestamp qT and query area qR. If the MBR is
from one bucket to another bucket may cause excessive              contained in qR, then the algorithm adds the aggregate
disk I/Os. In addition, limited by the system memory, and          information in this node to the query result, or else if the
considering the space and velocity distribution uncertainty        MBR does not intersect qR each other, then the node is
of moving objects, too many velocity buckets do not                skipped; otherwise this subtree is explored further until
reduce node accesses for processing PRA query.                     the leaf level.
Therefore, PRA-tree with a proper number of velocity
buckets can obtain the optimal query and update                    5.2 Predicted Window Aggregate Range Query
performance.                                                       (PWRA Query)
    The cost of maintain the velocity bucket queue is              PWRA query returns the aggregate information of objects
inexpensive. The queue is constructed along with PRA-              that will fall in a query rectangle at a future time window
tree, when dynamic operations such as insertion and                based on current motion of moving objects. To answer a
deletion occurred, the summary in which must also be               PWRA query, a straightforward method is to judge
updated. The queue is pinned in main memory, thus                  whether the MBR of current node is totally contained in
decrease the visiting cost greatly. And the space                  the query area at some future time. If true, then the
consumed by velocity queue is rather small and ignorable.          algorithm only adds the aggregate information without
                                                                   visiting this subtree; or else if the MBR of this node does
5 The Predicted Range Aggregate Query                              not intersect the query area at any future time, then this
Algorithm                                                          node is skipped. Otherwise, this subtree is explored
                                                                   further. This approach will access nodes whose MBR
We now illustrate the algorithm for answering a predicted          partially intersect the query area while the moving objects
range aggregate query with PRA-tree. There are two types           enclosed in totally lie in the query area at different
of queries: predicted time-slice range aggregate query and         timestamp (as shown in figure 5),thus causing excessive
predicted window range aggregate query, which have                 disk I/Os.


                                                              29
    Motivated by this, we present an enhanced predictive                queue, according to the current MBR and VBR of this
range aggregate query (EPRA) algorithm with a more                      bucket, the future timestamps when each vertex lies in the
precise branch and bound criterion. Before introducing                  query area qR can computed. If all the vertices can lie in
the EPRA algorithm, we give the following lemmas.                       qR during the query window qT, then the algorithm only
    Lemma 1 Given a moving rectangle R with fixed edge                  need to return the aggregate information of this bucket; or
velocities and a static query rectangle q, let q ∩ R ≠ Φ . If           else if none of the vertices will lie in qR during the query
the four vertices of R lie in the query area q at different             window qT, then the algorithm skips this bucket;
future timestamps, then the moving points enclosed in R                 otherwise the bucket is explored further. Similarly, when
will lie in the query area q during the future time.                    searching the aTPR-tree, from the root point the
    Proof: As an illustration of Lemma 1, consider Figure               timestamps when four vertices of MBR in each node lie in
5 where the relationship between R(a, b, c, d) and query q              qR is computed. If all the vertices can lie in qR during the
is shown as case a) and case b), since q ∩ R ≠ Φ .                      query window qT, then the algorithm only need to return
                                                                        the aggregate information of this node; or else if none of
Supposing the VBR of R is< v x− , v x+ , v −y , v +y >, then the
                                                                        the vertices will lie in qR during the query window qT,
direction of VBR can only be depicted as in Figure 5 a)                 then the algorithm skips this node; otherwise this subtree
and b), for vertices a, b, c and d will lie in q at future time.        is explored further.
    In case a), c has the velocity < v x− , v +y >, for ∀p ∈ R              The Algorithm 1 describes the pseudo-code for
point p has the velocity < v x , v y >, where v x > v x−                processing PWRA query as follows:
                                                                             Algorithm 1
and v y < v +y . If c lies in q at future timestamp tc, then c               Input: a PWRA query q(qR,qT), output: Agg(q)
                                                                        1. Initialize Agg(q), set Agg (q ) ← 0
must cross the line l2 and not cross the line l3 at tc. So that
                                                                        2.    For each item E in the velocity bucket queue
for point p, it must have crossed line l2 before tc and still
                                                                        3.      Compute the timestamps when each vertex lies in qR;
not cross line l3, the p lies in q at some timestamp before             4.      If all the timestamps lie between qT
tc.                                                                     5.        Then Agg (q) ← Agg (q) + E. Agg ;
    In case b): for a, b and c will lie in q at future times,           6.      Else if none of the timestamps lies between qT
obviously at least either b or c must lies in q at some                 7.        Then skip E;
future timestamp ts with d, then at ts the topological                  8.      Else get the aTPR-tree pointed by this bucket item E;
relationship between R and q can be illustrated as Figure 5             9.        from the root point, for each entry E in this node
a). According to the discussion above, we can conclude                  10.         Compute the timestamps when each vertex lies in qR;
that every point enclosed in R will lie in q at some future             11.         If all the timestamps lie between qT
timestamp.                                                              12.            Then Agg (q) ← Agg (q) + E. Agg
                                                                        13.         Else if none of the timestamps lies between qT
                                                                        14.            then skip E
                                                                        15.         Else explore the subtree recursively until leaf level
                                                                        16.          Compute the aggregate information in leaf node E
                                                                        17.               Set Agg ( q) ← Agg (q) + E. Agg
                                                                        18.          End if
                                                                        19.        End for
                                                                        20.       End if
   Figure 5 Topological relations between R and q                       21.    End for
    Lemma 2 Given a moving rectangle R with fixed edge                   End algorithm 1
velocities and a static query rectangle q, let q ∩ R = Φ . If
the four vertices of R lie in the query area q at different             6 Experimental Results and Performance
future timestamps, then the moving points enclosed in R                 Analysis
will lie in the query area q during the future time.
    Proof: Supposing vertex d to be the first point that will           6.1 Experimental Setting and Details
lie in the query q, then at this timestamp the topological
relationship between R and q can be shown as in Figure 5.               In this section, we evaluate the query and update
So according to Lemma 1, we can deduce Lemma 2.                         performance of PRA-tree with aTPR-tree and TPR*-tree.
    Heuristic 1: Given an intermediate entry E and a PRA                We use the Network-based Generator of Moving Objects
query q(qR,qT), the subtree of E totally satisfy query q if             [11] to generate 100k moving objects. The input to the
the vertices of MBR in entry E lie in query area qR during              generator is the road map of Oldenburg (a city in
future time qT. If true, the query algorithm only returns the           Germany). An object appears on a network node, and
aggregate information of E without accessing its subtree.               randomly chooses a destination. When the object reaches
    Heuristic 1 reduces the searching cost considerably,                its destination, an update is reported by randomly
while incurring rather small computational overhead.                    selecting the next destination. When normalize the data
Specifically, the algorithm first scans the velocity bucket             space to 10000×10000, the default velocity of objects is


                                                                   30
equal to 20 per timestamp. At each timestamp about 0.8                                                            performance. In addition, the larger the query range area,
% moving objects update their velocities. The PRA                                                                 the better the PTRA query performance of PRA tree, for
queries are generated as follows: i) the query spatial                                                            the intermediate nodes in PRA-tree store the aggregate
extent qRlen is set as 100×100,400×400,800×800,1200                                                               information of this subtree, thus reducing the node
×1200,1600×1600 respectively, and the starting point of                                                           accesses needed by TPR*-tree. The aTPR-tree holds a
its extent randomly distributes in (10000-qRlen)×(10000-                                                          relative worse performance than PRA-tree for its larger
                                                                                                                  MBRs with time.
qRlen).ii) the query time window is [Tisu，Tisu+qTlen]
                                                                                                                                         0     400    800     1200   1600                    20                       40     60      80    100
(Tisu is the time when the query is presented ), where                                                                             800                                 800               1600                                                1600

qTlen=20,40,60,80,100 respectively. The query perform-                                                                             600
                                                                                                                                                 TPR*-tree
                                                                                                                                                                       600               1200
                                                                                                                                                                                                                       TPR*-tree
                                                                                                                                                                                                                                            1200


                                                                                                                                                                               Node accesses
                                                                                                                                                                                                                       aTPR-tree


                                                                                                                   Node accesses
                                                                                                                                                 aTPR-tree
ance is measured as the average number of node accesses                                                                            400
                                                                                                                                                 PRA-tree
                                                                                                                                                                       400                       800
                                                                                                                                                                                                                       PRA-tree
                                                                                                                                                                                                                                            800
in processing ten PRA queries with the same parameters.
                                                                                                                                   200                                 200                       400                                        400
The update performance is measured as the average node
                                                                                                                                     0                                 0                                       0                             0
accesses in executing 100 moving objects updates.                                                                                        0     400    800     1200   1600                                       20    40     60      80    100
                                                                                                                                                      qRlen                                                                 qTlen
     Table 1 summarizes the parameters of PRA-tree
exploited for the workload. For all simulations, we use a                                                             a) qTlen=50             b) qRlen=1000
Celeron 2.4GHz CPU with 256MByte memory.                                                                           Figure 7 comparison of PWRA query performance
                                   Table 1               PRA-tree parameters
 Parameter                       Value                  Description                                                   Figure 7 a) and b) shows the PWRA query cost as a
 Number                          25/50/100/1            The velocity domain is split                              function of qRlen and qTlen respectively. In Figure 7 a)
 of buckets                      50/200/250             into 5×5, 5×10, 10×10, 10×15,                             we fix the parameter qTlen=50 and in Figure 7 b) we fix
                                                        and 10×25 respectively.                                   the parameter qRlen=1000. As seen, the query
 Page size                       1k                     The page size of PRA-tree.                                performance of PRA-tree works best and TPR*-tree
 Fanout                          21                     The average fanout of PRA-tree                            exhibits a worst performance. This is because the MBRs
                                                        in intermediate node.                                     PRA-tree overlap relatively less than those of aTPR-tree
 Average                         3                      The average height of PRA-                                and TPR*-tree, thus having a good query performance.
 height                                                 tree.
                                                                                                                  While TPR*-tree and aTPR-tree may visit many
6.2 Performance Analysis                                                                                          unnecessary nodes for the massive overlaps between area
                                                                                                                  of MBRs with time. Furthermore, PRA-tree exploits more
We compare the query and update performance of PRA-                                                               precise branch-and-bound strategy, thus having a best
tree, TPR*-tree and aTPR-tree by node accesses. In order                                                          performance.
to study the deterioration of the indexes with time, we                                                                                5k       10k     15k    20k    25k                                        5k   10k     15k    20k    25k
                                                                                                                                   1000                                 1000                                   50                             50
measure the performance of PRA-tree, aTPR-tree and                                                                                              TPR*-tree                                                              TPR*-tree
                                                                                                                                    800         aTPR-tree               800                                    40      aTPR-tree                 40
TPR*-tree, using the same query workload, after every 5k
                                                                                                                                                                                               Node accesses


                                                                                                                                                PRA-tree                                                               PRA-tree
                                                                                                                   Node accesses


                                                                                                                                    600                                 600                                    30                                30
updates.                                                                                                                            400                                 400                                    20                                20

                       0   400    800    1200   1600                         20   40     60     80   100                            200                                 200                                    10                                10
                 240                              240                     360                          360

                             TPR*-tree                                             TPR*-tree                                             0                              0                                       0                             0
                 180                              180                     270      aTPR-tree          270                                 5k    10k     15k    20k    25k                                        5k   10k     15k    20k    25k
                                                          Node accesses


                             aTPR-tree
 Node accesses


                             PRA-tree                                              PRA-tree                                                      Number of updates                                                     Number of updates
                 120                              120                     180                         180

                                                                           90                         90
                                                                                                                         a) update cost         b) query cost
                  60                              60
                                                                                                                  Figure 8 comparison of update& query performance
                  0                               0                         0                          0
                       0   400    800    1200   1600                         20   40     60     80   100
                                 qRlen                                                  qTlen                         Figure 8 compares the average PRWA query and
                                                                                                                  update cost as a function of the number of updates. As
       a) qTlen=50             b) qRlen=1000                                                                      shown in figure 8 a), the node accesses needed in PRA-
    Figure 6 comparison of PTRA query performance                                                                 tree update execution are far less than TPR*-tree and
    Figure 6 a) and b) shows the PTRA query cost as a                                                             aTPR-tree. And PRA-tree and TPR*-tree have nearly
function of qRlen and qTlen respectively. In Figure 6 a)                                                          constant update cost This is because PRA-tree exploits the
we fix the parameter qTlen=50 and in Figure 6 b) we fix                                                           bottom-up deletion strategy, avoiding the excessive node
the parameter qRlen=1000. As seen, the query                                                                      accesses for deletion search, while TPR*-tree and aTPR-
performance of PRA-tree works best and TPR*-tree                                                                  tree process update in top-down manner, needing more
exhibits a worst performance. This is because PRA-tree is                                                         node accesses.
constructed both on the space and velocity domain, the                                                                Figure 8 b) shows the node accesses in processing one
area of MBRs overlap relatively less than those of aTPR-                                                          PWRA query as function of an interval of 5k updates
tree and TPR*-tree, thus having a good query                                                                      when fixing the parameters qTlen=50 and qRlen=1000. It
performance. While TPR*-tree and aTPR-tree may visit                                                              is clear that the query cost increases with the number of
many unnecessary nodes for the massive overlaps                                                                   updates. The PRA-tree has a slow increasing query cost,
between area of MBRs with time, causing a worse query                                                             while the cost of TPR*-tree and aTPR-tree increase


                                                                                                             31
significantly. This is because the MBRs of PRA-tree                                                                                      Computation: A Survey. IEEE TKDE, NO.2,
extend less than those of TPR*-tree and aTPR-tree, so the                                                                                pp.271-286, 2005.
degradation is not much expensive.                                                                                                 [3]   Simonas Saltenis, Christian S. Jensen, et al..
                       0    50     100      150   200   250                          0    50     100      150   200   250
                 200                                      200                   20                                      20               Indexing the Positions of Continuously Moving
                                 PRA-tree
                 190                                     190
                                                                                               PRA-tree
                                                                                                                                         Objects. Proc. SIGMOD Conf., pp.331-342, 2000.
 Node accesses


                                                                Node accesses
                                                                                15                                     15
                 180                                     180                                                                       [4]   Tao Y., Papadias D., and Sun J.. The TPR*-Tree:
                 170                                     170
                                                                                10                                     10
                                                                                                                                         An Optimized Spatio-Temporal Access Method for
                 160                                      160                   5                                       5                Predictive Queries. Proc. VLDB Conf., pp.790-801,
                       0    50     100      150   200   250                          0    50     100      150   200   250
                           Number of velocity buckets                                    Number of velocity buckets                      2003.
                 a) effect on update cost   b) effect on query cost                                                                [5]   Jignesh M. Patel, Yun Chen, V. Prasad Chaka.
                    Figure 9 effect of velocity bucket number                                                                            STRIPES: An Efficient Index for Predicted
                                                                                                                                         Trajectories. Proc. SIGMOD Conf., pp.635-646,
    Figure 9 shows the effect of velocity buckets number
on PWRA query and update performance of PRA-tree.                                                                                        2004.
We fix the parameters qTlen=50 and qRlen=1000. As                                                                                  [6]   Yufei Tao, Dimitris Papadias, Jian Zhai, and Qing
shown in Figure9 a), the update cost of PRA-tree increase                                                                                Li. Venn Sampling: A Novel Prediction Technique
slightly with the number of velocity buckets, because the                                                                                for Moving Objects. Proc. Int’l Conf. Data Eng.,
small velocity bucket window causes the shift of moving                                                                                  pp.680-691, 2005.
objects between buckets, thus incurs excessive node                                                                                [7]   Yufei Tao, Jimeng Sun, Dimitris Papadias.
accesses. From Figure 9 b) we can see, the PWRA query                                                                                    Selectivity Estimation for Predictive Spatio-
performance will degrade with too many buckets, this is
                                                                                                                                         Temporal Queries. Proc. Int’l Conf. Data Eng.,
because of, as mentioned earlier, the uncertainty of
moving objects distribution in velocity and space, the                                                                                   pp.417-428, 2003.
probability of overlaps between the area covered by                                                                                [8]   M. Jurgens and H. Lenz. The Ra*-Tree: An
velocity buckets and query region don’t decrease linearly                                                                                Improved R-Tree with Materialized Data for
as expected. However, it may bring more TPR-tree node                                                                                    Supporting Range Queries on OLAP-Data. Proc.
accesses. In this experiment, we set the number of                                                                                       DEXA Workshop, 1998.
velocity bucket equal to 25.                                                                                                       [9]   Bin Lin and Jianwen Su. On bulk loading TPR-tree.
                                                                                                                                         Proc. IEEE Conf. Mobile Data Management
7. Conclusion                                                                                                                            (MDM), pp.114-124, 2004.
This paper investigates the problem of PRA queries. Our                                                                           [10]   M. Lee, W. Hsu, C. Jensen, B. Cui, and K. Teo.
contribution is a novel indexing method, referred to as                                                                                  Supporting Frequent Updates in R-Trees: A
predicted range aggregate R-tree (PRA-tree). PRA-tree                                                                                    Bottom-Up Approach. Proc. VLDB Conf., pp.608-
considers the distribution of moving objects both in                                                                                     619, 2003.
velocity and space domain. First, we partition the velocity
                                                                                                                                  [11]   Thomas Brinkhoff. A Framework for Generating
domain into velocity buckets, and then we use aTPR-tree
to index the moving objects in each bucket. To support                                                                                   Network-Based Moving Objects. GeoInformatica,
frequent updates a supplemented hash index on leaf nodes                                                                                 Vol.6 (2), pp.153-180, Kluwer, 2002.
is added to PRA-tree. Also an extended bottom-up
deletion algorithm is developed for PRA-tree.
    An open problem is to extend our work to support
large scale of concurrent PRA queries efficiently. Further,
the PRA query results need to be revaluated for the
frequent object updates, so developing novel incremental
algorithms for PRA queries is also an interesting work.


References

   [1]                 Yufei Tao and Dimitris Papadias, Range Aggregate
                       Processing in Spatial Databases. IEEE TKDE,
                       NO.12, pp.1555-1570, 2004.
   [2]                 Ine´s Fernando Vega Lo´pez, Richard T. Snodgrass,
                       and Bongki Moon. Spatiotemporal Aggregate


                                                                                                                             32