Predicted Range Aggregate Processing in Spatio-temporal Databases Wei Liao, Guifen Tang, Ning Jing, Zhinong Zhong School of Electronic Science and Engineering, National University of Defense Technology Changsha, China liaoweinudt@yahoo.com.cn Abstract spatio-temporal database managing the vehicles in a city, results of predicted queries (e.g., finding all vehicles in Predicted range aggregate (PRA) query is an the center) are meaningless (due to continuous object important researching issue in spatio-temporal movements), while the aggregate information (the number databases. Recent studies have developed two of vehicles) is usually stable and measurable (e.g., finding major classes of PRA query methods: (1) the number of vehicles across the center during the next 5 accurate approaches, which search the common minutes) [1], [2]. moving objects indexes to obtain an accurate Specifically, given a set S of moving points in the d- result; and (2) estimate methods, which utilize dimensional space, a predicted range aggregate (PRA) approximate techniques to estimate the result query returns a single value that summarizes the set with an acceptable error. R ⊆ S of points in a d-dimensional hyper-rectangle qR and In this paper, we present a novel accurate a future time interval (or a future timestamp) qT according prediction index technique, named PRA-tree, for to some aggregation function (e.g., count, max, min, sum, range aggregation of moving objects. PRA-tree average). In this paper, we consider predicted range takes into account both the velocity and space distinct count queries on multidimensional moving points, distribution of moving objects. First, the velocity where the result is the size of R (e.g., the number of domain is partitioned into different velocity vehicles in an area qR and a future time interval (or a buckets, and moving objects are classified into future timestamp) qT), but the solutions can apply to any different velocity buckets by their velocities, thus other aggregation mentioned above with straightforward objects in one bucket have similar velocities. adaptation. Then we use aTPR-tree, which is based on the basic TPR-tree structure and added with 1.1 Motivation aggregate information in intermediate nodes, to index objects in each bucket. PRA-tree is A PRA query can be trivially processed as an ordinary supplemented by a hash index on IDs of moving query, i.e., by first retrieving the qualifying objects and objects, and exploits bottom-up deletion then aggregating their properties. This approach assumes algorithm, thus having a good dynamic that the server manages the detailed information of performance and concurrency. Also new PRA moving objects using a (typically disk-based) spatio- query methods with a more precise branch-and- temporal access method [3], [4], [5], however, incurs bound searching strategy are developed for PRA- significant overhead since, the objective is to retrieve only tree. Extensive experiments confirm that the a single value (as opposed to every qualifying object). proposed methods are efficient and practical. The most popular aggregate indexes [6], [7] aim at estimating the PRA results. These approaches are motivated by the fact that approximate aggregates with 1. Introduction small error are often as useful as the exact ones in practice, Traditional research in spatio-temporal databases often but can be obtained much more efficiently. In particular, aims at predicted query, which retrieves the moving such estimation methods require only a fraction of the objects lying inside a multidimensional hyper-rectangle in dataset or a small amount of statistics and are significantly future. In many scenarios (e.g., statistical analysis, traffic faster than exact retrieval. In this paper, we focus on monitoring, etc.), however, users are interested only in accurate PRA queries processing with a novel aggregate summarized information about such objects, instead of index. their individual properties. Consider, for example, a 1.2 Contributions ------------------------------------------------------------------------------ Proceedings of the 3rd Workshop on STDBM In this paper, we propose a new indexing method, called Seoul, Korea, September 11, 2006 predicted range aggregate tree (PRA-tree), for efficient 25 PRA queries processing. The PRA-tree takes into account sub-tree; (ii) the velocity of the lower edge is the the distribution of moving objects both in velocity and minimum of all velocities on this dimension. This ensures space domain. First, the velocity domain is partitioned that the MBR always encloses the underlying objects, but into velocity buckets, and moving objects are classified it is not necessarily tight all the time. The TPR-tree into different velocity buckets by their velocities, thus inherits the problems related to the R-tree, such as overlap objects in one bucket have similar velocities. Then we use and dead space. Since the index structure is dynamic, its aTPR-tree, which is based on the basic TPR-tree structure query performance degrades quickly with time. The and added with aggregate information in intermediate algorithms for insertion and deletion are also similar to R- nodes, to index the objects in each bucket. We supplement tree, while the TPR-tree uses time-parameterized metrics the PRA-tree by a hash index constructed on the IDs of for parameters such as the area, perimeter, and distance moving objects, and develop a bottom-up deletion from the centroid [4]. For example, Figure 1a) shows the algorithm to obtain good dynamic performance and MBRs and their velocities of TPR-tree at construction concurrency. Finally, we present novel algorithms that use time 0, and Figure 1b) shows MBRs of the same TPR-tree a more precisely branch-and-bound searching strategy for at future time 1. PRA queries based on PRA-tree. The rest of the paper is organized as follows: Section 2.2 The Aggregate R-tree (aR-tree) 2 reviews previous work related to ours. Section 3 The aR-tree [8] improves the conventional R-tree by provides the problem definition and an overview of the adding aggregate information into intermediate nodes. proposed methods. Section 4 presents the PRA-tree index Figure 2a) shows an example, where for each intermediate structure and its construction, insertion, deletion and entry, in addition to the minimum bounding rectangle update techniques. Section 5 elaborates algorithms for (MBR), the tree stores the number of objects in its subtree PRA queries. Section 6 evaluates the proposed methods (i.e., count aggregate function). To answer a RA query q through extensive experimental evaluation. Finally, (the shaded rectangle), the root R is first retrieved and its Section 7 concludes the paper. entries are compared with q. For every entry, there are three cases: 1) The MBR of the entry (e.g., e1) does not 2. Related Work intersect q, and thus its subtree is not explored further. 2) The entry partially intersects q (e.g., e2) and its child Section 2.1 discusses the most popular TPR-tree index, nodes are fetched to continue the search. 3) The entry is while Section 2.2 overviews the aggregate R-tree (aR-tree) contained in q (e.g., e3), then we simply add the aggregate which motivates our solution. number of the entry (i.e., 3 for e3) without accessing its 2.1 The TPR-tree subtree. As a result, only two node accesses (R and R2) are necessary, while a conventional R-tree (i.e., without the aggregate numbers) would also visit R3. The cost savings increase with the window of the query, which is an important fact because in practice RA queries often involve large rectangles. a) MBRs at time 0 b) MBRs of at time 1 Figure 2 The aR-tree Figure 1 MBRs of TPR-tree The TPR-tree is an extension of the R-tree that can 3 Preliminary and Overview answer predicted queries on dynamic objects. The index We start with a concrete definition of aggregate query in structure is very similar to R-tree, and the difference is spatio-temporal databases before presenting the PRA-tree that the index stores velocities of elements along with index. By definition, a spatiotemporal object is a unified their MBRs in nodes. The leaf node entry contains not object with spatial and temporal extent [2]. A pure spatial only the position of moving objects but also their object can be a point, a line, or a region in two or three- velocities. Similarly, an intermediate node entry also dimensional space. The position and/or shape either stores MBRs and velocity vector VBRs of its child nodes. changes continuously (e.g., the motion of vehicles) or As in traditional R-tree, the extents of MBR are such that discretely (e.g., the shrink of forest) with time. In this tightly encloses all entries in the node at construction time. paper we concentrate on the points (moving objects), The velocity vector of the intermediate node MBR is which have continuously moving positions, in two- determined as follows: (i) the velocity of the upper edge is dimensional space. In spatio-temporal databases, a the maximum of all velocities on this dimension in the 26 moving object is represented as , Figure 3 shows the moving objects in two dimensional where Loc, Vec denote the position and velocity vector space with velocity vector (10, 0) and (-10, 0) for example. respectively, and A1, A2…An denote the non-spatial Figure 3 a) illustrates the MBRs of TPR-tree constructed properties of this object. merely in space domain. Obviously, the MBRs of An aggregate function takes a set of tuples and returns intermediate TPR-tree nodes extend extremely along the a single value that summarizes the information contained horizontal direction, thus incurring the degradation of in the set of tuples. Spatio-temporal aggregate query first query performance. Therefore, an effective aggregate retrieves the qualifying objects that satisfy a spatio- query indexing technique needs to consider the temporal query predicate and returns aggregation distribution of moving objects in both velocity and space information (e.g., count, average) on the non-spatial domain to avoid the deterioration of query and dynamic properties of objects. performance. Given a range query q(qR,qT), which obtains the Motivated by this, we propose a predictive range objects lying in the query window qT and space area qR, aggregate tree (PRA-tree) index structure. First, the and a moving objects dataset P, a spatio-temporal query velocity domain is partitioned into buckets with about the predicate Sel(P) can be expressed as same objects number. Then moving objects are classified Sel(P)={pi| ∃t ∈ qT such that p i (t ) ∈ q R }. Especially, if qT=tf, into different velocity buckets by their velocities, and where tf denotes a future timestamp, then the query q is a objects with similar velocities are clustered into one predicted timestamp range query. Generally, the query bucket. Finally, an aTPR-tree structure is presented for space area is a static rectangle. Therefore, we can define indexing moving objects in each bucket. predicted range aggregate query as the following. Figure 3 b) and c) shows the MBRs of PRA-tree that Definition 1 (predicted range aggregate query). Given considers the distribution of velocity domain. The moving a dataset P, in which each tuple is represented as , where the domain of Ai is Di, and a bucket1 (as shown in figure 3 b)) contains the moving aggregate function f : D1 × D 2 × K × D n → D agg , where Dagg objects with velocity vector (10, 0), while bucket2 (as shown in figure 3 c)) contains the moving objects with denote the range of f. then an predicted range aggregate velocity vector (-10, 0). As seen, the MBR of each bucket query can be defined as follows: moves with time, but the shapes of MBR keep unchanged. f(q,P)=f(Sel(P))=f({pi| ∃t ∈ qT such that p i (t ) ∈ q R }). So PRA-tree can hold a good query performance during Similar to the aR-tree, we enhance the conventional all the future time. TPR-tree by keeping aggregate information in intermediate nodes. To answer a PRA query q(qR,qT), the root node is first retrieved and its entries are compared with q. For every entry, there are three cases: 1) the MBR of this entry does not intersect query area qR all through the query time qT, then this subtree is not visited further; 2) the entry is totally contained in the query area qR during Figure 3 MBRs of PRA-tree qT , and we simply add the aggregate information without In addition, existing TPR-tree update algorithms work accessing its subtree; 3) the entry partially intersects the in a top-down manner. For each update, one index area qR during qT , then its child nodes need to be fetched traversal to locate the item for deletion and another and continue searching until the leaf. Using this approach, traversal to insert a new item are needed. The deletion the cost of processing PRA queries can be significantly operation affects the disk I/O cost greatly, for the MBRs reduced. of TPR-tree become looser with time, thus incurring lots However, the TPR-tee is constructed merely in the of area overlaps between MBRs, and the searching space domain, and slightly considering the velocity operation for deletion needs to visit all the TPR-tree nodes distribution of moving objects. At construction time, at worst case. The top-down approach for visiting the TPR-tree index clusters objects mainly according to their hierarchical structure is very simple, and easy for dynamic spatial proximity into different nodes; however, the maintenance. Nevertheless, for TPR-tree like index velocities of objects in the same page are always structures, the area between intermediate node MBRs discrepant greatly. The minority of objects with higher inevitably overlaps each other (which is not occurred in velocity make the velocity-bounding rectangle (VBR) other traditional indexes such B-tree), so that the top- relatively larger. In addition, the MBR will increase down searching strategy is inefficient in nature. This extremely with time, causing deterioration of the query manner results in the deterioration of TPR-tree, and is not performance and dynamic maintenance. Actually, in most suitable in frequent update applications. Motivated by this, applications PRA queries often visit unnecessary we exploit the bottom-up delete strategy to improve the intermediate TPR-tree nodes due to the extremely large dynamic performance of PRA-tree. MBRs and massive dead space. 27 Figure 4The structure of PRA-tree 4. The PRA-tree 4.2 Construction We use spatio-temporal histograms as mentioned in [9] on Section 4.1 discusses the structure of PRA-tree, while two velocity dimensions to compute the velocity buckets Section 4.2 provides the construction method, Section 4.3 number and their velocity range. The main idea is to gives the insertion, deletion and update techniques, divide the moving objects set into different partitions with Section 4.4 evaluates the effect factors of PRA-tree about the same objects number. Then the algorithm scans performance. the set of moving objects in sequence, and inserts objects 4.1 The PRA-tree Structure into the aTPR-tree pointed to by corresponding bucket according to their velocities. To avoid redundant disk I/Os Specifically, in PRA-tree structure we keep the basic caused by frequent insertion one by one, construction TPR-tree index structure and add aggregate summary into algorithm exploits the bulk loading technique [9] to its intermediate nodes to make a new index, the aggregate construct the aTPR-tree. However, unlike the traditional TPR-tree (aTPR-tree). The entries in aTPR-tree are bulk loading method, the construction algorithm must organized as vector , where MBR, compute and store the aggregate information in the VBR, Agg, ptr denote the space area, velocity range, intermediate nodes. aggregate information of this node and pointer to its The foremost problem for PRA-tree is to choose the subtree respectively. In addition, we introduce a main number of velocity buckets. With too few velocity memory linear queue for management of velocity buckets, buckets the PRA-tree can not gain the optimal query and the items of which are formed as , dynamic maintenance performance, while too many where MBR, VBR, Agg, tpr denote the space area, velocity velocity buckets may cause shifts between velocity range, aggregate information of this bucket and pointer to buckets when update, thus resulting in the deterioration of its corresponding aTPR-tree respectively. To improve the index structure. So choosing a proper number of velocity dynamic performance of PRA-tree, we supplement a disk- buckets can make the query and dynamic performance of based hash index structure to access the leaf node of PRA-tree optimal. In the experimental section, we PRA-tree directly. Further, we modified the original TPR- evaluate the effect of velocity buckets number on the tree node item as vector , query and dynamic maintenance performance. where entry denotes the child nodes contained in this node, and parentptr denotes the physical address of its parent 4.3 Insertion , Deletion and Update node. Compared with node page size, the space The insertion algorithm is straightforward. When inserting consumption by pointer parentptr is trivial, and its effect a new moving object into the PRA-tree index, the on the fanout of PRA-tree is ignorable. algorithm first scans the main-memory velocity bucket Figure 4 illustrates the PRA-tree structure. The top queue to find the bucket that contains this object, and the right corner is a velocity bucket queue, in which each item aTPR-tree related to this bucket can be obtained by the describes the MBR, VBR and aggregate information (i.e., pointer in the queue item. Then a new object entry is count) of its corresponding aTPR-tree. The bottom right produced and inserted into a suitable leaf node using the corner is a hash index constructed on IDs of moving standard TPR*-tree insertion algorithm. Finally, a new objects, the item is defined as vector , where oid item is produced and inserted into the hash index. If denotes the identifier of moving objects, and ptr denotes overflow occurs while insertion, the aTPR-tree node must physical offset of the object entry in leaf node; the left of be split with the method mentioned in [4]. In addition, the Figure 4 is the aTPR-tree structures pointed to by the algorithm must modify the aggregate information of this velocity bucket queue. In this figure, the pointer to parent bucket and aTPR-tree nodes along the searching path. node is not clearly depicted for concision. To delete an object entry from the PRA-tree, a simple straight approach is to first find the velocity bucket that 28 contains this object and delete the object entry from the been thoroughly surveyed in [1]. For example, we aTPR-tree with standard deletion algorithm. However, consider the following queries: i) “how many cars are usually the searching path for deletion may be several, so expected to appear at the center of city 10 minutes from the algorithm must keep the searching path to perform now?” and ii) “how many cars are expected to cross the another operation to complete the modification of center of city during the next 5 minutes?” Section 5.1 aggregate information along the path, thus causing presents the algorithm for queries as case 1, and section redundant disk I/Os. Motivated by this, we exploit the 5.2 details the algorithm for queries as case2. bottom-up deletion strategy like [10] to improve the deletion performance. The deletion algorithm first locates 5.1 Predicted Time-slice Aggregate Range Query the leaf node that holds the object entry with hash index, (PTRA Query) and then deletes the entry from this leaf node directly. PTRA query returns the aggregate information of objects Then the algorithm ascends the branches of PRA-tree by that will fall in a rectangle at a future timestamp based on the pointer parentptr until the root node, and modifies the the current motion of moving objects. Obviously, MBR, VBR and aggregate information of intermediate according to the current MBR and VBR of each node in nodes along the path meanwhile. Finally, the PRA-tree, we can get the MBR of this node at future corresponding object item in the hash index must be timestamp, and then an aggregate range search is deleted to reflect the change and also the bucket item implemented under the PRA-tree at this timestamp. related to this aTPR-tree must be modified. Specifically, given a PTRA query q(qR,qT), the The update algorithm uses the standard deletion- algorithm first scans the velocity bucket queue, and for insertion mechanism in TPR-tree. The algorithm first each bucket we can get whether any object in this bucket deletes the old entry from the PRA-tree and then inserts a lies in the query area qR at timestamp qT according to the new entry into the PRA-tree. bucket’s MBR and VBR. If qR does not intersect the space area covered by a velocity bucket at future timestamp qT, 4.4 Effects on Index Performance the aTPR-tree pointed by this bucket need not to be PRA-tree partitions moving objects into velocity buckets visited. Or else if qR contains the space area covered by a that do not overlap each other in velocity domain. So the velocity bucket at future timestamp qT, the aggregate index can perform a good concurrency when a large information is returned from the bucket item directly and amount of concurrent dynamic operations occur. added to the query result. Otherwise, if qR intersects the Obviously, the concurrency improves with the number of space area covered by a velocity bucket at qT, the velocity buckets. However, with too large a bucket corresponding aTPR-tree is obtained and then from the number the index structure is prone to instability. That is root point, the algorithm recursively computes the to say, because the velocity range of each bucket is too topological relation between the MBR of this node at small, when an object is updated, the likely object’s shift future timestamp qT and query area qR. If the MBR is from one bucket to another bucket may cause excessive contained in qR, then the algorithm adds the aggregate disk I/Os. In addition, limited by the system memory, and information in this node to the query result, or else if the considering the space and velocity distribution uncertainty MBR does not intersect qR each other, then the node is of moving objects, too many velocity buckets do not skipped; otherwise this subtree is explored further until reduce node accesses for processing PRA query. the leaf level. Therefore, PRA-tree with a proper number of velocity buckets can obtain the optimal query and update 5.2 Predicted Window Aggregate Range Query performance. (PWRA Query) The cost of maintain the velocity bucket queue is PWRA query returns the aggregate information of objects inexpensive. The queue is constructed along with PRA- that will fall in a query rectangle at a future time window tree, when dynamic operations such as insertion and based on current motion of moving objects. To answer a deletion occurred, the summary in which must also be PWRA query, a straightforward method is to judge updated. The queue is pinned in main memory, thus whether the MBR of current node is totally contained in decrease the visiting cost greatly. And the space the query area at some future time. If true, then the consumed by velocity queue is rather small and ignorable. algorithm only adds the aggregate information without visiting this subtree; or else if the MBR of this node does 5 The Predicted Range Aggregate Query not intersect the query area at any future time, then this Algorithm node is skipped. Otherwise, this subtree is explored further. This approach will access nodes whose MBR We now illustrate the algorithm for answering a predicted partially intersect the query area while the moving objects range aggregate query with PRA-tree. There are two types enclosed in totally lie in the query area at different of queries: predicted time-slice range aggregate query and timestamp (as shown in figure 5),thus causing excessive predicted window range aggregate query, which have disk I/Os. 29 Motivated by this, we present an enhanced predictive queue, according to the current MBR and VBR of this range aggregate query (EPRA) algorithm with a more bucket, the future timestamps when each vertex lies in the precise branch and bound criterion. Before introducing query area qR can computed. If all the vertices can lie in the EPRA algorithm, we give the following lemmas. qR during the query window qT, then the algorithm only Lemma 1 Given a moving rectangle R with fixed edge need to return the aggregate information of this bucket; or velocities and a static query rectangle q, let q ∩ R ≠ Φ . If else if none of the vertices will lie in qR during the query the four vertices of R lie in the query area q at different window qT, then the algorithm skips this bucket; future timestamps, then the moving points enclosed in R otherwise the bucket is explored further. Similarly, when will lie in the query area q during the future time. searching the aTPR-tree, from the root point the Proof: As an illustration of Lemma 1, consider Figure timestamps when four vertices of MBR in each node lie in 5 where the relationship between R(a, b, c, d) and query q qR is computed. If all the vertices can lie in qR during the is shown as case a) and case b), since q ∩ R ≠ Φ . query window qT, then the algorithm only need to return the aggregate information of this node; or else if none of Supposing the VBR of R is< v x− , v x+ , v −y , v +y >, then the the vertices will lie in qR during the query window qT, direction of VBR can only be depicted as in Figure 5 a) then the algorithm skips this node; otherwise this subtree and b), for vertices a, b, c and d will lie in q at future time. is explored further. In case a), c has the velocity < v x− , v +y >, for ∀p ∈ R The Algorithm 1 describes the pseudo-code for point p has the velocity < v x , v y >, where v x > v x− processing PWRA query as follows: Algorithm 1 and v y < v +y . If c lies in q at future timestamp tc, then c Input: a PWRA query q(qR,qT), output: Agg(q) 1. Initialize Agg(q), set Agg (q ) ← 0 must cross the line l2 and not cross the line l3 at tc. So that 2. For each item E in the velocity bucket queue for point p, it must have crossed line l2 before tc and still 3. Compute the timestamps when each vertex lies in qR; not cross line l3, the p lies in q at some timestamp before 4. If all the timestamps lie between qT tc. 5. Then Agg (q) ← Agg (q) + E. Agg ; In case b): for a, b and c will lie in q at future times, 6. Else if none of the timestamps lies between qT obviously at least either b or c must lies in q at some 7. Then skip E; future timestamp ts with d, then at ts the topological 8. Else get the aTPR-tree pointed by this bucket item E; relationship between R and q can be illustrated as Figure 5 9. from the root point, for each entry E in this node a). According to the discussion above, we can conclude 10. Compute the timestamps when each vertex lies in qR; that every point enclosed in R will lie in q at some future 11. If all the timestamps lie between qT timestamp. 12. Then Agg (q) ← Agg (q) + E. Agg 13. Else if none of the timestamps lies between qT 14. then skip E 15. Else explore the subtree recursively until leaf level 16. Compute the aggregate information in leaf node E 17. Set Agg ( q) ← Agg (q) + E. Agg 18. End if 19. End for 20. End if Figure 5 Topological relations between R and q 21. End for Lemma 2 Given a moving rectangle R with fixed edge End algorithm 1 velocities and a static query rectangle q, let q ∩ R = Φ . If the four vertices of R lie in the query area q at different 6 Experimental Results and Performance future timestamps, then the moving points enclosed in R Analysis will lie in the query area q during the future time. Proof: Supposing vertex d to be the first point that will 6.1 Experimental Setting and Details lie in the query q, then at this timestamp the topological relationship between R and q can be shown as in Figure 5. In this section, we evaluate the query and update So according to Lemma 1, we can deduce Lemma 2. performance of PRA-tree with aTPR-tree and TPR*-tree. Heuristic 1: Given an intermediate entry E and a PRA We use the Network-based Generator of Moving Objects query q(qR,qT), the subtree of E totally satisfy query q if [11] to generate 100k moving objects. The input to the the vertices of MBR in entry E lie in query area qR during generator is the road map of Oldenburg (a city in future time qT. If true, the query algorithm only returns the Germany). An object appears on a network node, and aggregate information of E without accessing its subtree. randomly chooses a destination. When the object reaches Heuristic 1 reduces the searching cost considerably, its destination, an update is reported by randomly while incurring rather small computational overhead. selecting the next destination. When normalize the data Specifically, the algorithm first scans the velocity bucket space to 10000×10000, the default velocity of objects is 30 equal to 20 per timestamp. At each timestamp about 0.8 performance. In addition, the larger the query range area, % moving objects update their velocities. The PRA the better the PTRA query performance of PRA tree, for queries are generated as follows: i) the query spatial the intermediate nodes in PRA-tree store the aggregate extent qRlen is set as 100×100,400×400,800×800,1200 information of this subtree, thus reducing the node ×1200,1600×1600 respectively, and the starting point of accesses needed by TPR*-tree. The aTPR-tree holds a its extent randomly distributes in (10000-qRlen)×(10000- relative worse performance than PRA-tree for its larger MBRs with time. qRlen).ii) the query time window is [Tisu,Tisu+qTlen] 0 400 800 1200 1600 20 40 60 80 100 (Tisu is the time when the query is presented ), where 800 800 1600 1600 qTlen=20,40,60,80,100 respectively. The query perform- 600 TPR*-tree 600 1200 TPR*-tree 1200 Node accesses aTPR-tree Node accesses aTPR-tree ance is measured as the average number of node accesses 400 PRA-tree 400 800 PRA-tree 800 in processing ten PRA queries with the same parameters. 200 200 400 400 The update performance is measured as the average node 0 0 0 0 accesses in executing 100 moving objects updates. 0 400 800 1200 1600 20 40 60 80 100 qRlen qTlen Table 1 summarizes the parameters of PRA-tree exploited for the workload. For all simulations, we use a a) qTlen=50 b) qRlen=1000 Celeron 2.4GHz CPU with 256MByte memory. Figure 7 comparison of PWRA query performance Table 1 PRA-tree parameters Parameter Value Description Figure 7 a) and b) shows the PWRA query cost as a Number 25/50/100/1 The velocity domain is split function of qRlen and qTlen respectively. In Figure 7 a) of buckets 50/200/250 into 5×5, 5×10, 10×10, 10×15, we fix the parameter qTlen=50 and in Figure 7 b) we fix and 10×25 respectively. the parameter qRlen=1000. As seen, the query Page size 1k The page size of PRA-tree. performance of PRA-tree works best and TPR*-tree Fanout 21 The average fanout of PRA-tree exhibits a worst performance. This is because the MBRs in intermediate node. PRA-tree overlap relatively less than those of aTPR-tree Average 3 The average height of PRA- and TPR*-tree, thus having a good query performance. height tree. While TPR*-tree and aTPR-tree may visit many 6.2 Performance Analysis unnecessary nodes for the massive overlaps between area of MBRs with time. Furthermore, PRA-tree exploits more We compare the query and update performance of PRA- precise branch-and-bound strategy, thus having a best tree, TPR*-tree and aTPR-tree by node accesses. In order performance. to study the deterioration of the indexes with time, we 5k 10k 15k 20k 25k 5k 10k 15k 20k 25k 1000 1000 50 50 measure the performance of PRA-tree, aTPR-tree and TPR*-tree TPR*-tree 800 aTPR-tree 800 40 aTPR-tree 40 TPR*-tree, using the same query workload, after every 5k Node accesses PRA-tree PRA-tree Node accesses 600 600 30 30 updates. 400 400 20 20 0 400 800 1200 1600 20 40 60 80 100 200 200 10 10 240 240 360 360 TPR*-tree TPR*-tree 0 0 0 0 180 180 270 aTPR-tree 270 5k 10k 15k 20k 25k 5k 10k 15k 20k 25k Node accesses aTPR-tree Node accesses PRA-tree PRA-tree Number of updates Number of updates 120 120 180 180 90 90 a) update cost b) query cost 60 60 Figure 8 comparison of update& query performance 0 0 0 0 0 400 800 1200 1600 20 40 60 80 100 qRlen qTlen Figure 8 compares the average PRWA query and update cost as a function of the number of updates. As a) qTlen=50 b) qRlen=1000 shown in figure 8 a), the node accesses needed in PRA- Figure 6 comparison of PTRA query performance tree update execution are far less than TPR*-tree and Figure 6 a) and b) shows the PTRA query cost as a aTPR-tree. And PRA-tree and TPR*-tree have nearly function of qRlen and qTlen respectively. In Figure 6 a) constant update cost This is because PRA-tree exploits the we fix the parameter qTlen=50 and in Figure 6 b) we fix bottom-up deletion strategy, avoiding the excessive node the parameter qRlen=1000. As seen, the query accesses for deletion search, while TPR*-tree and aTPR- performance of PRA-tree works best and TPR*-tree tree process update in top-down manner, needing more exhibits a worst performance. This is because PRA-tree is node accesses. constructed both on the space and velocity domain, the Figure 8 b) shows the node accesses in processing one area of MBRs overlap relatively less than those of aTPR- PWRA query as function of an interval of 5k updates tree and TPR*-tree, thus having a good query when fixing the parameters qTlen=50 and qRlen=1000. It performance. While TPR*-tree and aTPR-tree may visit is clear that the query cost increases with the number of many unnecessary nodes for the massive overlaps updates. The PRA-tree has a slow increasing query cost, between area of MBRs with time, causing a worse query while the cost of TPR*-tree and aTPR-tree increase 31 significantly. This is because the MBRs of PRA-tree Computation: A Survey. IEEE TKDE, NO.2, extend less than those of TPR*-tree and aTPR-tree, so the pp.271-286, 2005. degradation is not much expensive. [3] Simonas Saltenis, Christian S. Jensen, et al.. 0 50 100 150 200 250 0 50 100 150 200 250 200 200 20 20 Indexing the Positions of Continuously Moving PRA-tree 190 190 PRA-tree Objects. Proc. SIGMOD Conf., pp.331-342, 2000. Node accesses Node accesses 15 15 180 180 [4] Tao Y., Papadias D., and Sun J.. The TPR*-Tree: 170 170 10 10 An Optimized Spatio-Temporal Access Method for 160 160 5 5 Predictive Queries. Proc. VLDB Conf., pp.790-801, 0 50 100 150 200 250 0 50 100 150 200 250 Number of velocity buckets Number of velocity buckets 2003. a) effect on update cost b) effect on query cost [5] Jignesh M. Patel, Yun Chen, V. Prasad Chaka. Figure 9 effect of velocity bucket number STRIPES: An Efficient Index for Predicted Trajectories. Proc. SIGMOD Conf., pp.635-646, Figure 9 shows the effect of velocity buckets number on PWRA query and update performance of PRA-tree. 2004. We fix the parameters qTlen=50 and qRlen=1000. As [6] Yufei Tao, Dimitris Papadias, Jian Zhai, and Qing shown in Figure9 a), the update cost of PRA-tree increase Li. Venn Sampling: A Novel Prediction Technique slightly with the number of velocity buckets, because the for Moving Objects. Proc. Int’l Conf. Data Eng., small velocity bucket window causes the shift of moving pp.680-691, 2005. objects between buckets, thus incurs excessive node [7] Yufei Tao, Jimeng Sun, Dimitris Papadias. accesses. From Figure 9 b) we can see, the PWRA query Selectivity Estimation for Predictive Spatio- performance will degrade with too many buckets, this is Temporal Queries. Proc. Int’l Conf. Data Eng., because of, as mentioned earlier, the uncertainty of moving objects distribution in velocity and space, the pp.417-428, 2003. probability of overlaps between the area covered by [8] M. Jurgens and H. Lenz. The Ra*-Tree: An velocity buckets and query region don’t decrease linearly Improved R-Tree with Materialized Data for as expected. However, it may bring more TPR-tree node Supporting Range Queries on OLAP-Data. Proc. accesses. In this experiment, we set the number of DEXA Workshop, 1998. velocity bucket equal to 25. [9] Bin Lin and Jianwen Su. On bulk loading TPR-tree. Proc. IEEE Conf. Mobile Data Management 7. Conclusion (MDM), pp.114-124, 2004. This paper investigates the problem of PRA queries. Our [10] M. Lee, W. Hsu, C. Jensen, B. Cui, and K. Teo. contribution is a novel indexing method, referred to as Supporting Frequent Updates in R-Trees: A predicted range aggregate R-tree (PRA-tree). PRA-tree Bottom-Up Approach. Proc. VLDB Conf., pp.608- considers the distribution of moving objects both in 619, 2003. velocity and space domain. First, we partition the velocity [11] Thomas Brinkhoff. A Framework for Generating domain into velocity buckets, and then we use aTPR-tree to index the moving objects in each bucket. To support Network-Based Moving Objects. GeoInformatica, frequent updates a supplemented hash index on leaf nodes Vol.6 (2), pp.153-180, Kluwer, 2002. is added to PRA-tree. Also an extended bottom-up deletion algorithm is developed for PRA-tree. An open problem is to extend our work to support large scale of concurrent PRA queries efficiently. Further, the PRA query results need to be revaluated for the frequent object updates, so developing novel incremental algorithms for PRA queries is also an interesting work. References [1] Yufei Tao and Dimitris Papadias, Range Aggregate Processing in Spatial Databases. IEEE TKDE, NO.12, pp.1555-1570, 2004. [2] Ine´s Fernando Vega Lo´pez, Richard T. Snodgrass, and Bongki Moon. Spatiotemporal Aggregate 32