Time-Based Similar Trajectories on Graphs Shima Moghtasedi University of Pisa, Pisa, Italy shima.moghtasedi@di.unipi.it Abstract. Graphs can be used to represent not only road networks but also routing networks and computer networks. In such kind of appli- cations, entities, such as packets, are moving on the network and the information associated with these movements are known regarding to the vertices they traverse. In such a context a trajectory for these enti- ties can be defined as sequences of vertices and time intervals. We aim at designing fast algorithms for finding all the trajectories which are similar to (i.e. moving close to) a given trajectory in a specified time window. Keywords: graph trajectory · similarity · data structures 1 Introduction In this work, we are interested in algorithms to efficiently retrieve similar tra- jectories to a given one for an undirected graph. Many papers in literature have focused on extracting information from sets of trajectories, basically, looking at them as strings, so the similarity between trajectories corresponds to similarity between strings. In such a context, the similarity measure have been formalized by means of some concepts like longest common subsequences [8]. We aim at exploiting here the topology of the network, assessing that two trajectories are similar if they traverse “close” vertices in the same time interval. More formally, in our case, two trajectories are similar at time t if they are passing through either the same vertex or two neighbor vertices. We say that two trajectories are similar in a time interval T if they are similar for each time t ∈ T . We also say that a trajectory is close to a vertex v if the trajectory goes through either v or a neighbor of v. Using these notions of similarity, we aim at solving the following problems. Problem 1. Given a graph, a set of trajectories, a trajectory x, and a time in- terval, find the subset of trajectories which are similar to x in the specified time interval. Problem 2. Given a vertex v of a graph G, a set of trajectories, and a time interval, find all the trajectories close to v for the whole given time interval. Related work. Some papers have focused on finding trajectories moving together in a certain area of the plane in some applications like traffic monitoring, e- marketing and route finding [9, 6, 7]. These proposed algorithms do not extend 2 Shima Moghtasedi to the case when the movements are constrained to be walks on a graph. Some papers in this area, focus to cluster the segments of trajectories on the plane [2, 4]. On the other hand, there are some papers answering topological based queries, specifically on trajectories and graphs [3, 5, 10], but they do not extend smoothly to our problem. Among them, the result by Luo et al. [3] finds all the trajectories arriving to a given vertex v during time interval T = [a, b], under the restriction that each trajectory does not traverse the same vertex twice or more. In particu- lar, they find all the trajectories y there exists a time t ∈ [a, b] such that y passes by v. Although close, this result is not suitable for studying our problem, as we would need to do search in their structure for each t ∈ [a, b], to find trajectories passing v or close to v for the whole time interval T . Our contribution. In this paper, we propose a method, to solve our problems which consists of the following two phases. The preprocessing phase: given a graph and a set of trajectories, prepare suitable data structures to maintain trajectories in order to answer queries. This phase is the same for both problems. The query phase: – given a vertex and a time interval, use the data structures built in the pre- processing phase to find all trajectories traversing close to the vertex within the given time interval, which corresponds to Problem 2 – given a trajectory and a time interval, use the solutions of Problem 2 to answer queries for Problem 1. 2 Definitions and Algorithm Overview Definition 1 (Trajectory). Given an undirected graph G = (V, E), a tra- jectory x is a sequence x = h(v1 , T1 ), (v2 , T2 ), · · · , (vl , Tl )i such that for each i ∈ N, 1 ≤ i ≤ l − 1, we have (vi , vi+1 ) ∈ E and Ti is a time interval [ai , bi ] with ai , bi ∈ N, and ai ≤ bi < ai+1 ≤ bi+1 , bi + 1 = ai+1 . We call l, i.e. the length of x, as number of movements. Given a set of trajectories T , let us define the projection set Sv for each v ∈ V , as {(T, x)|(v, T ) ∈ x} (i.e. (v, T ) appears in x). As we can see in the following example, there can be more than one pair associated with a vertex for a given trajectory, since each trajectory can traverse a vertex multiple times during its lifetime. For a trajectory x and a time t ∈ N, the notation x(t) indicates the vertex v ∈ V such that x contains (v, T ) and t ∈ T . For each vertex v ∈ V , we denote Nv as the set of neighbors of vertex v and {v} ∪ Nv as Bv . Definition 2 (Time Constrained Similar Trajectories). Given two trajec- tories x, y and time t ∈ N, let u and v be vertices x(t) and y(t), respectively. We say x, y are similar at time t, if either u = v or u ∈ Nv . For a given time interval T , x and y are similar in T if they are similar for each time t ∈ T . We explain the definitions by giving a simple example. Time-Based Similar Trajectories on Graphs 3 Example 1 Graph G = (V, E), V = {v1 , . . . v4 } and a set of two trajectories T = {x1 , x2 } are shown in Figure 1. Consider (v1 , [1, 7]) in x1 . This means that trajectory x1 have traversed through the vertex v1 ∈ G for each time t ∈ [1, 7]. x1 = h(v1 , [1, 7]), (v2 , [8, 9]), (v4 , [10, 13]), (v1 , [14, 17]), (v3 , [18, 20])i x2 = h(v1 , [1, 4]), (v2 , [5, 7]), (v3 , [8, 9])i x1 x2 v1 [1, 7], [14, 17] [1, 4] v2 [8, 9] [5, 7] v3 [18, 20] [8, 9] v4 [10, 13] Fig. 1. An example of set of trajectories T = {x1 , x2 } on the graph G; Each row of the table indicates Sv , v ∈ V , implicitly. For a given trajectory x2 and the time interval T = [3, 6] in our example, the trajectory x1 is similar to x2 at time interval T , since for each time t ∈ T they are at neighbor vertices v1 and v2 . For instance, x1 is at v1 during the time interval [3, 6], and x2 is at v1 at times t = 3, 4 and at v2 for t = 5, 6. In our given example, in Figure 1, we have Sv1 = {([1, 7], x1 ), ([14, 17], x1 ), ([1, 4], x2 )}. Preprocessing and Query Definition. For both Problems 1 and 2, we define pre- processing(G, T ), where G = (V, E) is a graph and T is a set of trajectories. Given G and T , we aim at answering the following queries, which corresponds to Problem 1 and 2, respectively. query1 (x, T ). Given a trajectory x, and a time interval T = [a, b], find the set of trajectories which are similar to x in the time interval. query2 (v, T ). Given a vertex v, and a time interval T = [a, b], find the set of trajectories, traversing vertices in Bv for each time t ∈ T . Lemma 1. Suppose we can solve query2 (vi , Ti ) in time O(Z) for some Z(∗). By using its solution, query1 (x, T = [a, b]) can be answered in time O(log l + h · Z), where h is the number of movements in x within the time interval T . Proof. Let H be the set of pairs (vi , Ti ) in x, with Ti = [ai , bi ] such that b ≥ ai and a ≤ bi . We have h = |H|. We run query2 (vi , Ti ) for each pair (vi , Ti ) ∈ H. Th Let Uvi be the output set of query2 (vi , Ti ), then we report i=1 Uvi as the solution set for query1 (x, T ). We now prove that this algorithm is correct. For each pair (vi , Ti ) in x, trajectory x is at vertex vi during time interval Ti (i.e. ∀t ∈ Ti , x(t) = vi ). This means that all trajectories, traversing either vi or Nvi for each time instance t ∈ Ti , are similar to x in a time interval Ti (Definition 2). By definition of query2 , these are returned by query2 (vi , Ti ). As the set of Sh trajectories similar to x in T = i=1 Ti is the intersection of trajectories similar to x in each Ti with 1 ≤ i ≤ h, we return the intersection of the sets Uvi . As x is 4 Shima Moghtasedi a time ordered sequence of pairs, we can find the set H easily in time O(log l+h). Since, we run query2 (vi , Ti ) for each pair (vi , Ti ) ∈ H, we pay O(h · Z). The cost of the intersection is dominated by this cost. Hence, in the following we will concentrate on solving query2 . In our solution preprocessing(G, T ) is the same for both the problems. Preprocessing Phase. Recall that for each vertex u ∈ V , Su is the set of tra- jectories from T that pass through u with the corresponding S time intervals. Similarly, here we define SBu for each vertex u ∈ V as SBu = w∈Bu Sw . Let |SBu | denotes the size of the set SBu ,which is the number of movements that trajectories have done within the vertices in Bu . Given a set SBu , we define C(SBu ), a refinement of SBu , which basically considers time intervals spent by a trajectory in Bu as a unique interval in O(|SBu | · |Nu |) time. More formally, assume y is a trajectory passing through Bu , i.e. y appears in SBu . If we have consecutive pairs h(ui , Ti ), (ui+1 , Ti+1 )i belong to y, such that ui , ui+1 ∈ Bu , we have h(Ti , y), (Ti+1 , y)i ∈ SBu , so we consider (Ti ∪ Ti+1 , y) ∈ C(SBu ). For instance, assume we are given v3 ∈ V in our graph example in Figure 1, and Bv3 = {v1 , v2 , v3 }. For the trajectory x1 = h(v1 , [1, 7]), (v2 , [8, 9])i, we consider ([1, 9], x1 ) ∈ C(SBv3 ). Our preprocessing phase builds an interval tree [1] rep- resentation of C(SBu ) for each u ∈ V , considering all time intervals {Ti |∃x ∈ T s.t (Ti , x) ∈ C(SBu )}. Briefly, interval tree is a binary tree to store a set of intervals based on the median of the intervals endpoints. In this structure all the intervals intersect the median point are stored at the root of the tree. Those intervals lying completely to the left and right of the median point, are stored at the left subtree and right subtree, respectively. These subtrees are constructed recursively in the same way. The interval tree associated with a given vertex u helps to find for any given query interval T = [a, b], the set of intervals in C(SBu ) that are contained in T = [a, b]. Moreover, intervals are endowed with the trajectories they belong to, by definition of C(SBu ). Letting |C(SBu )| = ru , an interval tree on the set of intervals in C(SBu ) can be built in O(ru log ru ) time, using O(ru ) space. The interval tree has depth O(log ru ). As a result, we obtain the following lemma. Lemma 2. preprocessing(G, T ) takes O(Σu∈V (|SBu | · |Nu |) + ru log ru ) time and uses O(Σu∈V ru ) space. 2.1 Solving QUERY2 (v, T ) Given a vertex v, and a time interval T = [a, b], we want to find the set of trajectories, traversing the vertices in Bv at each time instance t ∈ T . We will make use of the interval trees built in preprocessing(G, T ). Query Phase. Consider the interval tree associated with C(SBv ) which maintains the time intervals endowed with the trajectories passing Bv they belong to. We obtain all the trajectories traversing Bv for each time t ∈ T , by finding all the Time-Based Similar Trajectories on Graphs 5 intervals which are fully contained in T . As the interval tree associated with v has depth O(log rv ), the cost of retrieving all the intervals fully contained in T is O(log rv + |Uv |), where |Uv | is the number of reported trajectory id’s. Indeed, notice that we cannot have two time intervals belonging to the same trajectory in the output set, since we are reporting the intervals fully contained in T . Theorem 1. For a vertex v and a time interval T , we can answer query2 (v, T ) in O(log rv + |Uv |) time, where |Uv | is the number of reported trajectories. Let K = Σvi |(vi ,Ti )∈H |Uvi |, by applying Lemma 1, we obtain the following. Corollary 1. query1 (x, T ) can be answered in O(log l + Σvi |(vi ,Ti )∈H log rvi + K) time. As future work, we plan to enhance our algorithm to reduce the storage usage and consider different similarity measures. Acknowledgments. The author wishes to thank R. Grossi and A. Marino for helpful discussions. References 1. De Berg, M., Van Kreveld, M., Overmars, M., Schwarzkopf, O.C.: Computational geometry. In: Computational geometry, pp. 1–17. Springer (2000) 2. Li, Y., Han, J., Yang, J.: Clustering moving objects. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 617–622. ACM (2004) 3. Luo, W., Tan, H., Chen, L., Ni, L.M.: Finding time period-based most frequent path in big trajectory data. In: Proceedings of the 2013 ACM SIGMOD interna- tional conference on management of data. pp. 713–724. ACM (2013) 4. Nanni, M., Pedreschi, D.: Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems 27(3), 267–289 (2006) 5. Shang, S., Chen, L., Wei, Z., Jensen, C.S., Zheng, K., Kalnis, P.: Trajectory sim- ilarity join in spatial networks. Proceedings of the VLDB Endowment 10(11), 1178–1189 (2017) 6. Surynek, P.: An application of pebble motion on graphs to abstract multi-robot path planning. In: Tools with Artificial Intelligence, 2009. ICTAI’09. 21st Interna- tional Conference on. pp. 151–158. IEEE (2009) 7. Tao, Y., Papadias, D.: Efficient historical r-trees. In: Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings. Thirteenth International Conference on. pp. 223–232. IEEE (2001) 8. Vlachos, M., Kollios, G., Gunopulos, D.: Discovering similar multidimensional tra- jectories. In: Data Engineering, 2002. Proceedings. 18th International Conference on. pp. 673–684. IEEE (2002) 9. Wang, L., Zheng, Y., Xie, X., Ma, W.Y.: A flexible spatio-temporal indexing scheme for large-scale gps track retrieval. In: Mobile Data Management, 2008. MDM’08. 9th International Conference on. pp. 1–8. IEEE (2008) 10. Wei, L.Y., Zheng, Y., Peng, W.C.: Constructing popular routes from uncertain trajectories. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 195–203. ACM (2012)