Time-Based Similar Trajectories on Graphs

                                 Shima Moghtasedi

                           University of Pisa, Pisa, Italy
                          shima.moghtasedi@di.unipi.it


      Abstract. Graphs can be used to represent not only road networks but
      also routing networks and computer networks. In such kind of appli-
      cations, entities, such as packets, are moving on the network and the
      information associated with these movements are known regarding to
      the vertices they traverse. In such a context a trajectory for these enti-
      ties can be defined as sequences of vertices and time intervals. We aim at
      designing fast algorithms for finding all the trajectories which are similar
      to (i.e. moving close to) a given trajectory in a specified time window.

      Keywords: graph trajectory · similarity · data structures


1   Introduction
In this work, we are interested in algorithms to efficiently retrieve similar tra-
jectories to a given one for an undirected graph. Many papers in literature have
focused on extracting information from sets of trajectories, basically, looking at
them as strings, so the similarity between trajectories corresponds to similarity
between strings. In such a context, the similarity measure have been formalized
by means of some concepts like longest common subsequences [8]. We aim at
exploiting here the topology of the network, assessing that two trajectories are
similar if they traverse “close” vertices in the same time interval. More formally,
in our case, two trajectories are similar at time t if they are passing through
either the same vertex or two neighbor vertices. We say that two trajectories are
similar in a time interval T if they are similar for each time t ∈ T . We also say
that a trajectory is close to a vertex v if the trajectory goes through either v or
a neighbor of v. Using these notions of similarity, we aim at solving the following
problems.
Problem 1. Given a graph, a set of trajectories, a trajectory x, and a time in-
terval, find the subset of trajectories which are similar to x in the specified time
interval.

Problem 2. Given a vertex v of a graph G, a set of trajectories, and a time
interval, find all the trajectories close to v for the whole given time interval.

Related work. Some papers have focused on finding trajectories moving together
in a certain area of the plane in some applications like traffic monitoring, e-
marketing and route finding [9, 6, 7]. These proposed algorithms do not extend
2         Shima Moghtasedi

to the case when the movements are constrained to be walks on a graph. Some
papers in this area, focus to cluster the segments of trajectories on the plane [2,
4]. On the other hand, there are some papers answering topological based queries,
specifically on trajectories and graphs [3, 5, 10], but they do not extend smoothly
to our problem. Among them, the result by Luo et al. [3] finds all the trajectories
arriving to a given vertex v during time interval T = [a, b], under the restriction
that each trajectory does not traverse the same vertex twice or more. In particu-
lar, they find all the trajectories y there exists a time t ∈ [a, b] such that y passes
by v. Although close, this result is not suitable for studying our problem, as we
would need to do search in their structure for each t ∈ [a, b], to find trajectories
passing v or close to v for the whole time interval T .

Our contribution. In this paper, we propose a method, to solve our problems
which consists of the following two phases.
    The preprocessing phase: given a graph and a set of trajectories, prepare
suitable data structures to maintain trajectories in order to answer queries. This
phase is the same for both problems.
    The query phase:
    – given a vertex and a time interval, use the data structures built in the pre-
      processing phase to find all trajectories traversing close to the vertex within
      the given time interval, which corresponds to Problem 2
    – given a trajectory and a time interval, use the solutions of Problem 2 to
      answer queries for Problem 1.


2      Definitions and Algorithm Overview
Definition 1 (Trajectory). Given an undirected graph G = (V, E), a tra-
jectory x is a sequence x = h(v1 , T1 ), (v2 , T2 ), · · · , (vl , Tl )i such that for each
i ∈ N, 1 ≤ i ≤ l − 1, we have (vi , vi+1 ) ∈ E and Ti is a time interval [ai , bi ] with
ai , bi ∈ N, and ai ≤ bi < ai+1 ≤ bi+1 , bi + 1 = ai+1 . We call l, i.e. the length of
x, as number of movements.

Given a set of trajectories T , let us define the projection set Sv for each v ∈ V ,
as {(T, x)|(v, T ) ∈ x} (i.e. (v, T ) appears in x). As we can see in the following
example, there can be more than one pair associated with a vertex for a given
trajectory, since each trajectory can traverse a vertex multiple times during its
lifetime. For a trajectory x and a time t ∈ N, the notation x(t) indicates the
vertex v ∈ V such that x contains (v, T ) and t ∈ T . For each vertex v ∈ V , we
denote Nv as the set of neighbors of vertex v and {v} ∪ Nv as Bv .
Definition 2 (Time Constrained Similar Trajectories). Given two trajec-
tories x, y and time t ∈ N, let u and v be vertices x(t) and y(t), respectively.
We say x, y are similar at time t, if either u = v or u ∈ Nv . For a given time
interval T , x and y are similar in T if they are similar for each time t ∈ T .
We explain the definitions by giving a simple example.
                                    Time-Based Similar Trajectories on Graphs           3

Example 1 Graph G = (V, E), V = {v1 , . . . v4 } and a set of two trajectories
T = {x1 , x2 } are shown in Figure 1. Consider (v1 , [1, 7]) in x1 . This means that
trajectory x1 have traversed through the vertex v1 ∈ G for each time t ∈ [1, 7].
    x1 = h(v1 , [1, 7]), (v2 , [8, 9]), (v4 , [10, 13]), (v1 , [14, 17]), (v3 , [18, 20])i
    x2 = h(v1 , [1, 4]), (v2 , [5, 7]), (v3 , [8, 9])i


                 x1         x2
       v1 [1, 7], [14, 17] [1, 4]
       v2      [8, 9]      [5, 7]
       v3     [18, 20]     [8, 9]
       v4     [10, 13]


Fig. 1. An example of set of trajectories T = {x1 , x2 } on the graph G; Each row of
the table indicates Sv , v ∈ V , implicitly.


    For a given trajectory x2 and the time interval T = [3, 6] in our example,
the trajectory x1 is similar to x2 at time interval T , since for each time t ∈ T
they are at neighbor vertices v1 and v2 . For instance, x1 is at v1 during the time
interval [3, 6], and x2 is at v1 at times t = 3, 4 and at v2 for t = 5, 6. In our
given example, in Figure 1, we have Sv1 = {([1, 7], x1 ), ([14, 17], x1 ), ([1, 4], x2 )}.

Preprocessing and Query Definition. For both Problems 1 and 2, we define pre-
processing(G, T ), where G = (V, E) is a graph and T is a set of trajectories.
Given G and T , we aim at answering the following queries, which corresponds
to Problem 1 and 2, respectively.
 query1 (x, T ). Given a trajectory x, and a time interval T = [a, b], find the set
   of trajectories which are similar to x in the time interval.
 query2 (v, T ). Given a vertex v, and a time interval T = [a, b], find the set of
   trajectories, traversing vertices in Bv for each time t ∈ T .

Lemma 1. Suppose we can solve query2 (vi , Ti ) in time O(Z) for some Z(∗).
By using its solution, query1 (x, T = [a, b]) can be answered in time O(log l +
h · Z), where h is the number of movements in x within the time interval T .

Proof. Let H be the set of pairs (vi , Ti ) in x, with Ti = [ai , bi ] such that b ≥ ai
and a ≤ bi . We have h = |H|. We run query2 (vi , Ti ) for each pair (vi , Ti ) ∈ H.
                                                                       Th
Let Uvi be the output set of query2 (vi , Ti ), then we report i=1 Uvi as the
solution set for query1 (x, T ). We now prove that this algorithm is correct. For
each pair (vi , Ti ) in x, trajectory x is at vertex vi during time interval Ti (i.e.
∀t ∈ Ti , x(t) = vi ). This means that all trajectories, traversing either vi or Nvi
for each time instance t ∈ Ti , are similar to x in a time interval Ti (Definition 2).
By definition of query2 , these are returned by query2 (vi , Ti ). As the set of
                                   Sh
trajectories similar to x in T = i=1 Ti is the intersection of trajectories similar
to x in each Ti with 1 ≤ i ≤ h, we return the intersection of the sets Uvi . As x is
4       Shima Moghtasedi

a time ordered sequence of pairs, we can find the set H easily in time O(log l+h).
Since, we run query2 (vi , Ti ) for each pair (vi , Ti ) ∈ H, we pay O(h · Z). The
cost of the intersection is dominated by this cost.

Hence, in the following we will concentrate on solving query2 . In our solution
preprocessing(G, T ) is the same for both the problems.

Preprocessing Phase. Recall that for each vertex u ∈ V , Su is the set of tra-
jectories from T that pass through u with the corresponding S              time intervals.
Similarly, here we define SBu for each vertex u ∈ V as SBu = w∈Bu Sw . Let
|SBu | denotes the size of the set SBu ,which is the number of movements that
trajectories have done within the vertices in Bu . Given a set SBu , we define
C(SBu ), a refinement of SBu , which basically considers time intervals spent by
a trajectory in Bu as a unique interval in O(|SBu | · |Nu |) time. More formally,
assume y is a trajectory passing through Bu , i.e. y appears in SBu . If we have
consecutive pairs h(ui , Ti ), (ui+1 , Ti+1 )i belong to y, such that ui , ui+1 ∈ Bu ,
we have h(Ti , y), (Ti+1 , y)i ∈ SBu , so we consider (Ti ∪ Ti+1 , y) ∈ C(SBu ). For
instance, assume we are given v3 ∈ V in our graph example in Figure 1, and
Bv3 = {v1 , v2 , v3 }. For the trajectory x1 = h(v1 , [1, 7]), (v2 , [8, 9])i, we consider
([1, 9], x1 ) ∈ C(SBv3 ). Our preprocessing phase builds an interval tree [1] rep-
resentation of C(SBu ) for each u ∈ V , considering all time intervals {Ti |∃x ∈
T s.t (Ti , x) ∈ C(SBu )}. Briefly, interval tree is a binary tree to store a set of
intervals based on the median of the intervals endpoints. In this structure all
the intervals intersect the median point are stored at the root of the tree. Those
intervals lying completely to the left and right of the median point, are stored at
the left subtree and right subtree, respectively. These subtrees are constructed
recursively in the same way. The interval tree associated with a given vertex
u helps to find for any given query interval T = [a, b], the set of intervals in
C(SBu ) that are contained in T = [a, b]. Moreover, intervals are endowed with
the trajectories they belong to, by definition of C(SBu ). Letting |C(SBu )| = ru ,
an interval tree on the set of intervals in C(SBu ) can be built in O(ru log ru )
time, using O(ru ) space. The interval tree has depth O(log ru ). As a result, we
obtain the following lemma.
Lemma 2. preprocessing(G, T ) takes O(Σu∈V (|SBu | · |Nu |) + ru log ru ) time
and uses O(Σu∈V ru ) space.


2.1   Solving QUERY2 (v, T )

Given a vertex v, and a time interval T = [a, b], we want to find the set of
trajectories, traversing the vertices in Bv at each time instance t ∈ T . We will
make use of the interval trees built in preprocessing(G, T ).

Query Phase. Consider the interval tree associated with C(SBv ) which maintains
the time intervals endowed with the trajectories passing Bv they belong to. We
obtain all the trajectories traversing Bv for each time t ∈ T , by finding all the
                                  Time-Based Similar Trajectories on Graphs           5

intervals which are fully contained in T . As the interval tree associated with v
has depth O(log rv ), the cost of retrieving all the intervals fully contained in T
is O(log rv + |Uv |), where |Uv | is the number of reported trajectory id’s. Indeed,
notice that we cannot have two time intervals belonging to the same trajectory
in the output set, since we are reporting the intervals fully contained in T .
Theorem 1. For a vertex v and a time interval T , we can answer query2 (v, T )
in O(log rv + |Uv |) time, where |Uv | is the number of reported trajectories.
Let K = Σvi |(vi ,Ti )∈H |Uvi |, by applying Lemma 1, we obtain the following.
Corollary 1. query1 (x, T ) can be answered in O(log l + Σvi |(vi ,Ti )∈H log rvi +
K) time.
As future work, we plan to enhance our algorithm to reduce the storage usage
and consider different similarity measures.

Acknowledgments. The author wishes to thank R. Grossi and A. Marino for
helpful discussions.

References
 1. De Berg, M., Van Kreveld, M., Overmars, M., Schwarzkopf, O.C.: Computational
    geometry. In: Computational geometry, pp. 1–17. Springer (2000)
 2. Li, Y., Han, J., Yang, J.: Clustering moving objects. In: Proceedings of the tenth
    ACM SIGKDD international conference on Knowledge discovery and data mining.
    pp. 617–622. ACM (2004)
 3. Luo, W., Tan, H., Chen, L., Ni, L.M.: Finding time period-based most frequent
    path in big trajectory data. In: Proceedings of the 2013 ACM SIGMOD interna-
    tional conference on management of data. pp. 713–724. ACM (2013)
 4. Nanni, M., Pedreschi, D.: Time-focused clustering of trajectories of moving objects.
    Journal of Intelligent Information Systems 27(3), 267–289 (2006)
 5. Shang, S., Chen, L., Wei, Z., Jensen, C.S., Zheng, K., Kalnis, P.: Trajectory sim-
    ilarity join in spatial networks. Proceedings of the VLDB Endowment 10(11),
    1178–1189 (2017)
 6. Surynek, P.: An application of pebble motion on graphs to abstract multi-robot
    path planning. In: Tools with Artificial Intelligence, 2009. ICTAI’09. 21st Interna-
    tional Conference on. pp. 151–158. IEEE (2009)
 7. Tao, Y., Papadias, D.: Efficient historical r-trees. In: Scientific and Statistical
    Database Management, 2001. SSDBM 2001. Proceedings. Thirteenth International
    Conference on. pp. 223–232. IEEE (2001)
 8. Vlachos, M., Kollios, G., Gunopulos, D.: Discovering similar multidimensional tra-
    jectories. In: Data Engineering, 2002. Proceedings. 18th International Conference
    on. pp. 673–684. IEEE (2002)
 9. Wang, L., Zheng, Y., Xie, X., Ma, W.Y.: A flexible spatio-temporal indexing
    scheme for large-scale gps track retrieval. In: Mobile Data Management, 2008.
    MDM’08. 9th International Conference on. pp. 1–8. IEEE (2008)
10. Wei, L.Y., Zheng, Y., Peng, W.C.: Constructing popular routes from uncertain
    trajectories. In: Proceedings of the 18th ACM SIGKDD international conference
    on Knowledge discovery and data mining. pp. 195–203. ACM (2012)