
Predicting Globally and Locally: A Comparison of Methods for Vehicle Trajectory Prediction

William Groves

groves@cs.umn.edu

Ernesto Nunes

enunes@cs.umn.edu

Maria Gini

gini@cs.umn.edu

Department of Computer Science and Engineering, University of Minnesota, USA


We propose eigen-based and Markov-based methods to explore the global and local structure of patterns in real-world GPS taxi trajectories. Our primary goal is to predict the subsequent path of an in-progress taxi trajectory. The exploration of global and local structure in the data differentiates this work from the state-of-the-art literature in trajectory prediction methods, which mostly focuses on local structures and feature selection. We propose four algorithms: a frequency-based algorithm (FreqCount), which we use as a benchmark; two eigen-based algorithms (EigenStrat and LapStrat); and a Markov-based algorithm (MCStrat). Pairwise performance analysis on a large real-world data set reveals that LapStrat is the best performer, followed by MCStrat.

In order to discover characteristic patterns in large spatio-temporal data sets, mining algorithms have to take into account spatial relations, such as topology and direction, as well as temporal relations. The increased use of devices that are capable of storing driving-related spatio-temporal information helps researchers and practitioners gather the necessary data to understand driving patterns in cities, and to design location-based services for drivers. To the urban planner, the work can help to aggregate driver habits and can uncover alternative routes that could help alleviate traffic. It can also help prioritize the maintenance of roads.

Our work combines data mining techniques that discover global structure in the data, and local probabilistic methods that predict short-term routes for drivers, based on past driving trajectories through the road network of a city.

The literature on prediction has offered Markov-based and other probabilistic methods that predict paths accurately. However, most methods rely on local structure of data, and use many extra features to improve prediction accuracy. In this paper we use only the basic spatio-temporal data stream. We advance the state-of-the-art by proposing the LapStrat algorithm. This algorithm reduces dimensionality and clusters data using spectral clustering to then predict a subsequent path using a Bayesian network. Our algorithm supports global analysis of the data, via clustering, as well as local inference using the Bayesian framework. In addition, since our algorithm only uses location and time data, it can be easily generalized to other domains with spatio-temporal information. Our contributions are summarized as follows:

1. We offer a systematic way of extracting common behavioral characteristics from a large set of observations using an algorithm inspired by principal component analysis (EigenStrat) and our LapStrat algorithm.

2. We compare the effectiveness of methods that explore global structure only (FreqCount and EigenStrat), local structure only (MCStrat), and mixed global and local structure (LapStrat). We show experimentally that LapStrat offers competitive prediction power compared to the more local structure-reliant MCStrat algorithm.

Related Work

Eigendecomposition has been used extensively to analyze and summarize the characteristic structure of data sets. In [Lakhina et al., 2004], the structure of network flows is analyzed: principal component analysis (PCA) is used to summarize the characteristics of the flows that pass through an internet service provider. [Zhang et al., 2009] identify two weaknesses that make PCA less effective on real-world data, namely sensitivity to outliers in the data and concerns about its interpretation, and present an alternative, Laplacian eigenanalysis. The difference between these methods is due to the set of relationships each method considers: the Laplacian matrix only considers similarity between close neighbors, while PCA considers relationships between all pairs of points. These studies focus on the clustering power of the eigen-based methods to find structures in the data. Our work goes beyond summarizing the structure of the taxi routes, and uses the eigenanalysis clusters to predict the subsequent path of an in-progress taxi trajectory.

Research in travel prediction based on driver behavior has enjoyed some recent popularity. [Krumm, 2010] predicts the next turn a driver will take by choosing with higher likelihood a turn that links more destinations or is more time efficient. [Ziebart et al., 2008] offer algorithms for turn prediction, route prediction, and destination prediction. The study uses a Markov model representation and inverse reinforcement learning coupled with maximum entropy to provide accurate predictions for each of their prediction tasks. [Veloso et al., 2011] propose a Naive Bayes model to predict that a taxi will visit an area, using time of the day, day of the week, weather, and land use as features. In [Fiosina and Fiosins, 2012], travel time prediction in a decentralized setting is investigated. The work uses kernel density estimation to predict the travel time of a vehicle based on features including length of the route, average speed in the system, congestion level, number of traffic lights, and number of left turns in the route.

All these studies use features beyond location to improve prediction accuracy, but they do not offer a comprehensive analysis of the structure of traffic data alone. Our work addresses this shortcoming by providing both an analysis of commuting patterns, using eigenanalysis, and route prediction based on partial trajectories.

Data Preparation

The GPS trajectories we use for our experiments are taken from the publicly available Beijing Taxi data set, which includes 1- to 5-minute resolution location data for over ten thousand taxis for one week in 2009 [Yuan et al., 2010]. Beijing, China is reported to have seventy thousand registered taxis, so this data set represents a large cross-section of all taxi traffic for the one-week period [Zhu et al., 2012].

Because the data set contains only location and time information of each taxi, preprocessing the data into segments based on individual taxi fares is useful. The data has sufficient detail to facilitate inference on when a taxi ride is completed: for example, a taxi waiting for a fare will be stopped at a taxi stand for many minutes [Zhu et al., 2012]. Using these inferences, the data is separated into taxi rides.

To facilitate analysis, the taxi trajectories are discretized into transitions on a region grid with square cells of size 1.5 km × 1.5 km. $V = \langle v_1, v_2, \ldots, v_w \rangle$ is a collection of trajectories. We divide it into $V_{TR}$, $V_{TE}$, and $V_{VA}$, which are the training, test, and validation sets, respectively. A trajectory $v_i$ is a sequence of $N$ time-ordered GPS coordinates: $v_i = \langle c_1^{v_i}, \ldots, c_j^{v_i}, \ldots, c_N^{v_i} \rangle$. Each coordinate contains a GPS latitude and longitude value, $c_j^{v_i} = (x_j, y_j)$. Given a complete trajectory ($v_i$), a partial trajectory (50% of a full trajectory) can be generated as $v_i^{partial} = \langle c_1^{v_i}, c_2^{v_i}, \ldots, c_{N/2}^{v_i} \rangle$. The last location of a partial trajectory, $v_i^{last} = \langle c_{N/2}^{v_i} \rangle$, is used to begin the prediction task.

The relevant portion of the city’s area containing the majority of the city’s taxi trips, called a city grid, is enclosed in a matrix of dimension 17 × 20. Each $s_i$ corresponds to the center of a grid square in the Euclidean xy-space. The city graph is encoded as a rectilinear grid with directed edges ($e_{s_i s_j}$) between adjacent grid squares. $I(c_j, s_i)$ is an indicator function that returns 1 if GPS coordinate $c_j$ is closer to grid center $s_i$ than to any other grid center and otherwise returns 0. Equation 1 shows an indicator function to determine if two GPS coordinates indicate traversal in the graph.

$$\Phi(c_j^{v_i}, c_k^{v_i}, e_{s_l s_m}) = \begin{cases} 1, & \text{if } I(c_j^{v_i}, s_l) \cdot I(c_k^{v_i}, s_m) = 1 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
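As an illustration of the indicator $I$ and of Equation 1, the following is a minimal Python sketch. The function names, the three-cell strip of grid centers, and the use of kilometre units are assumptions made for the example; ties between equally distant centers are resolved by taking the first one, which the text does not specify.

```python
import math

def nearest_center(coord, centers):
    """Return the index of the grid center closest to a coordinate (Euclidean)."""
    return min(range(len(centers)), key=lambda i: math.dist(coord, centers[i]))

def indicator(coord, center_index, centers):
    """I(c_j, s_i): 1 if s_i is the grid center nearest to c_j, else 0."""
    return 1 if nearest_center(coord, centers) == center_index else 0

def traversal(c_j, c_k, l, m, centers):
    """Phi(c_j, c_k, e_{s_l s_m}): 1 if c_j maps to s_l and c_k maps to s_m."""
    return indicator(c_j, l, centers) * indicator(c_k, m, centers)

# Hypothetical 1 x 3 strip of grid centers spaced 1.5 km apart (units: km).
centers = [(0.75, 0.75), (2.25, 0.75), (3.75, 0.75)]
```

With these centers, a point at (0.2, 0.9) followed by a point at (2.0, 0.5) counts as a traversal of the edge from cell 0 to cell 1, but not of the reverse edge.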

From trajectory $v_i$, a policy vector $\pi_i$ is created having one value for each edge in the city grid. Each $\delta_{s_l,s_m}$ is a directed edge coefficient indicating that a transition occurred between $s_l$ and $s_m$ in the trajectory. The policy vectors for this data set graph have length ($|\pi|$) of 1286, based on the number of edges in the graph. A small sample city grid is in Figure 1. A collection of policies $\Pi = \langle \pi_1, \pi_2, \ldots, \pi_w \rangle$ is computed from a collection of trajectories $V$:

$$\pi^{v_i} = \langle \delta^{v_i}_{s_1,s_2}, \ldots, \delta^{v_i}_{s_l,s_m}, \ldots \rangle \tag{2}$$

$$\delta^{v_i}_{s_l,s_m} = \begin{cases} 1, & \text{if } \sum_{j=1}^{N-1} \Phi(c_j^{v_i}, c_{j+1}^{v_i}, e_{s_l,s_m}) \ge 1 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

[Figure 1: sample city grid with nodes S4, S5, S6; vertical axis: Latitude (grid cell index).]

A graphical example showing a trajectory converted into a policy is shown in Figure 2. All visited locations for trajectory $v_i^{partial}$ are given by $\theta^{v_i^{partial}}$:

$$\theta^{v_i^{partial}} = \langle \omega_{s_1}, \omega_{s_2}, \ldots, \omega_{s_m} \rangle \tag{4}$$

with

$$\omega_{s_i} = \begin{cases} 1, & \text{if } \sum_{j=1}^{n} I(c_j^{v_i^{partial}}, s_i) \ge 1 \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

[Figure 2: panels "GPS Waypoints (time ordered)" and "Policy Grid Transitions"; scale: Waypoint Sequence No.; horizontal axis: Longitude (grid cell index).]
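The conversion from a trajectory to a policy vector and to a visited-location vector can be sketched as follows. This is a minimal sketch that assumes each GPS point has already been mapped to the index of its nearest grid cell; the function names and the toy three-cell graph are hypothetical.

```python
def policy_vector(cells, edges):
    """Binary policy vector: entry is 1 if the trajectory traverses that directed edge."""
    seen = set(zip(cells, cells[1:]))  # consecutive (s_l, s_m) pairs
    return [1 if e in seen else 0 for e in edges]

def visited_vector(cells, nodes):
    """Binary location vector: entry is 1 if the trajectory visits that grid cell."""
    seen = set(cells)
    return [1 if s in seen else 0 for s in nodes]

# Hypothetical 3-cell strip with directed edges between adjacent cells.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
cells = [0, 0, 1, 2]  # cell index of each GPS point, in time order
```

Consecutive points that fall in the same cell produce a pair such as (0, 0), which is not an edge of the graph and therefore leaves the policy vector unchanged.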

A baseline approach for prediction, FreqCount, uses observed probabilities of each outgoing transition from each node in the graph. Figure 3 shows the relative frequencies of transitions between grid squares in the training set. This city grid discretization is similar to methods used by others in this domain [Krumm and Horvitz, 2006; Veloso et al., 2011].

Methods

This work proposes four methods that explore either the local or the global structure or a mix of both to predict short-term trajectories for drivers, based on past trajectories.

[Figures: location probability plotted on the city grid; axes: Longitude (grid cell index) and Latitude (grid cell index).]

Benchmark: Frequency Count Method. A benchmark prediction measure, FreqCount, uses frequency counts for transitions in the training set to predict future actions. The relative frequency of each rectilinear transition from each location in the grid is computed and is normalized based on the number of trajectories involving the grid cell. The resulting policy matrix is a Markov chain that determines the next predicted action based on the current location of the vehicle.

The FreqCount method computes a policy vector based on all trajectories in the training set $V_{TR}$. $\pi_{FreqCount}$ contains first-order Markov transition probabilities computed from all trajectories as in Equation 6.

$$\delta^{\pi_{FreqCount}}_{s_i,s_j} = \frac{\sum_{v \in V_{TR}} \delta^{v}_{s_i,s_j}}{\sum_{v \in V_{TR}} \sum_{k=1}^{M} \delta^{v}_{s_i,s_k}} \tag{6}$$

The probability of a transition ($s_i \to s_j$) is computed as the count of the transition $s_i \to s_j$ in $V_{TR}$ divided by the count of all transitions exiting $s_i$ in $V_{TR}$.
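Equation 6 can be illustrated with a short Python sketch. It assumes the training policies are binary edge vectors aligned with a shared edge list; the function name and the toy data are hypothetical, and cells with no outgoing transitions are assigned probability 0 as a convention of this sketch.

```python
def freqcount_policy(policies, edges):
    """Eq. 6: count of each transition divided by the count of all transitions leaving its source cell."""
    edge_counts = [sum(p[i] for p in policies) for i in range(len(edges))]
    out_counts = {}
    for (src, _), c in zip(edges, edge_counts):
        out_counts[src] = out_counts.get(src, 0) + c
    return [c / out_counts[src] if out_counts[src] else 0.0
            for (src, _), c in zip(edges, edge_counts)]

edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
training = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 0]]
pi_freq = freqcount_policy(training, edges)
```

In this toy data, cell 1 is left three times (once toward cell 0 and twice toward cell 2), so the two outgoing coefficients of cell 1 are 1/3 and 2/3 and sum to one.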

Policy iteration (Algorithm 1) is applied to the last location of a partial trajectory using the frequency count policy set $\Pi_{FreqCount} = \langle \pi_{FreqCount} \rangle$ to determine a basic prediction of future actions. This method only considers frequency of occurrence for each transition in the training set, so it is expected to perform poorly in areas where trajectories intersect.
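The policy iteration step of Algorithm 1 can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' implementation: grid cells are indexed from 0, each policy is a list of edge coefficients aligned with a shared edge list, and the accumulated vector keeps the elementwise maximum over all iterations and policies. The names and the toy graph are hypothetical.

```python
def policy_iteration(theta_last, policies, edges, n_iter):
    """Propagate visit weights along each policy and keep the elementwise maximum."""
    theta_accum = list(theta_last)
    for pi in policies:
        theta = list(theta_last)
        for _ in range(n_iter):
            nxt = [0.0] * len(theta)
            for (src, dst), delta in zip(edges, pi):
                # omega^t_{s_i} = max over s_j of omega^{t-1}_{s_j} * delta_{s_j, s_i}
                nxt[dst] = max(nxt[dst], theta[src] * delta)
            theta = nxt
            theta_accum = [max(a, b) for a, b in zip(theta_accum, theta)]
    return theta_accum

edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
pi = [1.0, 0.25, 0.75, 1.0]
theta_hat = policy_iteration([1.0, 0.0, 0.0], [pi], edges, n_iter=2)
```

Starting from cell 0 with a horizon of two steps, the sketch assigns weight 1.0 to cells 0 and 1 and weight 0.75 to cell 2, since the only way to reach cell 2 is through the 0.75 transition out of cell 1.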

Algorithm 1: Policy Iteration

Input: Location vector with last location of taxi $\theta_{last}$, a policy list $\Pi$, prediction horizon $n_{iter}$

Output: A location vector containing visit probabilities for future locations $\hat{\theta}$

1 $\theta_{accum} \leftarrow \theta_{last}$
2 for $\pi \in \Pi$ do
3     $t \leftarrow 1$
4     $\theta^0 \leftarrow \theta_{last}$
5     while $t \le n_{iter}$ do
6         $\theta^t = \langle \omega^t_{s_1}, \omega^t_{s_2}, \ldots, \omega^t_{s_i}, \ldots, \omega^t_{s_M} \rangle$, where $\omega^t_{s_i} = \max_{s_j \in S} (\omega^{t-1}_{s_j} \cdot \delta^{\pi}_{s_j,s_i})$
7         $\omega^{\theta_{accum}}_{s_i} = \max(\omega^{\theta_{accum}}_{s_i}, \omega^{\theta^t}_{s_i})$ for each $s_i \in S$
8         $t \leftarrow t + 1$
9 $\hat{\theta} \leftarrow \theta_{accum}$

EigenStrat: Eigen Analysis of Covariance. This method exploits linear relationships between transitions in the grid which 1) can be matched to partial trajectories for purposes of prediction and 2) can be used to study behaviors in the system. The first part of the algorithm focuses on model generation. For each pair of edges, the covariance is computed using the training set observations. The dims eigenvectors with the largest eigenvalues are computed from the covariance matrix. These form a collection of characteristic eigen-strategies from training data.

When predicting for an in-progress trajectory, the algorithm takes the policy generated from a partial taxi trajectory $\pi^{v_{predict}}$, a maximum angle to use as the relevancy threshold $\alpha$, and the eigen-strategies as $\Pi$. Eigen-strategies having an angular distance less than $\alpha$ to $\pi^{v_{predict}}$ are added to $\Pi_{rel}$. This collection is then used for policy iteration. Optimal values for $\alpha$ and dims are learned experimentally.
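The relevance filtering used at prediction time can be sketched as follows. The sketch assumes the eigen-strategies have already been computed (for example with a numerical eigendecomposition routine) and follows the cosine-threshold form given in Algorithm 2, where $\alpha$ is compared against the absolute cosine similarity; the function names and the toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def relevant_strategies(eigen_strategies, partial_policy, alpha):
    """Keep eigen-strategies with |cos| > alpha, flipping those that point the opposite way."""
    relevant = []
    for e in eigen_strategies:
        c = cosine(e, partial_policy)
        if abs(c) > alpha:
            relevant.append([-x for x in e] if c < 0 else list(e))
    return relevant

eigen_strategies = [[0.6, 0.0, 0.8, 0.0], [0.0, -1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
partial = [1, 1, 0, 0]
rel = relevant_strategies(eigen_strategies, partial, alpha=0.3)
```

Here the first eigen-strategy is kept as is, the second is kept with its sign flipped because its cosine similarity to the partial policy is negative, and the third is discarded because it is orthogonal to the partial policy.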

Eigenpolicies also facilitate exploration of strategic decisions. Figure 7 shows an eigenpolicy plot with a distinct pattern in the training data. Taxis were strongly confined to trajectories on either the inside of the circle or the perimeter of the circle, but rarely between these regions. The two series (positive and negative) indicate the sign and magnitude of the grid coefficients for this eigenvector. We believe analysis of this type has great promise for large spatio-temporal data sets.

[Figure 7: eigenpolicy plot; series: positive directions, negative directions; axes: Longitude (grid cell index) and Latitude (grid cell index).]

Algorithm 2: EigenStrat

Input: $\Pi_{TR}$, number of principal components (dims), minimum angle between policies ($\alpha$), prediction horizon (horizon), partial policy ($\pi^{v_i^{partial}}$)

Output: Inferred location vector $\hat{\theta}$

1 Generate covariance matrix $C_{|\pi_i| \times |\pi_i|}$ (where $\pi_i \in \Pi_{TR}$) between transitions on the grid
2 Get the dims eigenvectors of $C$ with largest eigenvalues
3 Compute cosine similarity between $\pi^{v_i^{partial}}$ and the principal components ($\pi_j$, $j = 1 \ldots dims$): $\Pi_{rel} = \{\pi_j \mid |\cos(\pi_j, \pi^{v_i^{partial}})| > \alpha\}$
4 If $\cos(\pi_j, \pi^{v_i^{partial}}) < 0$, then flip the sign of the coefficients for this eigenpolicy
5 Use Algorithm 1 with $\Pi_{rel}$ on $v_i^{partial}$ for horizon iterations to compute $\hat{\theta}$

LapStrat: Spectral Clustering-Inspired Algorithm. LapStrat (Algorithm 3) combines spectral clustering and Bayesian-based policy iteration to cluster policies and infer a driver's next turns. Spectral clustering operates upon a similarity graph and its respective Laplacian operator. This work follows the approach of [Shi and Malik, 2000] using an unnormalized graph Laplacian. We use the Jaccard index to compute the similarity graph between policies. We chose the Jaccard index because it finds similarities between policies that are almost parallel. This is important in cases such as two highways that only have one meeting point; in this case, if the highways are alternative routes to the same intersection, they should be similar with respect to the intersection point.

The inputs to the Jaccard index are two vectors representing policies generated in Section 3 (Data Preparation). $J(\pi_i, \pi_j)$ is the Jaccard similarity for the pair $\pi_i$ and $\pi_j$. The unnormalized Laplacian is computed by subtracting the similarity matrix from the degree matrix ($L = D - W$) in the same fashion as [Shi and Malik, 2000]. We choose the dims eigenvectors with smallest eigenvalues, and perform k-means to find clusters in the reduced dimension. The optimal value for dims is learned experimentally.

Algorithm 3: LapStrat

Input: $\Pi_{TR}$, dimension (dims), number of clusters ($k$), similarity threshold ($\epsilon$), prediction horizon (horizon), partial policy ($\pi^{v_i^{partial}}$)

Output: Inferred location vector $\hat{\theta}$

1 Generate similarity matrix $W_{|\Pi_{TR}| \times |\Pi_{TR}|}$ where $w_{ij} = J(\pi_i, \pi_j)$ if $J(\pi_i, \pi_j) \ge \epsilon$, and $w_{ij} = 0$ otherwise
2 Generate Laplacian ($L$): $L = D - W$ and $\forall d_{ij} \in D$, $d_{ij} = \sum_{z=1}^{|\Pi_{TR}|} w_{iz}$ if $i = j$, and $d_{ij} = 0$ otherwise
3 Get the dims eigenvectors with smallest eigenvalues
4 Use k-means to find the mean centroids ($\pi_j$, $j = 1 \ldots k$) of $k$ policy clusters
5 Find all centroids similar to $\pi^{v_i^{partial}}$: $\Pi_{rel} = \{\pi_j \mid J(\pi_j, \pi^{v_i^{partial}}) > \epsilon\}$
6 Use Algorithm 1 with $\Pi_{rel}$ on $v_i^{partial}$ for horizon iterations to compute $\hat{\theta}$
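The first two steps of Algorithm 3 (thresholded Jaccard similarity and the unnormalized Laplacian $L = D - W$) can be sketched in plain Python as below. The sketch treats policies as binary vectors and stops before the eigendecomposition and k-means steps, which would typically be delegated to a numerical library; the function names and the toy policies are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two binary policy vectors: |intersection| / |union|."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

def laplacian(policies, eps):
    """Unnormalized Laplacian L = D - W from a thresholded Jaccard similarity matrix."""
    n = len(policies)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = jaccard(policies[i], policies[j])
            w[i][j] = s if s >= eps else 0.0
    degree = [sum(row) for row in w]
    return [[(degree[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]

policies = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
L = laplacian(policies, eps=0.5)
```

Because each diagonal entry of $D$ is the corresponding row sum of $W$, every row of the resulting Laplacian sums to zero, which is a quick sanity check on the construction.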

MCStrat: Markov Chain-Based Algorithm. The Markov chain approach uses local, recent information from $v_{predict}^{partial}$, the partial trajectory to predict from. Given the last $k$ edges traversed by the vehicle, the algorithm finds all complete trajectories in the training set containing the same $k$ edges to build a set of relevant policies $V_{rel}$ using the match function. $match(k, a, b)$ returns 1 only if at least the last $k$ transitions in the policy generated by trajectory $a$ are also found in $b$.

Using Equation 9, $V_{rel}$ is used to build a single composite relevant policy $\pi_{rel}$ that obeys the Markov assumption, so the resulting policy preserves the probability mass.

$$V_{rel} = \{ v_i \mid match(k, \pi^{v_{predict}^{partial}}, \pi^{v_i}) = 1, \; v_i \in V_{TR} \} \tag{7}$$

$$\pi_{rel} = \langle \delta^{\pi_{rel}}_{s_1,s_2}, \ldots, \delta^{\pi_{rel}}_{s_i,s_j}, \ldots \rangle \tag{8}$$

$$\delta^{\pi_{rel}}_{s_i,s_j} = \frac{\sum_{v \in V_{rel}} \delta^{v}_{s_i,s_j}}{\sum_{v \in V_{rel}} \sum_{k=1}^{M} \delta^{v}_{s_i,s_k}} \tag{9}$$

Using the composite $\pi_{rel}$, policy iteration is then performed on the last location vector computed from $v_{predict}$.

Method Complexity Comparison. A comparison of the construction and storage complexity of the methods appears in Table 1.

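| Model | FreqCount | EigenStrat | LapStrat | MCStrat |
|---|---|---|---|---|
| Model Construction | $O(\lvert\pi\rvert)$ | $O(\lvert\pi\rvert^2)$ | $O(\lvert\Pi_{TR}\rvert^2)$ | $O(1)$ |
| Model Storage | $O(\lvert\pi\rvert)$ | $O(dims \times \lvert\pi\rvert)$ | $O(k \times \lvert\pi\rvert)$ | $O(\lvert\Pi_{TR}\rvert \times \lvert\pi\rvert)$ |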
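Returning to MCStrat, the matching step and the composite policy of Equations 7 to 9 can be sketched as follows. This is a sketch under assumptions: trajectories are represented as time-ordered lists of directed edges, the match is order-insensitive (each of the last $k$ edges must simply occur in the candidate trajectory), and $k \ge 1$. The function names and the toy data are hypothetical.

```python
def match(last_edges, trajectory_edges):
    """Return 1 if every one of the last k edges of the partial trajectory occurs in the training trajectory."""
    return 1 if set(last_edges).issubset(trajectory_edges) else 0

def mcstrat_policy(partial_edges, training, edges, k):
    """Eq. 7-9: build V_rel by matching, then normalize transition counts per source cell."""
    last_edges = partial_edges[-k:]
    v_rel = [t for t in training if match(last_edges, t)]
    counts = [sum(1 for t in v_rel if e in t) for e in edges]
    out = {}
    for (src, _), c in zip(edges, counts):
        out[src] = out.get(src, 0) + c
    return [c / out[src] if out[src] else 0.0 for (src, _), c in zip(edges, counts)]

edges = [(0, 1), (1, 2), (1, 3), (2, 3)]
training = [[(0, 1), (1, 2)], [(0, 1), (1, 3)], [(0, 1), (1, 2), (2, 3)], [(1, 3)]]
pi_rel = mcstrat_policy([(0, 1)], training, edges, k=1)
```

In this toy example the fourth training trajectory never traverses the edge from cell 0 to cell 1, so it is excluded from $V_{rel}$ and the transition from cell 1 to cell 3 receives probability 1/3 rather than 1/2.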

Results

Given an in-progress taxi trajectory, the methods presented facilitate predictions about the future movement of the vehicle. To simulate this task, a collection of partial trajectories (e.g. Figure 4) is generated from complete trajectories in the test set. A set of relevant policy vectors is generated using one of the four methods described, and policy iteration is performed to generate the future location predictions. The inferred future location matrix (e.g. Figure 5) is compared against the actual complete taxi trajectory (e.g. Figure 6). Prediction results are scored by comparing the inferred visited location vector θˆ against the full location vector θvi . The scores are computed using Pearson’s correlation: score = Cor(θˆ, θvi ). The scores reported are the aggregate mean of scores from examples in the validation set.
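The scoring step can be illustrated with a small Python sketch of Pearson's correlation between an inferred location vector and the actual one. Returning 0.0 when either vector is constant is a convention of this sketch, not something stated in the text, and the example vectors are hypothetical.

```python
import math

def pearson(pred, actual):
    """Pearson correlation between two equal-length vectors; 0.0 if either is constant."""
    n = len(pred)
    mp, ma = sum(pred) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    vp = sum((p - mp) ** 2 for p in pred)
    va = sum((a - ma) ** 2 for a in actual)
    return cov / math.sqrt(vp * va) if vp and va else 0.0

theta_hat = [1.0, 1.0, 0.75, 0.0]   # inferred visit probabilities
theta_true = [1, 1, 1, 0]           # cells actually visited
score = pearson(theta_hat, theta_true)
```

A prediction that ranks the visited cells above the unvisited ones, as in this example, yields a score close to 1.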

The data set contains 100,000 subtrajectories (of approximately 1 hour in length) from 10,000 taxis. The data set is split randomly into 3 disjoint collections to facilitate experimentation: 90% in the training set, and 5% in both the test and validation sets. For each model type, the training set is used to generate the model. Model parameters are optimized using the test set. Scores are computed using predictions made on partial trajectories from the validation set.
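The random split described above can be sketched as follows. The fixed seed and the function name are assumptions added for reproducibility of the example; the text only specifies the 90% / 5% / 5% proportions and that the three collections are disjoint.

```python
import random

def split_dataset(items, seed=0):
    """Shuffle and split into 90% training, 5% test, 5% validation (disjoint)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 9 // 10
    n_test = (len(shuffled) - n_train) // 2
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])

# Integer ids stand in for the 100,000 subtrajectories.
train, test, validation = split_dataset(range(100000))
```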

Results of each method for 4 prediction horizons are shown in Table 2. The methods leveraging more local information near the last location of the vehicle (LapStrat, MCStrat) perform better than the methods relying only on global patterns (FreqCount, EigenStrat). This is true for all prediction horizons, but the more local methods have an even greater performance advantage for larger prediction horizons.

Statistical significance testing was performed on the validation set results, as shown in Table 3. The best performing methods (LapStrat and MCStrat) achieve a statistically significant performance improvement over the other methods. However, the performance difference between LapStrat and MCStrat is not statistically significant.

Conclusions

The methods presented can be applied to many other spatio-temporal domains where only basic location and time information is collected from portable devices, such as sensor networks as well as mobile phone networks. These predictions assume the action space is large but fixed and observations implicitly are clustered into distinct but repeated goals. In this domain, each observation is a set of actions a driver takes in fulfillment of a specific goal: for example, to take a passenger from the airport to his/her home. In future work, we propose to extend this work using a hierarchical approach which simultaneously incorporates global and local predictions to provide more robust results.

References

[Fiosina and Fiosins, 2012] Fiosina and Fiosins. Cooperative kernel-based forecasting in decentralized multiagent systems for urban traffic networks. In Ubiquitous Data Mining, pages 3-7. ECAI, 2012.

[Krumm and Horvitz, 2006] Krumm and Horvitz. Predestination: Inferring destinations from partial trajectories. UbiComp 2006, pages 243-260, 2006.

[Krumm, 2010] Krumm. Where will they turn: predicting turn proportions at intersections. Personal and Ubiquitous Computing, 14(7):591-599, 2010.

[Lakhina et al., 2004] Lakhina, Papagiannaki, Crovella, Diot, E. D. Kolaczyk, and Taft. Structural analysis of network traffic flows. Perform. Eval. Rev., 32(1):61-72, 2004.

[Shi and Malik, 2000] Shi and Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000.

[Veloso et al., 2011] Veloso, Phithakkitnukoon, and Bento. Urban mobility study using taxi traces. In Proc. of the 2011 Int'l Workshop on Trajectory Data Mining and Analysis, pages 23-30, 2011.

[Yuan et al., 2010] Yuan, Zheng, Zhang, Xie, Xie, G. Sun, and Huang. T-drive: driving directions based on taxi trajectories. In Proc. of the 18th SIGSPATIAL Int'l Conf. on Advances in GIS, pages 99-108, 2010.

[Zhang et al., 2009] Zhang, P. Niyogi, and Mary McPeek. Laplacian eigenfunctions learn population structure. PLoS ONE, 4(12):e7928, 12 2009.

[Zhu et al., 2012] Zhu, Zheng, Zhang, Santani, Xie, and Yang. Inferring Taxi Status Using GPS Trajectories. ArXiv e-prints, May 2012.

[Ziebart et al., 2008] Ziebart, Maas, Dey, and Bagnell. Navigate like a cabbie: probabilistic reasoning from observed context-aware behavior. In Proc. of the 10th Int'l Conf. on Ubiquitous computing, UbiComp '08, pages 322-331, 2008.