=Paper=
{{Paper
|id=Vol-1088/paper1
|storemode=property
|title=Predicting Globally and Locally: A Comparison of Methods for Vehicle Trajectory Prediction
|pdfUrl=https://ceur-ws.org/Vol-1088/paper1.pdf
|volume=Vol-1088
|dblpUrl=https://dblp.org/rec/conf/ijcai/GrovesNG13
}}
==Predicting Globally and Locally: A Comparison of Methods for Vehicle Trajectory Prediction==
Predicting Globally and Locally: A Comparison of Methods for Vehicle Trajectory Prediction William Groves, Ernesto Nunes, and Maria Gini Department of Computer Science and Engineering, University of Minnesota {groves, enunes, gini}@cs.umn.edu Abstract global analysis of the data, via clustering, as well as local in- ference using the Bayesian framework. In addition, since our We propose eigen-based and Markov-based meth- algorithm only uses location and time data, it can be easily ods to explore the global and local structure of generalized to other domains with spatio-temporal informa- patterns in real-world GPS taxi trajectories. Our tion. Our contributions are summarized as follows: primary goal is to predict the subsequent path of an in-progress taxi trajectory. The exploration of 1. We offer a systematic way of extracting common behav- global and local structure in the data differenti- ioral characteristics from a large set of observations us- ates this work from the state-of-the-art literature ing an algorithm inspired by principal component anal- in trajectory prediction methods, which mostly fo- ysis (EigenStrat) and our LapStrat algorithm. cuses on local structures and feature selection. We 2. We compare the effectiveness of methods that explore propose four algorithms: a frequency based algo- global structure only (FreqCount and EigenStrat), lo- rithm FreqCount, which we use as a benchmark, cal structure only (MCStrat), and mixed global and lo- two eigen-based (EigenStrat, LapStrat), and a cal structure (LapStrat). We show experimentally that Markov-based algorithm (MCStrat). Pairwise per- LapStrat offers competitive prediction power compared formance analysis on a large real-world data set re- to the more local structure-reliant MCStrat algorithm. veals that LapStrat is the best performer, followed by MCStrat. 2 Related Work Eigendecomposition has been used extensively to analyze and 1 Introduction summarize the characteristic structure of data sets. The struc- In order to discover characteristic patterns in large spatio- ture of network flows is analyzed in [Lakhina et al., 2004], temporal data sets, mining algorithms have to take into ac- principal component analysis (PCA) is used to summarize the count spatial relations, such as topology and direction, as well characteristics of the flows that pass through an internet ser- as temporal relations. The increased use of devices that are vice provider. [Zhang et al., 2009] identify two weaknesses capable of storing driving-related spatio-temporal informa- that make PCA less effective on real-world data. i.e. sensi- tion helps researchers and practitioners gather the necessary tivity to outliers in the data, and concerns about its interpreta- data to understand driving patterns in cities, and to design tion, and present an alternative, Laplacian eigenanalysis. The location-based services for drivers. To the urban planner, the difference between these methods is due to the set of relation- work can help to aggregate driver habits and can uncover al- ships each method considers: the Laplacian matrix only con- ternative routes that could help alleviate traffic. Additionally, siders similarity between close neighbors, while PCA consid- it also helps prioritize the maintenance of roads. ers relationships between all pairs of points. These studies Our work combines data mining techniques that discover focus on the clustering power of the eigen-based methods to global structure in the data, and local probabilistic methods find structures in the data. Our work goes beyond summariz- that predict short-term routes for drivers, based on past driv- ing the structure of the taxi routes, and uses the eigenanalysis ing trajectories through the road network of a city. clusters to predict the subsequent path of an in-progress taxi The literature on prediction has offered Markov-based trajectory. and other probabilistic methods that predict paths accurately. Research in travel prediction based on driver behavior has However, most methods rely on local structure of data, and enjoyed some recent popularity. [Krumm, 2010] predicts the use many extra features to improve prediction accuracy. In next turn a driver will take by choosing with higher likeli- this paper we use only the basic spatio-temporal data stream. hood a turn that links more destinations or is more time effi- We advance the state-of-the-art by proposing the LapStrat cient. [Ziebart et al., 2008] offer algorithms for turn predic- algorithm. This algorithm reduces dimensionality and clus- tion, route prediction, and destination prediction. The study ters data using spectral clustering to then predict a subse- uses a Markov model representation and inverse reinforce- quent path using a Bayesian network. Our algorithm supports ment learning coupled with maximum entropy to provide ac- curate predictions for each of their prediction tasks. [Veloso S1 e S1 S2 S2 e S2 S3 S3 et al., 2011] proposes a Naive Bayes model to predict that a e e taxi will visit an area, using time of the day, day of the week, S2 S1 S3 S2 weather, and land use as features. In [Fiosina and Fiosins, e e e e e e S4 S1 S1 S4 S5 S2 S2 S5 S6 S3 S3 S6 2012], travel time prediction in a decentralized setting is in- vestigated. The work uses kernel density estimation to predict S4 e S4 S5 S5 e S5 S6 S6 the travel time of a vehicle based on features including length e e of the route, average speed in the system, congestion level, S5 S4 S6 S5 number of traffic lights, and number of left turns in the route. All these studies use features beyond location to improve Figure 1: City grid transitions are all rectilinear. prediction accuracy, but they do not offer a comprehensive analysis of the structure of traffic data alone. Our work ad- value for each edge in the city grid. Each δsl ,sm is a directed dresses this shortcoming by providing both an analysis of edge coefficient indicating that a transition occurred between commuting patterns, using eigenanalysis, and route predic- sl and sm in the trajectory. The policy vectors for this data tion based on partial trajectories. set graph have length (|π|) of 1286, based on the number of edges in the graph. A small sample city grid is in Figure 1. A 3 Data Preparation collection of policies Π =< π1 , π2 , . . . , πw > is computed from a collection of trajectories V : The GPS trajectories we use for our experiments are taken from the publicly available Beijing Taxi data set which in- π vi =< δsv1i,s2 , . . . , δsvli,sm , . . . > (2) cludes 1 to 5-minute resolution location data for over ten- ( PN −1 vi vi thousand taxis for one week in 2009 [Yuan et al., 2010]. Bei- 1, if j=1 Φ(cj , cj+1 , esl ,sm ) ≥ 1 δsvli,sm = (3) jing, China is reported to have seventy-thousand registered 0 Otherwise taxis, so this data set represents a large cross-section of all A graphical example showing a trajectory converted into a taxi traffic for the one-week period [Zhu et al., 2012]. policy is shown in Figure 2. All visited locations for trajec- Because the data set contains only location and time in- formation of each taxi, preprocessing the data into segments 12 based on individual taxi fares is useful. The data has sufficient GPS Waypoints (time ordered) Policy Grid Transitions detail to facilitate inference on when a taxi ride is completed: 10 Waypoint Sequence No. Latitude (grid cell index) for example, a taxi waiting for a fare will be stopped at a taxi stand for many minutes [Zhu et al., 2012]. Using these infer- 8 8 ences, the data is separated into taxi rides. 6 To facilitate analysis, the taxi trajectories are discretized into transitions on a region grid with cells of size 1.5 km × 4 1.5 km square. V =< v1 , v2 , . . . , vw > is a collection of trajectories. We divide it into VTR , VTE , VVA which are the 2 training, test, and validation sets, respectively. A trajectory vi is a sequence of N time-ordered GPS coordinates: vi =< 4 8 12 0 cv1 i , . . . cvj i , . . . , cvNi >. Each coordinate contains a GPS lat- Longitude (grid cell index) itude and longitude value, cvj i = (xj , yj ). Given a complete trajectory (vi ), a partial trajectory (50% of a full trajectory) Figure 2: A trajectory converted to a policy in the city grid. can be generated as vipartial =< cv1 i , cv2 i , . . . , cvN/2 i >. The partial last location of a partial trajectory vi last vi =< cN/2 > is used tory vipartial are given by θ vi : to begin the prediction task. vipartial θ =< ωs1 , ωs2 , . . . , ωsm >, with (4) The relevant portion of the city’s area containing the ma- ( Pn vipartial jority of the city’s taxi trips, called a city grid, is enclosed ωsi = 1, if j=1 I(c j , si ) ≥ 1 (5) in a matrix of dimension 17 × 20. Each si corresponds to 0 Otherwise the center of a grid square in the euclidean xy-space. The city graph is encoded as a rectilinear grid with directed edges A baseline approach for prediction, FreqCount, uses ob- (esi sj ) between adjacent grid squares. I(cj , si ) is an indicator served probabilities of each outgoing transition from each function that returns 1 if GPS coordinate cj is closer to grid node in the graph. Figure 3 shows the relative frequencies center si than to any other grid center and otherwise returns of transitions between grid squares in the training set. This 0. Equation 1 shows an indicator function to determine if two city grid discretization is similar to methods used by others in GPS coordinates indicate traversal in the graph. this domain [Krumm and Horvitz, 2006; Veloso et al., 2011]. 1, if I(cvj i , sl ) ∗ I(ckvi , sm ) = 1 4 Methods vi vi Φ(cj , ck , esl sm ) = 0 Otherwise This work proposes four methods that explore either the local (1) or the global structure or a mix of both to predict short-term From trajectory vi , a policy vector πi is created having one trajectories for drivers, based on past trajectories. location probability Latitude (grid cell index) Latitude (grid cell index) Latitude (grid cell index) 8 8 8 4 4 4 8 12 8 12 8 12 Longitude (grid cell index) Longitude (grid cell index) Longitude (grid cell index) partial Figure 4: A sample partial policy π vi Figure 5: θ̂, location probabilities from Fre- Figure 6: Actual complete trajectory corre- qCount method with horizon of 3 sponding to Fig. 4, trajectory π vi 15 Algorithm 1: Policy Iteration Input: Location vector with last location of taxi θ last , a Latitude (grid cell index) 12 policy list Π, prediction horizon niter Output: A location vector containing visit probabilities 9 for future locations θ̂ accum 1 θ ← θ last 6 2 for π ∈ Π do 3 t←1 3 4 θ 0 ← θ last 5 while t ≤ niter do 0 0 3 6 9 12 15 6 θ t =< ωst 1 , ωst 2 , . . . , ωst i , . . . , ωst M > Longitude (grid cell index) 7 , where ωst i = maxsj ∈S (ωst−1 j ∗ δsπj ,si ) 8 t←t+1 Figure 3: Visualization of frequency counts for edge transitions in the training set. Longer arrows indicate more occurrences. 9 for Si ∈ S do accum accum t 10 ωsθi = max(ωsθi , ωsθi ) Benchmark: Frequency Count Method. A benchmark pre- diction measure, FreqCount, uses frequency counts for tran- 11 θ̂ = θ accum sitions in the training set to predict future actions. The relative frequency of each rectilinear transition from each location in the grid is computed and is normalized based on the number EigenStrat: Eigen Analysis of Covariance. This method of trajectories involving the grid cell. The resulting policy exploits linear relationships between transitions in the grid matrix is a Markov chain that determines the next predicted which 1) can be matched to partial trajectories for purposes action based on the current location of the vehicle. of prediction and 2) can be used to study behaviors in the The FreqCount method computes a policy vector based system. The first part of the algorithm focuses on model gen- on all trajectories in the training set VT R . π FreqCount contains eration. For each pair of edges, the covariance is computed first order Markov transition probabilities computed from all using the training set observations. The n largest eigenvectors trajectories as in Equation 6. are computed from the covariance matrix. These form a col- P v lection of characteristic eigen-strategies from training data. πFreqCount v∈VT R δsi ,sj δsi ,sj = P PM v (6) When predicting for an in-progress trajectory, the algo- v∈VT R k=1 δsi ,sk rithm takes the policy generated from a partial taxi trajectory The probability of a transition (si → sj ) is computed as π vpredict , a maximum angle to use as the relevancy threshold the count of the transition si → sj in VT R divided by the α, and the eigen-strategies as Π. Eigen-strategies having an count of all transitions exiting si in VT R . angular distance less than α to π vpredict are added to Πrel . Policy iteration (Algorithm 1) is applied to the last loca- This collection is then used for policy iteration. Optimal val- tion of a partial trajectory using the frequency count policy ues for α and dims are learned experimentally. set ΠFreqCount =< π FreqCount > to determine a basic predic- Eigenpolicies also facilitate exploration of strategic deci- tion of future actions. This method only considers frequency sions. Figure 7 shows an eigenpolicy plot with a distinct pat- of occurrence for each transition in the training set, so it is ex- tern in the training data. Taxis were strongly confined to tra- pected to perform poorly in areas where trajectories intersect. jectories either the inside circle or the perimeter of the circle, Algorithm 2: EigenStrat Algorithm 3: LapStrat Input: ΠTR , number of principal components (dims), Input: ΠTR , dimension (dims), number of clusters (k), minimum angle between policies (α), prediction similarity threshold (ǫ), prediction (horizon), partial partial horizon (horizon), partial policy (π vi ) partial policy (π vi ) Output: Inferred location vector θ̂ Output: Inferred location vector θ̂ 1 Generate covariance matrix C|πi |×|πi | (where πi ∈ ΠTR ) 1 Generate similarity matrix W|ΠTR |×|ΠTR | where between transitions on the grid J(πi , πj ), if J(πi , πj ) ≥ ǫ 2 Get the dims eigenvectors of C with largest eigenvalues wij = 0 Otherwise v partial 3 Compute cosine similarity between π i and the 2 Generate Laplacian (L): L = D − W and ∀dij ∈ D principal components (πj , j = 1 . . . dims): (P |ΠT R | partial dij = z=1 wiz , if i = z Πrel = {πj ||cos(πj , π vi )| > α} vi partial 0 Otherwise 4 If the cos(πj , π ) < 0, then flip the sign of the 3 Get the dims eigenvectors with smallest eigenvalues coefficients for this eigenpolicy. Use Algorithm 1 with 4 Use k-means to find the mean centroids (πj , j = 1 . . . k) Πrel on vipartial for horizon iterations to compute θ̂ of k policy clusters v partial 5 Find all centroids similar to π i : partial vi but rarely between these regions. The two series (positive and Πrel = {πj |J(πj , π ) > ǫ} partial negative) indicate the sign and magnitude of the grid coeffi- 6 Use Algorithm 1 with Πrel on vi for horizon cients for this eigenvector. We believe analysis of this type iterations to compute θ̂ has great promise for large spatio-temporal data sets. perform k-means to find clusters in the reduced dimension. positive directions The optimal value for dims is learned experimentally. negative directions Latitude (grid cell index) 8 MCStrat: Markov Chain-Based Algorithm. The Markov partial chain approach uses local, recent information from vpredict , the partial trajectory to predict from. Given the last k edges traversed by the vehicle, the algorithm finds all complete tra- jectories in the training set containing the same k edges to 4 build a set of relevant policies Vrel using the match function. match(k, a, b) returns 1 only if at least the last k transitions in the policy generated by trajectory a are also found in b. Using Equation 9, Vrel is used to build a composite single 4 8 relevant policy πrel , that obeys the Markov assumption, so Longitude (grid cell index) the resulting policy preserves the probability mass. Figure 7: An eigenpolicy showing a strategic pattern. partial Vrel = {vi match(k, π vpredict , π vi ) = 1, vi ∈ VTR } (7) rel rel LapStrat: Spectral Clustering-Inspired Algorithm. Lap- πrel =< δsπ1 ,s2 , . . . , δsπi ,sj , . . . > (8) Strat (Algorithm 3) combines spectral clustering and P v πrel v∈Vrel δsi ,sj Bayesian-based policy iteration to cluster policies and infer δsi ,sj = P PM v (9) driver next turns. Spectral clustering operates upon a simi- v∈Vrel k=1 δsi ,sk larity graph and its respective Laplacian operator. This work Using the composite πrel , policy iteration is then performed follows the approach of [Shi and Malik, 2000] using an un- on the last location vector computed from vpredict . normalized graph Laplacian. We use Jaccard index to com- pute the similarity graph between policies. We chose the Jac- Method Complexity Comparison. A comparison of the card index, because it finds similarities between policies that storage complexity of the methods appears in Table 1. are almost parallel. This is important in cases such as two highways that only have one meeting point; in this case, if Model Model Construction Model Storage the highways are alternative routes to the same intersection, FreqCount O(|π|) O(|π|) they should be similar with respect to the intersection point. EigenStrat O((|π|)2 ) O(dims × |π|) The input to the Jaccard index are two vectors representing policies generated in Section 3. J(πi , πj ) is the Jaccard sim- LapStrat O((|ΠT R |)2 ) O(k × |π|) ilarity for pair πi and πj . The unnormalized Laplacian is MCStrat O(1) O(|ΠT R | × |π|) computed by subtracting the degree matrix from the similar- ity matrix in the same fashion as [Shi and Malik, 2000]. We Table 1: Space complexity of methods. choose the dims eigenvectors with smallest eigenvalues, and 5 Results Method FreqCount EigenStrat LapStrat MCStrat Given an in-progress taxi trajectory, the methods presented FreqCount n/a n/a n/a facilitate predictions about the future movement of the ve- EigenStrat 0.431 n/a n/a hicle. To simulate this task, a collection of partial trajecto- ries (e.g. Figure 4) is generated from complete trajectories LapStrat *0.000211 *0.000218 0.462 in the test set. A set of relevant policy vectors is generated MCStrat *0.00149 *0.000243 n/a using one of the four methods described, and policy itera- tion is performed to generate the future location predictions. Table 3: p-values of Wilcoxon signed-rank test pairs. Starred (*) val- The inferred future location matrix (e.g. Figure 5) is com- ues indicate the row method achieves statistically significant (0.1% pared against the actual complete taxi trajectory (e.g. Fig- significance level) improvement over the column method for a pre- ure 6). Prediction results are scored by comparing the in- diction horizon of 6. If n/a, the row method’s mean is not better than the column method. ferred visited location vector θ̂ against the full location vec- tor θ vi . The scores are computed using Pearson’s correlation: score = Cor(θ̂, θ vi ). The scores reported are the aggregate pose to extend this work using a hierarchical approach which mean of scores from examples in the validation set. simultaneously incorporates global and local predictions to The data set contains 100,000 subtrajectories (of approxi- provide more robust results. mately 1 hour in length) from 10,000 taxis. The data set is split randomly into 3 disjoint collections to facilitate experi- References mentation: 90% in the training set, and 5% in both the test and [Fiosina and Fiosins, 2012] J. Fiosina and M. Fiosins. Co- validation sets. For each model type, the training set is used operative kernel-based forecasting in decentralized multi- to generate the model. Model parameters are optimized using agent systems for urban traffic networks. In Ubiquitous the test set. Scores are computed using predictions made on Data Mining, pages 3–7. ECAI, 2012. partial trajectories from the validation set. Results of each method for 4 prediction horizons are shown [Krumm and Horvitz, 2006] J. Krumm and E. Horvitz. Pre- in Table 2. The methods leveraging more local information destination: Inferring destinations from partial trajecto- near the last location of the vehicle (LapStrat, MCStrat) per- ries. UbiComp 2006, pages 243–260, 2006. form better than the methods relying only on global patterns [Krumm, 2010] J. Krumm. Where will they turn: predicting (FreqCount, EigenStrat). This is true for all prediction hori- turn proportions at intersections. Personal and Ubiquitous zons, but the more local methods have an even greater perfor- Computing, 14(7):591–599, 2010. mance advantage for larger prediction horizons. [Lakhina et al., 2004] A. Lakhina, K. Papagiannaki, Prediction Horizon M. Crovella, C. Diot, E. D. Kolaczyk, and N. Taft. Structural analysis of network traffic flows. Perform. Eval. Method 1 2 4 6 Rev., 32(1):61–72, 2004. FreqCount .579 (.141) .593 (.127) .583 (.123) .573 (.122) EigenStrat .563 (.143) .576 (.134) .574 (.140) .574 (.140) [Shi and Malik, 2000] J. Shi and J. Malik. Normalized cuts LapStrat .590 (.144) .618 (.139) .626 (.137) .626 (.137) and image segmentation. IEEE Trans. on Pattern Analysis MCStrat .600 (.146) .616 (.149) .621 (.149) .621 (.149) and Machine Intelligence, 22(8):888–905, 2000. [Veloso et al., 2011] M. Veloso, S. Phithakkitnukoon, and Table 2: Correlation (std. dev.) by method and prediction horizon. C. Bento. Urban mobility study using taxi traces. In Proc. The best score is in bold. of the 2011 Int’l Workshop on Trajectory Data Mining and Analysis, pages 23–30, 2011. Statistical significance testing was performed on the vali- [Yuan et al., 2010] J. Yuan, Y. Zheng, C. Zhang, W. Xie, dation set results, as shown in Table 3. The best performing X. Xie, G. Sun, and Y. Huang. T-drive: driving directions methods (LapStrat and MCStrat) achieve a statistically sig- based on taxi trajectories. In Proc. of the 18th SIGSPATIAL nificant performance improvement over the other methods. Int’l Conf. on Advances in GIS, pages 99–108, 2010. However, the relative performance difference between the lo- cal methods is not significantly different. [Zhang et al., 2009] J. Zhang, P. Niyogi, and Mary S. McPeek. Laplacian eigenfunctions learn population struc- 6 Conclusions ture. PLoS ONE, 4(12):e7928, 12 2009. The methods presented can be applied to many other spatio- [Zhu et al., 2012] Y. Zhu, Y. Zheng, L. Zhang, D. Santani, temporal domains where only basic location and time infor- X. Xie, and Q. Yang. Inferring Taxi Status Using GPS mation is collected from portable devices, such as sensor net- Trajectories. ArXiv e-prints, May 2012. works as well as mobile phone networks. These predictions [Ziebart et al., 2008] B. Ziebart, A. Maas, A. Dey, and assume the action space is large but fixed and observations J. Bagnell. Navigate like a cabbie: probabilistic reason- implicitly are clustered into distinct but repeated goals. In ing from observed context-aware behavior. In Proc. of the this domain, each observation is a set of actions a driver takes 10th Int’l Conf. on Ubiquitous computing, UbiComp ’08, in fulfillment of a specific goal: for example, to take a passen- pages 322–331, 2008. ger from the airport to his/her home. In future work, we pro-