<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting Globally and Locally: A Comparison of Methods for Vehicle Trajectory Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>William Groves</string-name>
          <email>groves@cs.umn.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ernesto Nunes</string-name>
          <email>enunes@cs.umn.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Gini</string-name>
          <email>gini@cs.umn.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, University of Minnesota</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>We propose eigen-based and Markov-based methods to explore the global and local structure of patterns in real-world GPS taxi trajectories. Our primary goal is to predict the subsequent path of an in-progress taxi trajectory. The exploration of global and local structure in the data differentiates this work from the state-of-the-art literature in trajectory prediction methods, which mostly focuses on local structures and feature selection. We propose four algorithms: a frequency based algorithm FreqCount, which we use as a benchmark, two eigen-based (EigenStrat, LapStrat), and a Markov-based algorithm (MCStrat). Pairwise performance analysis on a large real-world data set reveals that LapStrat is the best performer, followed by MCStrat.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In order to discover characteristic patterns in large
spatiotemporal data sets, mining algorithms have to take into
account spatial relations, such as topology and direction, as well
as temporal relations. The increased use of devices that are
capable of storing driving-related spatio-temporal
information helps researchers and practitioners gather the necessary
data to understand driving patterns in cities, and to design
location-based services for drivers. To the urban planner, the
work can help to aggregate driver habits and can uncover
alternative routes that could help alleviate traffic. Additionally,
it also helps prioritize the maintenance of roads.</p>
    </sec>
    <sec id="sec-2">
      <title>Our work combines data mining techniques that discover</title>
      <p>global structure in the data, and local probabilistic methods
that predict short-term routes for drivers, based on past
driving trajectories through the road network of a city.</p>
    </sec>
    <sec id="sec-3">
      <title>The literature on prediction has offered Markov-based</title>
      <p>and other probabilistic methods that predict paths accurately.</p>
    </sec>
    <sec id="sec-4">
      <title>However, most methods rely on local structure of data, and</title>
      <p>use many extra features to improve prediction accuracy. In
this paper we use only the basic spatio-temporal data stream.
We advance the state-of-the-art by proposing the LapStrat
algorithm. This algorithm reduces dimensionality and
clusters data using spectral clustering to then predict a
subsequent path using a Bayesian network. Our algorithm supports
global analysis of the data, via clustering, as well as local
inference using the Bayesian framework. In addition, since our
algorithm only uses location and time data, it can be easily
generalized to other domains with spatio-temporal
information. Our contributions are summarized as follows:
1. We offer a systematic way of extracting common
behavioral characteristics from a large set of observations
using an algorithm inspired by principal component
analysis (EigenStrat) and our LapStrat algorithm.</p>
    </sec>
    <sec id="sec-5">
      <title>2. We compare the effectiveness of methods that explore</title>
      <p>global structure only (FreqCount and EigenStrat),
local structure only (MCStrat), and mixed global and
local structure (LapStrat). We show experimentally that</p>
    </sec>
    <sec id="sec-6">
      <title>LapStrat offers competitive prediction power compared to the more local structure-reliant MCStrat algorithm.</title>
      <p>2</p>
      <sec id="sec-6-1">
        <title>Related Work</title>
        <p>Eigendecomposition has been used extensively to analyze and
summarize the characteristic structure of data sets. The
structure of network flows is analyzed in [Lakhina et al., 2004],
principal component analysis (PCA) is used to summarize the
characteristics of the flows that pass through an internet
service provider. [Zhang et al., 2009] identify two weaknesses
that make PCA less effective on real-world data. i.e.
sensitivity to outliers in the data, and concerns about its
interpretation, and present an alternative, Laplacian eigenanalysis. The
difference between these methods is due to the set of
relationships each method considers: the Laplacian matrix only
considers similarity between close neighbors, while PCA
considers relationships between all pairs of points. These studies
focus on the clustering power of the eigen-based methods to
find structures in the data. Our work goes beyond
summarizing the structure of the taxi routes, and uses the eigenanalysis
clusters to predict the subsequent path of an in-progress taxi
trajectory.</p>
        <p>Research in travel prediction based on driver behavior has
enjoyed some recent popularity. [Krumm, 2010] predicts the
next turn a driver will take by choosing with higher
likelihood a turn that links more destinations or is more time
efficient. [Ziebart et al., 2008] offer algorithms for turn
prediction, route prediction, and destination prediction. The study
uses a Markov model representation and inverse
reinforcement learning coupled with maximum entropy to provide
accurate predictions for each of their prediction tasks. [Veloso
et al., 2011] proposes a Naive Bayes model to predict that a
taxi will visit an area, using time of the day, day of the week,
weather, and land use as features. In [Fiosina and Fiosins,
2012], travel time prediction in a decentralized setting is
investigated. The work uses kernel density estimation to predict
the travel time of a vehicle based on features including length
of the route, average speed in the system, congestion level,
number of traffic lights, and number of left turns in the route.</p>
        <p>All these studies use features beyond location to improve
prediction accuracy, but they do not offer a comprehensive
analysis of the structure of traffic data alone. Our work
addresses this shortcoming by providing both an analysis of
commuting patterns, using eigenanalysis, and route
prediction based on partial trajectories.
3</p>
      </sec>
      <sec id="sec-6-2">
        <title>Data Preparation</title>
        <p>The GPS trajectories we use for our experiments are taken
from the publicly available Beijing Taxi data set which
includes 1 to 5-minute resolution location data for over
tenthousand taxis for one week in 2009 [Yuan et al., 2010].
Beijing, China is reported to have seventy-thousand registered
taxis, so this data set represents a large cross-section of all
taxi traffic for the one-week period [Zhu et al., 2012].</p>
        <p>Because the data set contains only location and time
information of each taxi, preprocessing the data into segments
based on individual taxi fares is useful. The data has sufficient
detail to facilitate inference on when a taxi ride is completed:
for example, a taxi waiting for a fare will be stopped at a taxi
stand for many minutes [Zhu et al., 2012]. Using these
inferences, the data is separated into taxi rides.</p>
        <p>To facilitate analysis, the taxi trajectories are discretized
into transitions on a region grid with cells of size 1.5 km ×
1.5 km square. V =&lt; v1, v2, . . . , vw &gt; is a collection of
trajectories. We divide it into VTR, VTE, VVA which are the
training, test, and validation sets, respectively. A trajectory
vi is a sequence of N time-ordered GPS coordinates: vi =&lt;
c1vi , . . . cjvi , . . . , cvi &gt;. Each coordinate contains a GPS
lat</p>
        <p>N
itude and longitude value, cjvi = (xj , yj ). Given a complete
trajectory (vi), a partial trajectory (50% of a full trajectory)
can be generated as vipartial =&lt; c1vi , c2vi , . . . , cvNi/2 &gt;. The
last location of a partial trajectory vlast =&lt; cvNi/2 &gt; is used
i
to begin the prediction task.</p>
        <p>The relevant portion of the city’s area containing the
majority of the city’s taxi trips, called a city grid, is enclosed
in a matrix of dimension 17 × 20. Each si corresponds to
the center of a grid square in the euclidean xy-space. The
city graph is encoded as a rectilinear grid with directed edges
(esisj ) between adjacent grid squares. I(cj , si) is an indicator
function that returns 1 if GPS coordinate cj is closer to grid
center si than to any other grid center and otherwise returns
0. Equation 1 shows an indicator function to determine if two</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>GPS coordinates indicate traversal in the graph.</title>
      <p>Φ(cjvi , ckvi , eslsm ) =
1, if I(cjvi , sl) ∗ I(ckvi , sm) = 1
0</p>
    </sec>
    <sec id="sec-8">
      <title>Otherwise (1)</title>
      <p>From trajectory vi, a policy vector πi is created having one
)
x
e
d
illn 8
e
c
d
i
r
g
(
e
d
u
itt
a
L
4
S4
S5</p>
      <p>S6
value for each edge in the city grid. Each δsl,sm is a directed
edge coefficient indicating that a transition occurred between
sl and sm in the trajectory. The policy vectors for this data
set graph have length (|π|) of 1286, based on the number of
edges in the graph. A small sample city grid is in Figure 1. A
collection of policies Π =&lt; π1, π2, . . . , πw &gt; is computed
from a collection of trajectories V :</p>
      <p>πvi =&lt; δsv1i,s2 , . . . , δsvli,sm , . . . &gt;
δsvli,sm =
(1, if PjN=−11 Φ(cjvi , cjv+i1, esl,sm ) ≥ 1
0</p>
    </sec>
    <sec id="sec-9">
      <title>Otherwise</title>
    </sec>
    <sec id="sec-10">
      <title>A graphical example showing a trajectory converted into a policy is shown in Figure 2. All visited locations for trajec</title>
      <p>GPS Waypoints (time ordered)</p>
      <p>Policy Grid Transitions
.
o
N
e
c
n
e
u
q
e
S
it
n
o
p
y
a
W
12
10
8
6
4
2
0
(2)
(3)
(4)
(5)
8
Longitude (grid cell index)
12
tory vipartial are given by θvipartial :
θvipartial
ωsi =
=&lt; ωs1 , ωs2 , . . . , ωsm &gt;, with
vpartial
(1, if Pjn=1 I(cj i
, si) ≥ 1
0</p>
    </sec>
    <sec id="sec-11">
      <title>Otherwise</title>
      <p>A baseline approach for prediction, FreqCount, uses
observed probabilities of each outgoing transition from each
node in the graph. Figure 3 shows the relative frequencies
of transitions between grid squares in the training set. This
city grid discretization is similar to methods used by others in
this domain [Krumm and Horvitz, 2006; Veloso et al., 2011].
4</p>
      <sec id="sec-11-1">
        <title>Methods</title>
        <p>This work proposes four methods that explore either the local
or the global structure or a mix of both to predict short-term
trajectories for drivers, based on past trajectories.
location probability
)
x
e
d
n
il
le 8
c
d
i
r
g
(
e
d
u
itt
a
L
4</p>
        <p>8
Longitude (grid cell index)
12</p>
        <p>8
Longitude (grid cell index)
12
Benchmark: Frequency Count Method. A benchmark
prediction measure, FreqCount, uses frequency counts for
transitions in the training set to predict future actions. The relative
frequency of each rectilinear transition from each location in
the grid is computed and is normalized based on the number
of trajectories involving the grid cell. The resulting policy
matrix is a Markov chain that determines the next predicted
action based on the current location of the vehicle.</p>
        <p>The FreqCount method computes a policy vector based
on all trajectories in the training set VT R. πFreqCount contains
first order Markov transition probabilities computed from all
trajectories as in Equation 6.</p>
        <p>πFreqCount
δsi,sj
=</p>
        <p>v</p>
        <p>Pv∈VT R δsi,sj
Pv∈VT R</p>
        <p>PkM=1 δsvi,sk
(6)</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>The probability of a transition (si → sj ) is computed as</title>
      <p>the count of the transition si → sj in VT R divided by the
count of all transitions exiting si in VT R.</p>
    </sec>
    <sec id="sec-13">
      <title>Policy iteration (Algorithm 1) is applied to the last loca</title>
      <p>tion of a partial trajectory using the frequency count policy
set ΠFreqCount =&lt; πFreqCount &gt; to determine a basic
prediction of future actions. This method only considers frequency
of occurrence for each transition in the training set, so it is
expected to perform poorly in areas where trajectories intersect.</p>
      <sec id="sec-13-1">
        <title>Algorithm 1: Policy Iteration</title>
        <p>Input: Location vector with last location of taxi θlast, a
policy list Π, prediction horizon niter
Output: A location vector containing visit probabilities
for future locations θˆ
1 θaccum ← θlast
2 for π ∈ Π do
3 t ← 1</p>
        <p>θ0 ← θlast
while t ≤ niter do
θt =&lt; ωst1 , ωst2 , . . . , ωsti , . . . , ωstM &gt;
, where ωsti = maxsj ∈S (ωst−j1 ∗ δsπj ,si )</p>
        <p>= max(ωsθiaccum , ωsθit )
EigenStrat: Eigen Analysis of Covariance. This method
exploits linear relationships between transitions in the grid
which 1) can be matched to partial trajectories for purposes
of prediction and 2) can be used to study behaviors in the
system. The first part of the algorithm focuses on model
generation. For each pair of edges, the covariance is computed
using the training set observations. The n largest eigenvectors
are computed from the covariance matrix. These form a
collection of characteristic eigen-strategies from training data.</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>When predicting for an in-progress trajectory, the algo</title>
      <p>rithm takes the policy generated from a partial taxi trajectory
πvpredict , a maximum angle to use as the relevancy threshold
α, and the eigen-strategies as Π. Eigen-strategies having an
angular distance less than α to πvpredict are added to Πrel.</p>
    </sec>
    <sec id="sec-15">
      <title>This collection is then used for policy iteration. Optimal val</title>
      <p>ues for α and dims are learned experimentally.</p>
    </sec>
    <sec id="sec-16">
      <title>Eigenpolicies also facilitate exploration of strategic decisions. Figure 7 shows an eigenpolicy plot with a distinct pattern in the training data. Taxis were strongly confined to trajectories either the inside circle or the perimeter of the circle,</title>
      <p>Algorithm 2: EigenStrat</p>
    </sec>
    <sec id="sec-17">
      <title>Input: ΠTR, number of principal components (dims),</title>
      <p>minimum angle between policies (α), prediction
horizon (horizon), partial policy (πvipartial )</p>
      <p>Output: Inferred location vector θˆ
1 Generate covariance matrix C|πi|×|πi| (where πi ∈ ΠTR)
between transitions on the grid
2 Get the dims eigenvectors of C with largest eigenvalues
3 Compute cosine similarity between πvipartial and the
principal components (πj , j = 1 . . . dims):
Πrel = {πj ||cos(πj , πvipartial )| &gt; α}
4 If the cos(πj , πvipartial ) &lt; 0, then flip the sign of the
coefficients for this eigenpolicy. Use Algorithm 1 with
Πrel on vipartial for horizon iterations to compute θˆ
but rarely between these regions. The two series (positive and
negative) indicate the sign and magnitude of the grid
coefficients for this eigenvector. We believe analysis of this type
has great promise for large spatio-temporal data sets.
)x 8
e
d
n
lli
e
c
d
i
r
g
(
e
d
ittu 4
a
L
positive directions
negative directions</p>
      <p>4
Longitude (grid cell index)
8</p>
      <p>LapStrat: Spectral Clustering-Inspired Algorithm.
Lap</p>
    </sec>
    <sec id="sec-18">
      <title>Strat (Algorithm 3) combines spectral clustering and</title>
      <p>Bayesian-based policy iteration to cluster policies and infer
driver next turns. Spectral clustering operates upon a
similarity graph and its respective Laplacian operator. This work
follows the approach of [Shi and Malik, 2000] using an
unnormalized graph Laplacian. We use Jaccard index to
compute the similarity graph between policies. We chose the
Jaccard index, because it finds similarities between policies that
are almost parallel. This is important in cases such as two
highways that only have one meeting point; in this case, if
the highways are alternative routes to the same intersection,
they should be similar with respect to the intersection point.</p>
    </sec>
    <sec id="sec-19">
      <title>The input to the Jaccard index are two vectors representing</title>
      <p>policies generated in Section 3. J (πi, πj ) is the Jaccard
similarity for pair πi and πj . The unnormalized Laplacian is
computed by subtracting the degree matrix from the
similarity matrix in the same fashion as [Shi and Malik, 2000]. We
choose the dims eigenvectors with smallest eigenvalues, and
Algorithm 3: LapStrat</p>
    </sec>
    <sec id="sec-20">
      <title>Input: ΠTR, dimension (dims), number of clusters (k),</title>
      <p>similarity threshold (ǫ), prediction (horizon),
partial policy (πvipartial )</p>
      <p>Output: Inferred location vector θˆ
1 Generate similarity matrix W|ΠTR|×|ΠTR| where</p>
      <p>J (πi, πj ), if J (πi, πj ) ≥ ǫ
wij =</p>
      <p>0 Otherwise
2 Generate Laplacian (L): L = D − W and ∀dij ∈ D
iterations to compute θˆ
(Pz=1</p>
      <p>|ΠT R| wiz , if i = z
dij =</p>
      <p>0 Otherwise
3 Get the dims eigenvectors with smallest eigenvalues
4 Use k-means to find the mean centroids (πj , j = 1 . . . k)
of k policy clusters
5 Find all centroids similar to πvipartial :</p>
      <p>Πrel = {πj |J (πj , πvipartial ) &gt; ǫ}
6 Use Algorithm 1 with Πrel on vipartial for horizon
perform k-means to find clusters in the reduced dimension.</p>
    </sec>
    <sec id="sec-21">
      <title>The optimal value for dims is learned experimentally.</title>
      <p>MCStrat: Markov Chain-Based Algorithm. The Markov
chain approach uses local, recent information from vpartial
predict,
the partial trajectory to predict from. Given the last k edges
traversed by the vehicle, the algorithm finds all complete
trajectories in the training set containing the same k edges to
build a set of relevant policies Vrel using the match function.
match(k, a, b) returns 1 only if at least the last k transitions
in the policy generated by trajectory a are also found in b.</p>
    </sec>
    <sec id="sec-22">
      <title>Using Equation 9, Vrel is used to build a composite single</title>
      <p>relevant policy πrel, that obeys the Markov assumption, so
the resulting policy preserves the probability mass.</p>
      <p>Vrel = {vi match(k, πvppraerdtiiactl , πvi ) = 1, vi ∈ VTR} (7)
πrel =&lt; δs1,s2 , . . . , δsπir,selj , . . . &gt;</p>
      <p>πrel
πrel
δsi,sj =</p>
      <p>Pv∈Vrel δsvi,sj</p>
      <p>Pv∈Vrel PkM=1 δsvi,sk
Using the composite πrel, policy iteration is then performed
on the last location vector computed from vpredict.
Method Complexity Comparison. A comparison of the
storage complexity of the methods appears in Table 1.</p>
    </sec>
    <sec id="sec-23">
      <title>Model</title>
      <sec id="sec-23-1">
        <title>FreqCount</title>
      </sec>
      <sec id="sec-23-2">
        <title>EigenStrat</title>
      </sec>
      <sec id="sec-23-3">
        <title>LapStrat</title>
      </sec>
      <sec id="sec-23-4">
        <title>MCStrat</title>
      </sec>
    </sec>
    <sec id="sec-24">
      <title>Model Construction</title>
      <p>O(|π|)
O((|π|)2)
O((|ΠT R |)2)
O(1)</p>
    </sec>
    <sec id="sec-25">
      <title>Model Storage</title>
      <p>O(|π|)
O(dims × |π|)
O(k × |π|)
O(|ΠT R| × |π|)</p>
      <sec id="sec-25-1">
        <title>Results</title>
        <p>Given an in-progress taxi trajectory, the methods presented
facilitate predictions about the future movement of the
vehicle. To simulate this task, a collection of partial
trajectories (e.g. Figure 4) is generated from complete trajectories
in the test set. A set of relevant policy vectors is generated
using one of the four methods described, and policy
iteration is performed to generate the future location predictions.
The inferred future location matrix (e.g. Figure 5) is
compared against the actual complete taxi trajectory (e.g.
Figure 6). Prediction results are scored by comparing the
inferred visited location vector θˆ against the full location
vector θvi . The scores are computed using Pearson’s correlation:
score = Cor(θˆ, θvi ). The scores reported are the aggregate
mean of scores from examples in the validation set.</p>
        <p>The data set contains 100,000 subtrajectories (of
approximately 1 hour in length) from 10,000 taxis. The data set is
split randomly into 3 disjoint collections to facilitate
experimentation: 90% in the training set, and 5% in both the test and
validation sets. For each model type, the training set is used
to generate the model. Model parameters are optimized using
the test set. Scores are computed using predictions made on
partial trajectories from the validation set.</p>
        <p>Results of each method for 4 prediction horizons are shown
in Table 2. The methods leveraging more local information
near the last location of the vehicle (LapStrat, MCStrat)
perform better than the methods relying only on global patterns
(FreqCount, EigenStrat). This is true for all prediction
horizons, but the more local methods have an even greater
performance advantage for larger prediction horizons.</p>
      </sec>
    </sec>
    <sec id="sec-26">
      <title>Statistical significance testing was performed on the vali</title>
      <p>dation set results, as shown in Table 3. The best performing
methods (LapStrat and MCStrat) achieve a statistically
significant performance improvement over the other methods.</p>
    </sec>
    <sec id="sec-27">
      <title>However, the relative performance difference between the local methods is not significantly different.</title>
      <p>6</p>
      <sec id="sec-27-1">
        <title>Conclusions</title>
        <p>The methods presented can be applied to many other
spatiotemporal domains where only basic location and time
information is collected from portable devices, such as sensor
networks as well as mobile phone networks. These predictions
assume the action space is large but fixed and observations
implicitly are clustered into distinct but repeated goals. In
this domain, each observation is a set of actions a driver takes
in fulfillment of a specific goal: for example, to take a
passenger from the airport to his/her home. In future work, we
pro</p>
        <sec id="sec-27-1-1">
          <title>FreqCount</title>
        </sec>
        <sec id="sec-27-1-2">
          <title>EigenStrat</title>
        </sec>
        <sec id="sec-27-1-3">
          <title>LapStrat MCStrat</title>
          <p>n/a
pose to extend this work using a hierarchical approach which
simultaneously incorporates global and local predictions to
provide more robust results.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Fiosina and Fiosins</source>
          , 2012]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fiosina</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Fiosins</surname>
          </string-name>
          .
          <article-title>Cooperative kernel-based forecasting in decentralized multiagent systems for urban traffic networks</article-title>
          .
          <source>In Ubiquitous Data Mining</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>7</lpage>
          . ECAI,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Krumm and Horvitz</source>
          , 2006]
          <string-name>
            <given-names>J.</given-names>
            <surname>Krumm</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          . Predestination:
          <article-title>Inferring destinations from partial trajectories</article-title>
          .
          <source>UbiComp</source>
          <year>2006</year>
          , pages
          <fpage>243</fpage>
          -
          <lpage>260</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Krumm</source>
          , 2010]
          <string-name>
            <given-names>J.</given-names>
            <surname>Krumm</surname>
          </string-name>
          .
          <article-title>Where will they turn: predicting turn proportions at intersections</article-title>
          .
          <source>Personal and Ubiquitous Computing</source>
          ,
          <volume>14</volume>
          (
          <issue>7</issue>
          ):
          <fpage>591</fpage>
          -
          <lpage>599</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Lakhina et al.,
          <year>2004</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lakhina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papagiannaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crovella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Diot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Kolaczyk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Taft</surname>
          </string-name>
          .
          <article-title>Structural analysis of network traffic flows</article-title>
          .
          <source>Perform. Eval. Rev.</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>61</fpage>
          -
          <lpage>72</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Shi and Malik</source>
          , 2000]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Malik</surname>
          </string-name>
          .
          <article-title>Normalized cuts and image segmentation</article-title>
          .
          <source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>22</volume>
          (
          <issue>8</issue>
          ):
          <fpage>888</fpage>
          -
          <lpage>905</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Veloso et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>M.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Phithakkitnukoon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bento</surname>
          </string-name>
          .
          <article-title>Urban mobility study using taxi traces</article-title>
          .
          <source>In Proc. of the 2011 Int'l Workshop on Trajectory Data Mining and Analysis</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Yuan et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          , G. Sun, and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>T-drive: driving directions based on taxi trajectories</article-title>
          .
          <source>In Proc. of the 18th SIGSPATIAL Int'l Conf. on Advances in GIS</source>
          , pages
          <fpage>99</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Zhang et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Niyogi, and
          <string-name>
            <surname>Mary</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>McPeek</surname>
          </string-name>
          .
          <article-title>Laplacian eigenfunctions learn population structure</article-title>
          .
          <source>PLoS ONE</source>
          ,
          <volume>4</volume>
          (
          <issue>12</issue>
          ):
          <year>e7928</year>
          ,
          <year>12 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Zhu et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang. Inferring Taxi Status Using GPS Trajectories. ArXiv</surname>
          </string-name>
          e-prints,
          <year>May 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Ziebart et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ziebart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Bagnell</surname>
          </string-name>
          .
          <article-title>Navigate like a cabbie: probabilistic reasoning from observed context-aware behavior</article-title>
          .
          <source>In Proc. of the 10th Int'l Conf. on Ubiquitous computing, UbiComp '08</source>
          , pages
          <fpage>322</fpage>
          -
          <lpage>331</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>