<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Inferring Destination from Mobility Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ali A. Naeem</string-name>
          <email>ali.naeem@insight-centre.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kenneth N. Brown</string-name>
          <email>ken.brown@insight-centre.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Centre for Data Analytics, Department of Computer Science University College Cork</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Destination prediction in a moving vehicle has several applications such as alternative route recommendations even in cases where the driver has not entered their destination into the system. In this paper a hierarchical approach to destination prediction is presented. A Discrete Time Markov Chain model is used to make an initial prediction of a general region the vehicle might be travelling to. Following that a more complex Bayesian Inference Model is used to make a fine grained prediction within that destination region. The model is tested on a dataset of 442 taxis operating in Porto, Portugal. Experiments are run on two maps. One is a smaller map concentrating specificially on trips within the Porto city centre and surrounding areas. The second map covers a much larger area going as far as Lisbon. We achieve predictions for Porto with average distance error of less than 0.6 km from early on in the trip and less than 1.6 km dropping to less than 1 km for the wider area.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The ubiquity of smart phones and other GPS devices has increased interest
in location-based services such as location-based social networking and
realtime information updating. Destination prediction is an integral part of these
services as it enables, for example, sight-seeing recommendations, more efficient
route recommendations based on current conditions, etc. Most of the previous
work is based on data mining of historical trajectories of known individuals to
provide personalised destination prediction. For unknown users or new users, the
problem is significantly harder since we have a much larger set of a priori probable
destinations. In this paper, the question addressed is: can the destination of a
vehicle during its journey be correctly and reliably predicted using a history of
previous trip trajectories from a wider population?</p>
      <p>We propose a hierarchical model which first predicts a general region for the
destination and then refines it by making a second more fine grained
prediction within that higher destination region. The initial prediction to select the
destination region is made by a Discrete Time Markov Chain (DTMC) model.
The states in the markov chain are the discrete locations a car might occupy.
The second prediction is made by a more complex Bayesian Inference model
which calculates the probability of the destination location given some observed
features of the trip such as start location and last two observed locations.</p>
      <p>The model inference is evaluated on actual datasets of taxis in Porto,
Portugal. If traffic is limited to the Porto urban region, our best model predicts the
destination coordinates to within 0.5 km on average across the entire trip. This
accuracy is reached very early into the trip in the smaller map. For the more
general case covering a much larger area of Porto and surrounding regions, the
best model predicts to an accuracy of 1.6 km early on in the trip and the error
dips to below 1 km as the trip progresses.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The Predestination algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] uses a Bayesian Inference model combined
with geographic data such as types of land cover. This allows the model to
preemptively rule out, for example, areas of water as possible destinations. The
algorithm implements an open world model which attempts to capture the
possibility of a vehicle going to destinations which have not yet been observed in the
dataset. It does this by assuming that there is a certain likelihood of a driver
going to a destination which is near some other destination that they have visited
before (for instance, they may stop off at the supermarket or restaurant close
to their home or place of work. The algorithm also includes the idea of driving
efficiency. It supposes that the driver will take a reasonable route to their
destination and thus gives higher preferences to those destinations which can be
reached within reasonable travel.
      </p>
      <p>
        The PROCAB algorithm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] improves on Predestination by using Markov
Decision Processes to learn preferences on the part of drivers (for instance some
road types are more preferable and hence have higher utility). This allows the
algorithm to learn contextual information which increases performance.
PROCAB is shown to perform similarly to Predestination when a large portion of
the trip is observed and do better than Predestination earlier in the trip.
      </p>
      <p>
        It is often the case that in approaches for destination prediction where the
destination probability is calculated conditioned on certain observed features of
the trip, there exists a data sparsity problem. Specifically the method is only
feasible if there exists in the data a historical trajectory with the exact same
characteristics as the current observed trip. One solution to this is to include
extra information such as road maps, route planning, etc as seen in the
Predestination algorithm with land cover [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. An alternative method is Sub-Trajectory
Synthesis (SubSyn) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] which attempts to predict other destinations by
synthesising sub-trajectories from the observed data. In the current work, we will
be presenting a geometric model to address sparsity which estimates transition
probabilities in specific trips not observed in the dataset.
      </p>
      <p>
        The T-Drive algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] builds a landmark graph in which a node represents
a road travelled often by taxis. The route prediction is made in a hierarchical
fashion similar to the concept presented in this paper. Initially, a rough route
is predicted using the landmark graph. In the second step, the rough route is
refined to give a final prediction.
      </p>
      <p>
        The problem of destination prediction with taxis specifically has also been
attempted with neural networks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and regression based approaches [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A
common thread in these approaches is some discretization of the locations whether
by applying a uniform grid or clustering.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data Preprocessing</title>
      <p>The dataset utilised1 consists of approximately 1.7 million taxi journeys from
442 taxis operating in the city of Porto in Portugal over a period of one year.
The GPS readings are recorded every 15 seconds. Other data such as time of
pickup, driver ID and trip ID are also available.</p>
      <p>The prediction task was separated into two problems of increasing difficulty.
In the first problem, we focus on modelling an area which captures the city
centre and surrounding areas of Porto. Any trips that ventured outside this
bounding box were removed. The area selected for modelling in problem 1 is
shown in figure 1. This smaller map covers a distance of approximately 40 km
measured diagonally. In the second problem, the area of the bounding box was
increased so that it took in Porto and other surrounding cities such as Lisbon.
This larger box captures the full extent of the trips recorded in the dataset. The
area covered by the larger bounding box is shown in figure 2. This larger map
covers a distance of 445 km measured diagonally. The raw data was cleaned by
1 https://www.kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i/data
having missing data were removed. However, our data exploration indicated that
these weren’t the only trips with missing data points. To identify these trips, the
individual GPS data points were used to calculate the instantaneous speed of
travel at every recording made on a trip. It is specified that the GPS data points
are recorded once every 15 seconds. Hence, any trips with unreasonably large
spikes in the instantaneous speed indicated missing or noisy data. Once a trip is
identified as having noisy or missing data, the entire trip was excluded from the
data. Secondly, the data is collected by mobile terminals carried by the drivers.
Therefore, some ’trips’ were in fact times where the driver had forgotten to turn
off the data collection when outside the vehicle and as such represented pure
walking trips. Those trips are often observed to have data points concentrated
within a small geographical area such as a shopping mall. All such trips were
identified and removed by excluding any trips that never exceeded a speed of 15
km/hr at any point.</p>
      <p>In order to reason about similar trips and locations, we need to discretize
the space. Rather than applying a uniform grid, we chose to cluster the GPS
data points. This has the advantage that it creates tighter clusters where there
was denser traffic hence giving finer predictions in those areas. We used K-Means
Clustering at two levels of granularity: a coarse clustering with fewer clusters and
a second much finer clustering. The specific values for the number of clusters k
at each level were determined by clustering the data for a range of k values and
carrying our silhouette analysis.</p>
      <p>
        Silhouette analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] can be used to analyse the separation distance between
the clusters produced by a clustering algorithm. The analysis involves calculating
a number known as the mean Silhouette coefficient of all the samples. This
coefficient has a range between -1 and +1 and indicates how close each point
in one cluster is to points in the neighbouring clusters. Silhouette coefficients
near +1 indicate that the sample is far away from the neighbouring clusters and
this is desirable. A value of 0 indicates that the sample is on or very close to
the decision boundary between 2 neighbouring clusters. Negative values are an
indication that the sample has been assigned to the wrong cluster.
      </p>
      <p>The silhouette coefficient is calculated using the mean intra-cluster distance
a and the mean nearest cluster distance b (the distance between the sample and
the nearest cluster that the sample is not a part of). The formula for silhouette
coefficient S then is:</p>
      <p>S =</p>
      <p>b a
max(a; b)
To determine the best value for k at each level, a range [8; 12] at the coarse
level and [248; 260] at fine grained level was specified. The clustering algorithm
was then run for each value of k which falls into the range. The mean silhouette
coefficient was then calculated for each resultant clustering. The value for k
which yielded the highest mean silhouette coefficient at each level was chosen.
In the present case, for the smaller bounding box, the best values for k at the
coarse level turned out to be 8 clusters and the fine level 259 clusters. A similar
analysis was carried out for the data contained in the larger bounding box. The
ideal clustering calculated was 18 clusters at the coarse level. Note that the
dropoff behaviours were very different between the city center regions and the
rural regions. For this reason, rather than attempt a global clustering at the
fine level, a hierarchical clustering scheme was devised. In this case, the data
within each of the 18 higher regions were isolated and a separate clustering
carried out within each of the 18 higher regions. A visualisation of these fine
clusters would be too small to be discernible. For illustration purposes, figure
5 shows a sample clustering in the larger bounding box. The actual clusters
are much finer than this at the lower level. Following the clustering, the trips
are now represented as a sequence of cluster visits with repeated cluster IDs
for sequences of GPS locations that remain within the same cluster. When a
prediction is made, the simplest level of prediction then predicts a destination
cluster rather than coordinates. In figure 6, a cluster with radius r2 is shown.
(The actual clusters aren’t perfectly circular but it is assumed these are average
figures for the radius.) Assuming further that there is a uniform distribution
for drop off locations inside the cluster, the average prediction error even in the
most ideal case where the correct destination cluster is identified is given by a
circle with radius r1 whose area is 12 of the circle with radius r2. This gives the
following expression for the areas of the circles:</p>
      <p>A1 =
A2
r2
12 =
r2
1 1
2 =) r1 = p2 r2
Therefore on average, for simple cluster prediction, an error factor of 0:707r2 is
expected even in cases where the correct cluster is identified due to the variability
of the exact destination location within that cluster.
To predict the destination of a taxi using an observed partial trajectory, we
propose a hierarchical model. At the high level, the aim is to identify a general region
or area the taxi is travelling to. The logic is that the early stages of a trip will
be similar for any trip which terminates in destinations within the same
neighbourhood. In the clustering scheme, this equates to picking a destination cluster
from the coarse clustering. Once the higher destination region is identified, the
algorithm will then ’zoom’ down and make a fine grained destination prediction
within the region identified in the previous step. At this stage a cluster from the
fine grained clustering is picked.
The high level model is based on Discrete Time Markov Chains (DTMCs). The
states of the DTMCs are defined to be the clusters visited by the taxi during its
trip. Before training, the trips are factorised by source cluster and destination
cluster pairs. A separate DTMC is trained for every source/destination pair.
During prediction, the source cluster is known. Only those DTMCs associated
with this source are considered. The destination which gives maximum likelihood
for the observed transition is then chosen as the predicted destination region.
The low level model is a Bayesian Inference model. We represent each journey
as the sequence of clusters the taxi passed through. We calculate the destination
cluster probability conditioned on the high level destination region, source cluster
and last two clusters visited. The conditional probabilities are calculated as a
set of probability tables by counting the transitions between clusters that are
observed in the dataset. We represent the source cluster, destination cluster
and last two visited clusters xi and xi 1 as unique cluster IDs. The notation
xi 1 ! xi represents the last observed transition between two clusters in the
most recent time step on the trip. Applying Bayes Rule, we have the following:
P (destinationlowerjxi 1 ! xi; source; destinationhigher) =</p>
      <sec id="sec-3-1">
        <title>P (xi 1 ! xijsource; destinationhigher; destinationlower)</title>
      </sec>
      <sec id="sec-3-2">
        <title>P (xi 1jsource; destinationhigher; destinationlower)</title>
      </sec>
      <sec id="sec-3-3">
        <title>P (sourcejdestinationhigher; destinationlower)</title>
      </sec>
      <sec id="sec-3-4">
        <title>P (destinationhigherjdestinationlower)</title>
        <sec id="sec-3-4-1">
          <title>P (destinationlower)</title>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>P (xi 1 ! xi; source; destinationhigher)</title>
        <p>However, P (xi 1 ! xi; source; destinationhigher) is a normalization term and
not required for the purpose of comparing different probabilities. In addition, this
joint probability distribution is computationally expensive to calculate.
Therefore, we simplify the above expression and compare based on proportionality.
Thus, we look for the destination cluster which maximises:</p>
      </sec>
      <sec id="sec-3-6">
        <title>P (destinationlowerjxi 1 ! xi; source; destinationhigher) /</title>
      </sec>
      <sec id="sec-3-7">
        <title>P (xi 1 ! xijsource; destinationhigher; destinationlower)</title>
      </sec>
      <sec id="sec-3-8">
        <title>P (xi 1jsource; destinationhigher; destinationlower)</title>
      </sec>
      <sec id="sec-3-9">
        <title>P (sourcejdestinationhigher; destinationlower)</title>
      </sec>
      <sec id="sec-3-10">
        <title>P (destinationhigherjdestinationlower)</title>
        <sec id="sec-3-10-1">
          <title>P (destinationlower)</title>
          <p>For example assume a trip starts in cluster 70 and in the current time step is at
cluster 206 having previously been in cluster 115. Assume further that the high
level model has predicted a destination region 5. Then we have:
P (destinationlowerjxi 1 = 115 ! xi = 206; source = 70; destinationhigher = 5)</p>
          <p>The experiments were run using different variations of the hierarchical model
described. These are as follows:
Baseline The baseline model always predicts the centroid coordinates of the
current cluster the taxi is in as the destination.</p>
          <p>Model 1 Reports the centroid coordinates of the most likely lower level
destination cluster within the high level region (a Maximum A Posteriori estimate).
Model 2 Reports the weighted average coordinates of all lower level destination
cluster centroids within the high level region. The centroid locations of all
lower clusters are weighted by the probability of each cluster and summed
to give the expected destination coordinates.</p>
          <p>Model 3 Reports the weighted average coordinates of the top 5 most likely
lower level destination cluster centroids within the high level region. The
centroid locations of the top 5 lower clusters are weighted by the probability
of each cluster and summed to give the expected location. (The probabilities
of the top 5 clusters were rescaled before weighting was carried out.)
Model 1 simply picks the most likely destination cluster at the lower level
and reports the cluster centroid as the predicted location. However, we have seen
that there is an expected error of at least 0:707r (where r is the average radius
of a cluster) because of the difference between the actual drop off location inside
the cluster and the cluster centroid which the algorithm reports (a
discretization error arising from the clustering scheme). Models 2 and 3 are motivated as
a means to try and address this. In models 2 and 3, the algorithm reports an
expected location by multiplying the lower cluster centroids with their
probability and summing. This means that if a particular dropoff is within cluster A
but very close to or bordering cluster B, then the probability of cluster B will be
higher than normal therefore pulling the reported location somewhere between
clusters A and B which is the desired behaviour.
4.3</p>
          <p>Data Sparsity
A problem encountered with the above method is the high dimensionality of the
search space. The number of parameters mean that an exceedingly large number
of combinations of those parameters are possible. Assuming a clustering scheme
of 259 clusters (which comprises source and visit clusters) and the high level
clustering of 8 clusters (the destination regions) then there are 259 sources by
259 previous locations by 259 current locations by 8 destination regions giving
138991832 possibilities at the higher level alone. This is not including the lower
level destination prediction which must be made once a destination region is
decided.</p>
          <p>Some of these possibilities are ruled out by geographic features of the map
itself. There are some clusters which cannot be reached from others in one
transition and hence cannot form a (previous location, current location) pair. Even
excluding these impossible transitions however, we found that the dataset did
not contain information on every feasible transition. In absence of any observed
data, the model predicts a zero probability for these transitions, which is clearly
not the case. A common technique to counter this is to assign a non-zero
probability to every possible destination (a uniform prior).</p>
          <p>We propose a reasonable approximation of what the missing probabilities
might be as a smoothing technique. The idea is given a certain current location
and candidate destination, assign higher probabilities to next locations which
are in the general direction of that candidate destination.</p>
          <p>In figure 7, the vehicle has travelled to its current location (marked C) from
the source S. From there the vehicle has some choices around it for the next
cluster to visit (marked green). Given the source and candidate destination D2,
the model assigns higher probabilities to next locations in the general direction of
D2. The model achieves this by calculating the angle which is the difference
in heading between the possible next location and candidate destination. This
angle is calculated for all possible next locations. The algorithm then ranks
the next locations according to increasing . Probability mass is applied to
entries in this ordered list by a negative exponential function. When making a
prediction, the algorithm first checks if there is data available in the data itself.
If there is none available, it then uses the estimated probability from the model
described above.
To conduct the experiments, for each problem, the data was split into a training
and test set. Approximately 80% of the trips were randomly sampled for the
training set with the remaining 20% being allocated for testing. The evaluation
metric used is the Mean Haversine Distance between our predicted drop off
coordinates and actual drop off coordinates.</p>
          <p>The results from the different model variants (as described in the previous
section) in the case of the smaller bounding box are shown in figure 8 and for
the larger bounding box in figure 9. In the figures, the x axis indicates the
percentage of trip travelled. The y axis is the average haversine distance error
over all the test trips at each point in the trip. In the smaller bounding box,
the best performance is an error of about 0.6 km in the smaller map and 1.6 km
dipping below 1 km as the trip progresses in the larger map. This performance
is seen with Models 2 and 3 which show the best performance in both smaller
and larger area maps. On the graphs these lines coincide with minimal difference
between them. This indicates that picking only the top 5 lower clusters doesn’t
alter the prediction performance too much as compared to weighting over all
lower level destination clusters. The peak at the end of the trip is because at this
stage the taxi is actually at its destination but the model still attempts to make a
prediction which increases the error. It is noted that within the city centre region,
there are patterns to learn and leverage to make a fairly accurate prediction.
Increasing the area covered captures the longer trips which was found to have
exhibit different and more variable behaviour. This resulted in a somewhat lower
prediction accuracy particularly earlier on in the trip. Furthermore, as there is
a larger number of destination clusters to choose from, the best result from the
larger bounding box is still less accurate than those from the smaller bounding
box. Finally the clusters themselves are larger which in turn amplifies any penalty
from picking the wrong cluster.</p>
          <p>It is noteworthy that in the smaller bounding box, the model reaches peak
prediction accuracy very early on (from around 5% into the trip). We always
report the centroid of the most likely cluster in Model 1. Our analysis showed
that even if we predicted the correct cluster 100% of the time, the discretization
would still give us a theoretical minimum error of 0.22 km. To address this
problem, in Models 2 &amp; 3, we report the weighted mean location of the low level
cluster centroids (weighted by the probability of the cluster). The weighting also
helps us in cases where our high-level cluster prediction is wrong. We believe
the majority of these cases are where the true destination coordinates are close
to the boundary between high level clusters so we end up picking an adjacent
high level cluster instead. The low-level prediction is still able to predict the
neighbouring low level clusters giving a final predicted location closer to the
actual destination.</p>
          <p>In the smaller map, while the prediction accuracy reaches its peak 5% into the
trip, the prediction doesn’t improve from there as in the larger map. Analysing
the predictive behaviour of the model, we note the model soon starts to pick
either the actual destination cluster or an adjacent cluster and this behaviour
continues until the end of the trip. The Markov property may have an effect
here as the model makes a prediction based only on the current information at
each time step. In the current case, the best models outperform the baseline
until about 75% into the trip in the smaller map (considering the average error
across all test trips at the 75% point). This indicates that the prediction model is
delivering an insight of value for about three quarters into each trip on average.</p>
          <p>
            The Predestination [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] algorithm reports an error of 2.8 km at the start of
the trip and dropping to 1 km at the end (on data from a different city). In other
papers where this data from Kaggle has been utilized, the authors note that the
test set provided is very small for reliable model comparison and therefore test
on their own local validation sets as we have done. [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] &amp; [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] report errors on the
Kaggle public leaderboard of 2.27 km and 2.39 km respectively. One caveat is
that these results may have come from models trained on data which has been
less aggressively cleaned than our approach. Our model on the small Kaggle test
set of 320 trips scored 2.76 km on the public leaderboard. We believe the local
validation is more significant as the results are calculated over a much larger
number of trips.
6
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we presented a hierarchical model for predicting the destination
of a moving vehicle. The model made an initial prediction at a high level using
Discrete Time Markov Chains for each possible source and destination pair. It
then zoomed down to refine the prediction within that higher level cluster using
Bayesian Inference. We showed that in earlier phases of the trip, the model
correctly predicts the destination to within 0.5 km on average in our smaller map
and around 1.6 km on the larger map (this figure improves as trip progresses on
the larger map).</p>
      <p>We will improve on these results by considering extra pieces of information.
For instance, we will factor trips by time buckets and attempt to learn time
based patterns. This captures the idea that the patterns of pickups and dropoffs
will vary a lot depending on time of day. E.g.: Dropoffs at restaurants and
entertainment venues are more likely in the evening. A second avenue to explore
is to relax the Markov assumption and consider greater number of locations
(such as last 5) in making our prediction. Finally we will attempt to learn driver
preferences (such as favouring a particular section of the city for drop offs and
working during a specific time of the day) to develop more flexible models.
Acknowledgement. This research was supported by Science Foundation
Ireland under Grant No. 12/RC/2289.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>De Brébisson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simon</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auvolat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Artificial neural networks applied to taxi destination prediction</article-title>
          .
          <source>In: Proceedings of the 2015th International Conference on ECML PKDD Discovery</source>
          Challenge - Volume
          <volume>1526</volume>
          . pp.
          <fpage>40</fpage>
          -
          <lpage>51</lpage>
          . ECMLPKDDDC'15,
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          .org, Aachen, Germany, Germany (
          <year>2015</year>
          ), http://dl.acm.org/citation.cfm?id=
          <volume>3056172</volume>
          .
          <fpage>3056178</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Krumm</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horvitz</surname>
          </string-name>
          , E.: Predestination:
          <article-title>Inferring destinations from partial trajectories</article-title>
          . In: Eighth International Conference on Ubiquitous Computing (UbiComp
          <year>2006</year>
          ). Springer (
          <year>September 2006</year>
          ), https://www.microsoft.com/enus/research/publication/predestination-inferring
          <article-title>-destinations-from-partialtrajectories/</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lam</surname>
            ,
            <given-names>H.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diaz-Aviles</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pascale</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkoufas</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>(blue) taxi destination and trip time prediction from partial trajectories</article-title>
          .
          <source>In: Proceedings of the 2015th International Conference on ECML PKDD Discovery</source>
          Challenge - Volume
          <volume>1526</volume>
          . pp.
          <fpage>63</fpage>
          -
          <lpage>74</lpage>
          . ECMLPKDDDC'15,
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          .org, Aachen, Germany, Germany (
          <year>2015</year>
          ), http://dl.acm.org/citation.cfm?id=
          <volume>3056172</volume>
          .
          <fpage>3056180</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P.J.:</given-names>
          </string-name>
          <article-title>Silhouettes: A graphical aid to the interpretation and validation of cluster analysis</article-title>
          .
          <source>Journal of Computational and Applied Mathematics</source>
          <volume>20</volume>
          (
          <string-name>
            <surname>Supplement</surname>
            <given-names>C)</given-names>
          </string-name>
          ,
          <volume>53</volume>
          -
          <fpage>65</fpage>
          (
          <year>1987</year>
          ), http://www.sciencedirect.com/science/article/pii/0377042787901257
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , R.,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Destination prediction by sub-trajectory synthesis and privacy protection against such prediction</article-title>
          .
          <source>ICDE 2013 (April</source>
          <year>2013</year>
          ), https://www.microsoft.com/enus/research/publication/destination
          <article-title>-prediction-by-sub-trajectory-synthesis-andprivacy-protection-against-such-prediction/</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>N.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , G.:
          <article-title>T-drive: Enhancing driving directions with taxi drivers' intelligence</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering (January</source>
          <year>2013</year>
          ), https://www.microsoft.com/en-us/research/publication/tdrive-enhancing
          <article-title>-driving-directions-with-taxi-drivers-intelligence/</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ziebart</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maas</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dey</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagnell</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior</article-title>
          .
          <source>In: Proceedings of the 10th International Conference on Ubiquitous Computing</source>
          . pp.
          <fpage>322</fpage>
          -
          <lpage>331</lpage>
          . UbiComp '08,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          , New York, NY, USA (
          <year>2008</year>
          ), http://doi.acm.
          <source>org/10</source>
          .1145/1409635.1409678
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>