(Blue) Taxi Destination and Trip Time
         Prediction from Partial Trajectories

                   Hoang Thanh Lam, Ernesto Diaz-Aviles,
              Alessandra Pascale, Yiannis Gkoufas, and Bei Chen

                              IBM Research – Ireland

    {t.l.hoang, e.diaz-aviles, apascale, yiannisg, beichen2}@ie.ibm.com


      Abstract Real-time estimation of destination and travel time for taxis
      is of great importance for existing electronic dispatch systems. We present
      an approach based on trip matching and ensemble learning, in which
      we leverage the patterns observed in a dataset of roughly 1.7 million
      taxi journeys to predict the corresponding final destination and travel
      time for ongoing taxi trips, as a solution for the ECML/PKDD Discov-
      ery Challenge 2015 competition. The results of our empirical evaluation
      show that our approach is effective and very robust, which led our team
      – BlueTaxi – to the 3rd and 7th position of the final rankings for the
      trip time and destination prediction tasks, respectively. Given the fact
      that the final rankings were computed using a very small test set (with
      only 320 trips) we believe that our approach is one of the most robust
      solutions for the challenge based on the consistency of our good results
      across the test sets.


Keywords: Taxi trajectories; Trip matching; Ensemle learning; Kaggle;
ECML–PKDD Discovery Challenge


1    Introduction

Taxi dispatch planning and coordination is challenging given the traffic dynamics
in modern urban cities. The improvement of electronic dispatching of taxis can
have a positive impact in a city, not only by reducing traffic congestion and
environmental impact, but also improving the local economy, e.g., by mobilizing
more customers in cities where the demand of taxis is very high.
    Current electronic dispatch systems capture localization and taximeter state
from mobile data terminals installed in GPS-enabled taxis. The collective traffic
dynamics extracted from the taxis’ GPS trajectories and the stream of data col-
lected in real-time represent a valuable source of information to understand and
predict the behavior of future trips. In particular, we are interested in improv-
ing taxi electronic dispatch by predicting the final destination and the total trip
time of ongoing taxi rides based on their initial partial trajectories. A solution to
this challenge is very valuable given that in most cases busy taxi drivers do not
report the final destination of the ride, which makes electronic dispatch planning
more difficult and sub-optimal, e.g., ideally the destination of a ride should be
very close to the start of the next one.
    These two tasks, namely taxi destination and trip time prediction, are the
ones that correspond to the Discovery Challenge part of ECML/PKDD 2015 [2].
This paper details our data-driven solution to the challenge that placed our
team, BlueTaxi, in the 3rd and 7th position in the final ranking for the time and
destination prediction task, respectively.
    In particular, we are given a data set with roughly 1.7 million taxi journeys
from 442 taxis operating in the city of Porto, Portugal, for a period of one year.
Besides polylines with GPS updates (at a resolution of 15 seconds), for every
taxi trip we are provided with additional information that includes the taxi id,
origin call id, taxi stand id, starting timestamp of the ride, etc. The competition
is hosted on Kaggle’s platform1 , which allows participants to submit predictions
for a given test set whose ground-truth is held by the competition organizers.
Submissions are scored immediately (based on their predictive performance rel-
ative to a 50% of the hidden test dataset) and summarized on a live public
leaderboard (LB). At the end of the competition, the other 50% of the held-out
test set, i.e., from which the participants did not receive any feedback, is used to
compute the final rankings, also known as the private LB. The challenge details
and dataset characteristics can be found in [2].
    The challenge organizers explicitly asked for prediction of trip destination
and duration at 5 specific time snapshots, namely 2014-08-14 18:00:00, 2014-09-
30 08:30:00, 2014-10-06 17:45:00, 2014-11-01 04:00:00, and 2014-12-21 14:30:00.
    Our solution is based on a simple intuition: two trips with similar route
likely end at the same or nearby destinations. For example, Figure 1 shows two
complete trips extracted from the data. Both trips ended at the Porto airport.
Although the first parts of the trips are very different, the last parts of the trips
are the same. Therefore, using the destinations of similar trips in the past we
can predict the destination of a test trip.
    This simple intuition serves as the key idea behind our approach in which we
extract features (destination and trip time) from similar trips and then build a
model ensemble based on the constructed features to predict taxis’ destination
and travel time. In the next sections we detail the feature extraction process,
the predictive models of our approach, and present the experimental results that
show the effectiveness of our solution.


2     Feature Extraction

From our initial data exploration, we found that the dataset contains missing
GPS updates and erroneous information. In fact, the missing value column part
of the dataset was erroneous: most values of the column were False, even though
the records included missing information. We detected missing GPS points in the
1
    https://www.kaggle.com/
Figure 1. Two trips with different starting points but with the same destination (Porto
airport). The final part of the trajectories are very close to each other, which helps to
estimate the destination of similar trips.

trajectory by observing large jumps between two consecutive GPS updates, i.e.,
by considering distances exceeding the distance that a taxi would have travelled
at a speed limit (e.g., 160 km/h).
    We also found that the trip start timestamp information was unreliable,
e.g., some trips in the test set have starting timestamps 5 hours before the cut-
off timestamps of the corresponding snapshot but they contain only a few GPS
updates. Moreover, some taxi trips have very unusual trajectories, e.g., we noted
that some taxi drivers forgot to turn the meter off after completing a trip, for
instance in the way back to the city center after dropping a passenger at the
airport, such turning back trajectories were very difficult to predict.
    Considering these issues, we preprocessed the data accordingly and extracted
the features that are detailed in this section, which will serve as input to build
our predictive models for destination and trip time prediction.


2.1   Feature Extraction for Destination Prediction

For predicting the destination of a given incomplete test trip A, our method is
based on trip matching which uses destinations of trips in the training data with
similar trajectories to predict the final destination of the given test trip. Figure 1
shows an example of two trips with different starting points both going toward
the Porto airport. As we can see, the final part of the trips are very similar,
both trips took the same highways before ending at the same destination. We
captures such pattern by a set of features described as follows.
– Final destination coordinates and Haversine distance to 10 nearest neighbors:
for a given test trip A we search for 10 trips in the training data that are closest
to it. Similarly, for each pair of trips A and B we compute the mean Haversine
distance (Equation 1) between the corresponding points in their trajectories. We
ignored trips B that have fewer GPS updates than the ones in A.
– Kernel regression (KR) as a smooth version of k-NN regression method. Our
previous work on bus arrival time prediction shows that KR gives better results
than k-NN [6,8]. Destinations calculated by KR were used as features to estimate
the final destination of a test trip. KR requires to set the bandwidth parameter,
in our experiment, we set it to 0.005, 0.05 and 0.5 to obtain three different
KR-based destination predictions corresponding to these values.
– The aforementioned features from KR are sensitive to the metric used for evalu-
ating the similarity between trips. We experimented with dynamic time warping
and mean Haversine distance, but only the latter metric was chosen given its
efficiency in computation. Moreover, some trips may have very different initial
trajectory but still share the same destination, e.g., see Figure 1. Therefore, be-
sides computing KR predictions for the full trip, we also compute them using
only the last d meters of the ride during the trip matching step, where d ∈ {100,
200, 300, 400, 500, 700, 1000, 1200, 1500}. The models showed that the last 500
to 700 meters of the trips are the most important features for predicting both
latitude and longitude of the final destination.
– For a pair of trips A and B, we also consider the distance metric that looks for
the best match along their trajectories without alignment. The empirical results
showed that KR predictions by best matching gives more accurate results than
first aligning the initial part of A’s trajectory with B’s.
– The features described so far do not consider contextual information such
as the taxi id, call id, taxi stand, time of day, or day of the week, which are
very important, consider for example Figure 2, which shows a heatmap of the
destinations for all trips with origin call id 59708 in the training data, as we can
see the destination is quite regular, although overall, there are 155 trips, only
8 major destinations were identified. In our work, we exploit this information
by using KR to match a test trip with only trips with the same call id, taxi id,
day of the week, hour of the day, or taxi stand id. We called the KR models
based on these features contextual KR. When a contextual field of the test trip
has a missing value, the corresponding contextual KR was replaced by the KR
prediction with no contextual information. The empirical results showed that
contextual KR-based prediction on the call id gave the best results among all
KR contextual related features.
– We observed that there are many GPS updates that are erroneous, e.g., coor-
dinates completely outside the trajectory shape or some GPS updates are very
close together due to traffic jam or when a taxi is parked or stays still at the
same location with the meter on. These type of noisy GPS updates influence the
performance of the KR prediction, especially the KR using the mean Haversine
distance metrics which requires point to point comparison across trips. Therefore
we used the RDP algorithm [3,7] to simplify the trips with parameter ϵ equal
Figure 2. Heatmap of destinations of 155 trips with call id 59708. There are only
8 major destinations. Using call id information one can narrow down the possible
destinations of a given taxi.

to 1 × 10−6 , 5 × 10−6 , and 5 × 10−5 . From the simplified trips we extracted all
KR-related features as we did for the raw trip trajectories.
    Besides features extracted via trip matching we also added features extracted
directly from the partially observed trips:
– Euclidean distance traveled and Haversine distance between the first and the
last GPS update.
– Direction: to define whether the taxi is moving outside of the city or vice
versa. We compared the distance between the city center and the first and the
last point of the trips. If the former is larger the taxi is considered as moving
toward the city center and moving away (e.g., to the countryside) otherwise.
– Time gap between the cut-off timestamps and the starting timestamps. We
observed that this feature is not very reliable because the starting timestamps
are quite noisy.
– Number of GPS updates (this feature is also noisy due to missing values).
– Day of the week. We observed that the prediction error was higher for some
days of the week, but the Random Forest model used in our experiments did not
rank this feature high for destination prediction (cf. Section 4).
– The first and the last GPS location.


2.2   Feature Extraction for Trip Time Prediction

The set of features for the time prediction task is very similar to the set of
features for destination prediction, with the difference that travel time of closest
trips were considered as the target variable instead of destination. The features
extracted for this task are as follows.
– Travel time and Haversine distance to 10 nearest neighbors.
– Kernel regression features. All KR related features for destination prediction
were also extracted for time prediction, with the difference that travel time of
closest trips was considered as target variable instead of destination.
    In addition to, all features extracted from the partially observed trips as
described in the previous section, we also considered the following additional
particular features for time prediction which were extracted directly from the
incomplete trips observed (i.e., trips that are still ongoing):
– Average speed calculated on the last d meters of the partial trajectory observed
so far and on the entire incomplete trip, where d ∈ {10, 20, 50, 100, 200}.
These features convey up-to-date traffic condition at the moment of making a
prediction.
– Average speed calculated on all the incomplete trips observed so far with the
starting time no more than an hour apart from the cut-off time-stamp. These
features convey information about the traffic condition around the snapshot
timestamps.
– Average acceleration calculated on the last d meters of the incomplete trips
observed so far and on the entire incomplete trips, where d ∈ {10, 20, 50, 100,
200}.
– Shape complexity: the ratio between the (Euclidean) traveled distance and
the Haversine distance between the first and the last GPS location. Trips with
higher complexity (e.g., zig-zag trips) tend to be trips for which the taxi drivers
were driving around the city to search for passengers. Zig-zag trips tend to have
longer travel time so it is reasonable to identify those trips beforehand.
– Missing values in the GPS trace were identified by calculating the speed be-
tween any pair of consecutive GPS updates. If the estimated speed is over the
speed limit v̂ km/h even for only one pair of consecutive GPS updates in the
partially observed trip, the trip was labelled as a trip with missing GPS updates.
We used speed limits v̂ ∈ {100, 120, 140, 160} km/h. Trips with missing values
tend to have longer travel time.
    In total, we have 66 features for the trip time prediction tasks.


3     Predictive Modelling
In order to train the predictive models, first, we created a local training dataset
by considering 5 time snapshots from weekdays that resemble the disclosed test
set and was fixed for all trained models.
    Overall we extracted 13301 trips from these snapshots in the training data,
which is a small fraction of the 1.7 million trips in the original training set.
We also considered roughly 12000 additional trips from 5 snapshots with start
timestamps one hour after the specified test set snapshots. However, models
trained on these combined sets yielded improvement only locally but not on the
public leaderboard. Therefore, we decided to ignore the additional training trips
although in practice to obtain robust results one should consider these additional
data for training the models as well.
    We observed that feature extraction for the training set is not very efficient
because most features were extracted based on nearest neighbor search. In order
to speed up this process, we propose to use an index structure based on geohash 2 .
2
    https://en.wikipedia.org/wiki/Geohash
We first represent each GPS by its geohash, then we search for the nearest
trips via range queries to retrieve trips with a maximum distance threshold
of 1km from the first point of the test trip. This simple indexing technique
speeds up the nearest neighbor search significantly since range queries are very
efficient with geohashes. It is important to note that this technique is a heuristic
solution which does not guarantee exact nearest neighbors results. However, we
did not observe significant differences in the prediction results when exact or
approximated nearest neighbors were used.
     In the rest of this section, we describe in detail the different models for each
of the prediction tasks.


3.1   Models for Destination Prediction

Given the set of features, any regression model can be used to predict latitude
and longitude independently. To this end, we chose Random Forest (RF) [1]
given the robustness of its predictions observed in both the local validation set
and in the public LB test set. Besides RF models, we also experimented with
Support Vector Regression (SVR), but the results on the public LB did not differ
much, so we used only RF for the final destination prediction. Moreover, with
a RF model we can easily assess the contribution of each feature on the final
prediction. With this insight, we know whether a new set of features is relevant or
not every time it is added to the model. Thanks to this we could perform feature
selection using the rfcv function provided with the randomForest package in
R. With feature selection the obtained results did not differ on the public LB
but the results were more robust, in fact it needs fewer trees to obtain similar
prediction results.
    To handle outliers, we first trained an initial RF model with 2000 trees and
removed from training set all trips that prediction error was greater than 90% of
error quantile. Subsequently, a new RF model re-trained with the new training
set became our final model.


3.2   Ensemble for Trip Time Prediction

For the trip time prediction, before training the models we removed outliers with
a trip travel time that exceeded a median absolute deviation of 3.5. Furthermore,
we trained the models using as target variable the (log-transformed) delta time
between the cut-off time point of our training snapshot and the timestamp as-
sociated to the last point of the trajectory. That is, our goal is to train models
to predict the trip time remaining for a given ongoing trip. This preprocessing
strategy led to significantly better results.
    Then, we used the features described in Section 2.2 to train a model en-
semble for a robust prediction. The individual members of the ensemble include
the following regression models: Gradient Boosted Regression Trees (GBRT) [4],
Random Forest regressor (RF) [1], and Extremely Randomized Trees regres-
sor (ERT) [5].
    In order to produce a single predictor we follow a stacked generalization
(stacking) [10] approach summarized as follows:
– Remove from our training data, DT , a subset of the samples (i.e., trips) and
split them equally to form a validation set Dv and (local) test set Dt , i.e.,
|Dv | = |Dt |.
                                                     ′
– Train n models on the remaining training data DT = DT \ (Dv ∪ Dt ).
– For each model, compute the predictions for the validation set Pv .
– For the validation set the corresponding true trip time is known, which leads
to a supervised learning problem: using the predictions Pvn as the input and
the correct responses in Dv as the output, train a meta-regressor Me (Pvn ) to
ensemble the predictions.
                                                              ′
– Insert the validation set Dv back into the training data DT .
                                       ′
– Train the models again using the DT ∪ Dv training dataset.
– Predict for the test set Dt to obtain Pt .
– Ensemble the final prediction for the test set using Me (Ptn ).
    Table 2 details the models parametrization and predictive performance in
terms of the Root Mean Squared Logarithmic Error (RMSLE) defined in Equa-
tion 2. We experimented with regularized linear regression and with simple av-
erage for the meta-regressor to produce the final prediction. The results of our
approach are presented in Section 4.2.


4     Experimental Results
During the competition, the performance on the ground-truth test set held by
the organizers was only available for a 50% of the test trips and via a limited
number of submissions to the Kaggle’s website. Therefore, in order to analyze
the predictions results locally, we split our subset of training data with 13301
trips into two parts: local training and validation datasets. This section reports
the experimental results for the public and private LB test sets as well as for the
local validation set.

4.1   Destination Prediction
The evaluation metric for destination prediction is Mean Haversine Distance
(MHD), which measures distances between two points on a sphere based on
their latitude and longitude. The MHD can be computed as follows.
                                  r
                                       a 
           MHD = 2 · r · arctan             , where                        (1)
                                     1−a
                          ϕ2 − ϕ1                               λ2 − λ1 
                a = sin2            + cos(ϕ1 ) · cos(ϕ2 ) · sin2
                             2                                      2
and ϕ, λ, r = 6371km are the latitude, longitude, and the Earth’s radius, re-
spectively.
    Table 1(a) shows the mean Haversine distance from the predicted destination
to the correct destination on three different test sets. Note that for this task we
Table 1. Destination prediction error in terms of the Mean Haversine Distance (MHD).

                      (a)                                  (b)
     Dataset                MHD (km)    Day of the week and time    MHD (km)
     Public LB                2.27      Monday (17:45)                 2.24
     Private LB               2.14      Tuesday (08:45)                2.40
     Local validation set     2.39      Thursday (18:00)               2.48
                                        Saturday (04:00)               2.21
                                        Sunday (14:30)                 2.95


            A                                 B


        C                                      D


  Figure 3. Examples of trips on the LB test set and our prediction (blue circle).


used a 66%–34% training–validation split. We can see that the results are quite
consistent and the prediction errors on three different test sets are small.
    Table 1(b) reports the prediction error on the local test set at different snap-
shots. As we can see, among the snapshots, trips in early morning on Saturday
are easier to predict while trips on Sunday afternoon are very difficult to predict
with significantly higher prediction error. On Sundays, taxis tend to commit
longer trips with irregular destinations while early morning Saturdays taxis usu-
ally go to popular destinations like airports, train stations, or hospitals.
    Figure 3 shows four trips on the LB test set and our prediction. Some predic-
tions are just about one block away from the last GPS location (cases A and B),
while some predictions are very far away from the last location (cases C and D).
The later cases usually concern trips that took the highway to go to the other
side of the city by skipping the city center or go to the airport via A3 motorway.
4.2   Trip Time Prediction

For the trip time task, predictions are evaluated using the Root Mean Squared
Logarithmic Error (RMSLE), which is defined as follows:
                              v
                              u n
                              u1 X
                 RMSLE = t             (ln(pi + 1) − ln(ai + 1))2         (2)
                                 n i=1

where n is the total number of observations in the test data set, pi is the predicted
value, ai is the actual value for travel time for trip i, and ln is the natural
logarithm.
   Table 2 summarizes the predictive performance of the individual model mem-
bers of our ensemble locally using a 80%–20% training–validation split.


Table 2. Regression models for trip time prediction and the corresponding performance
(RMSLE) in our local test dataset. The models include Gradient Boosted Regression
Trees (GBRT), Random Forest regressor (RF), and Extremely Randomized Trees re-
gressor (ERT).

        Model     Parameters                                                                              RMSLE

        GBRT      Default parameter settings used for the GBRT models: learning rate=0.1, max depth=3,
                  max features=n features, min samples leaf=3, min samples split=3, n estimators=128
        GBRT-01   loss=‘squared loss’, subsample=1.0                                                      0.41508
        GBRT-02   loss=‘squared loss’, subsample=0.8                                                      0.41498
        GBRT-03   loss=‘least absolute deviation’, subsample=1.0                                          0.40886
        GBRT-04   loss=‘least absolute deviation’, subsample=0.8                                          0.40952
        GBRT-05   loss=‘huber’, subsample=1.0, alpha quantile=0.9                                         0.41218
        GBRT-06   loss=‘huber’, subsample=0.8, alpha quantile=0.9                                         0.41108
        GBRT-07   loss=‘huber’, subsample=1.0, alpha quantile=0.5                                         0.41000
        GBRT-08   loss=‘huber’, subsample=0.8, alpha quantile=0.5                                         0.40705
        GBRT-09   loss=‘quantile’, subsample=1.0, alpha quantile=0.5                                      0.40798
        GBRT-10   loss=‘quantile’, subsample=0.8, alpha quantile=0.5                                      0.40959

        RF        Default parameter settings used for the RF models: max depth=None, n estimators=2500,
                  use out of bag samples=False
        RF-01     max features=n features, min samples leaf=4, min samples split=2                        0.41674
        RF-02     max features=n features, min samples leaf=1, min samples split=1                        0.41872
        RF-03     max features=sqrt(n features),min samples leaf=4, min samples split=2                   0.41737
        RF-04     max features=log2 (n features),min samples leaf=4, min samples split=2                  0.41777
        RF-05     max features=n features,min samples leaf=1,min samples split=1,                         0.41865
                  use out of bag samples=True
        RF-06     max features=sqrt(n features),min samples leaf=4,min samples split=2,                   0.41731
                  use out of bag samples=True
        RF-07     max features=log2 (n features),min samples leaf=4,min samples split=2,                  0.41790
                  use out of bag samples=True

        ERT       Default parameter settings used for the RF models: max depth=None, n estimators=1000,
                  use out of bag samples=False
        ERT-01    max features=n features, min samples leaf=1, min samples split=1                        0.41676
        ERT-02    max features=n features, min samples leaf=1, min samples split=2                        0.41726
        ERT-03    max features=n features,min samples leaf=1,min samples split=2,                         0.41708
                  use out of bag samples=True
        ERT-04    max features=n features,min samples leaf=1,min samples split=1,                         0.41735
                  use out of bag samples=True, n estimators=3000
Table 3. Trip time prediction results on the public and private leaderboards in terms
of RMSLE, where the lower the value, the better.

        Model ensemble description                                         RMSLE
                                                              Public LB    Private LB    Average
        ME1: Model ensemble using the average to com-            0.49627       0.53097    0.51362
        pute the final prediction.
        ME2: Model ensemble using regularized linear re-         0.51925      0.51327     0.51626
        gression with L2 penalty as meta-regressor to com-
        pute the final prediction.
        ME3: Model ensemble using regularized linear re-         0.49855       0.52491   0.51173
        gression with L1 penalty (Lasso) as meta-regressor
        to obtain the final prediction.
        ME4: Model ensemble trained using trips with in-         0.49418       0.54083    0.51750
        stances whose number of GPS updates is between
        2 and 612, which reassemble the distribution ob-
        served in the partial trajectories of the test set.
        The final prediction is computed using mean of the
        individual predictions.
        ME5: Average prediction of model ensembles ME1,         0.49408        0.53563    0.51485
        ME2, M3, and M4 (3rd place on the public LB).
        ME6: Average prediction of model ensembles ME1,          0.49539       0.53097    0.51318
        ME2 and M3 (3rd place on the private LB).


    In Table 3 we present our most relevant results in the official test set of the
competition. The table shows the performance of our strategies in the public and
private leaderboards, which corresponds to a 50%–50% split of the hidden test
set available only to the organizers. Remember that the final ranking is solely
based on the private leaderboard from which the participants did not receive
any feedback during the competition.
    We achieved the best performance in the public leaderboard (RMSLE=0.49408)
using ME5, which corresponds to an average of model ensembles as specified in
Table 3, this result placed our team at the 3rd place. For the private leader-
board, we also achieved the top-3 final position using ME6 (RMSLE=0.53097).
ME6 is similar to the ME5 approach, but it does not include in its average the
predictions of ME4, which corresponds to the models trained using a subset of
the trips with number of GPS updates in the range between 2 and 612 points.
    Note that our ME2 strategy achieved a RMSLE of 0.51925 and 0.51327 in
the public and private leaderboards, respectively. The RMSLE of this entry in
the leaderboard would have achieved the top-1 position in the final rankings.
The competion rules allowed to select two final sets of predictions for the final
evaluation. Unfortunately, given the performance observed in the public leader-
board for which we had feedback, we did not select the ME2 approach as one of
final two candidate submissions.
    The model ensemble that uses regularized linear regression with L1 penalty
(Lasso) [9] achieves the best average performance for the whole test dataset with
an RMSLE=0.51173.
    We used NumPy3 and scikit-learn4 to implement our approach for trip time
prediction.
3
    http://www.numpy.org/
4
    http://scikit-learn.org/
5    Conclusions and Future work
We propose a data-driven approach for taxi destination and trip time prediction
based on trip matching. The experimental results show that our models exhibit
good performance for both prediction tasks and that our approach is very ro-
bust when compared to competitors’ solutions to the ECML/PKDD Discovery
Challenge. For future work, we plan to generalize our current approach to au-
tomatically adapt to contexts and select a particular set of features that would
improve predictive performance within the given context.


References
 1. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
 2. Discovery Challenge: On Learning from Taxi GPS Traces: ECML–PKDD.
    http://www.geolink.pt/ecmlpkdd2015-challenge/ (2015)
 3. Douglas, D.H., Peucker, T.K.: Algorithms for the reduction of the number of points
    required to represent a digitized line or its caricature. Cartographica: The Inter-
    national Journal for Geographic Information and Geovisualization 10(2), 112–122
    (1973), doi:10.3138/FM57-6770-U75U-7727
 4. Friedman, J.H.: Greedy function approximation: A gradient boosting machine. The
    Annals of Statistics 29(5) (2001)
 5. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Machine Learning
    63(1), 3–42 (2006)
 6. Lam, H.T., Bouillet, E.: Flexible sliding windows for kernel regression based bus
    arrival time prediction. In: In proceedings of ECML PKDD (2015)
 7. Ramer, U.: An iterative procedure for the polygonal approximation of plane curves.
    Computer Graphics and Image Processing 1(3), 244 – 256 (1972)
 8. Sinn, M., Yoon, J.W., Calabrese, F., Bouillet, E.: Predicting arrival times of buses
    using real-time gps measurements. In: Intelligent Transportation Systems (ITSC),
    2012 15th International IEEE Conference on. pp. 1227–1232. IEEE (2012)
 9. Tibshirani, R.: Regression shrinkage and selection via the lasso: a retrospective.
    Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(3),
    273–282 (2011), http://dx.doi.org/10.1111/j.1467-9868.2011.00771.x
10. Wolpert, D.H.: Stacked Generalization. Neural Networks 5(2), 241 – 259 (1992)