Use of LSTM for Short-Term and Long-Term Travel
                      Time Prediction

                         Irem Islek                                       Sule Gunduz Oguducu
           Department Of Computer Engineering                     Department Of Computer Engineering
               Istanbul Technical University                          Istanbul Technical University
                     Istanbul, Turkey                                        Istanbul, Turkey
                     isleki@itu.edu.tr                                     sgunduz@itu.edu.tr


Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Travel time prediction is an important component of intelligent transportation systems and plays a key role in daily life. Predicting travel time for a trip is quite challenging and has been studied by many researchers. However, most studies focus on short term travel time prediction. In this study, LSTM (Long Short-Term Memory) neural network models are constructed to predict travel time for both the long term and the short term using real world data from New York City. The results show that LSTM provides satisfying results for long term travel time prediction as well as for short term prediction.

1    Introduction

Traffic is a common problem of urban life, and an ITS (Intelligent Transportation System) [1], which is an integrated system of different IoT (Internet of Things) data sensors, cameras and computers, can provide a solution to this problem. One of the most challenging parts of ITS is travel time prediction, because travel time is affected by numerous factors such as day of the week, time of day, weather conditions, road conditions, etc. Predicting travel time accurately for a future trip can help people plan their routes more efficiently.

In recent years, there has been an increasing interest in travel time prediction, and many researchers focus on this problem. In some of these studies, time series models such as traditional ARIMA [2] or seasonal ARIMA [3] are applied for prediction. These models are fitted to historical travel time data, and the next step travel time is predicted using the fitted time series. Another common approach for travel time prediction is the k-nearest neighbors model [4, 5]. In this model, the k most similar historical days are found and the mean of their travel time values is accepted as the prediction result. In addition to these models, Kalman filtering [6, 7], Support Vector Regression [8], Support Vector Machines [9] and Bayesian combination of models [10] have also been applied to travel time prediction. These studies use current values of several features such as speed, traffic flow and weather conditions to predict future travel time.

More recently, neural networks have come into use for travel time prediction. Several studies use artificial neural network models for this problem [11, 12, 13, 14]. In addition, some deep learning models, such as the Deep Belief Network [15], have also been applied to travel time prediction.

LSTM has also been used for travel time prediction [16]. In that study, an LSTM model is applied only to short term travel time prediction, using travel time data obtained from Highways England. The authors emphasize that deep learning models which take the sequence relation into account are quite promising in the travel time prediction domain.

Although research has been carried out on short term travel time prediction using LSTM, no single study exists which examines LSTM for both the short term and the long term in detail. Another contribution of this paper is that, to the best of our knowledge, this is the first time LSTM is applied to the long term travel time prediction problem. As might be expected, long term travel time prediction is considerably more difficult than short term travel time prediction. In long term travel time prediction, the main objective is predicting the travel time value for a specific day and hour at least one week in advance. In summary, the main objectives of this study are applying LSTM models to long term and short term travel time prediction using real world data from New York City.

The paper is organized as follows. Section 2 presents the approach and details of the dataset. Section 3 describes the experiments and the results obtained. Finally, in Section 4 we conclude and discuss some possible future work.

2     Prediction Of Travel Time

In this study, we aim to predict travel time for the short term and the long term using real world travel time data. Several different tests are carried out to investigate the behavior of LSTM on the travel time prediction problem. In this section, we first describe the data set used and then give the details of the method we applied to solve the problem of travel time prediction.

Table 1: Definition Of Fields

Field Name      Definition
Speed           Average speed a vehicle traveled between end points
                on the link in the most recent interval
TravelTime      Time the average vehicle took to traverse the link
DataAsOf        Time at which the data was received from the link
LinkId          Unique id of the link (a given street section)
LinkPoints      Sequence of Lat/Long points, describes locations of
                the sensor links. Google compatible polyline from
Borough         NYC borough (Brooklyn, Bronx, Manhattan, Queens,
                Staten Island)
LinkName        Description of the link location and end points

2.1   Dataset

The data set used in this study is obtained from the New York City Department of Transportation (NYC DOT), which provides real-time traffic information from major arterials and highways within New York City using numerous IoT sensors [17]. These IoT sensors are distributed within the five boroughs of New York City: Brooklyn, Bronx, Manhattan, Queens and Staten Island. Using a free service, this information can be collected by users for use in application development. In this dataset, a "link" represents a given street section.

Rows of the obtained data contain these fields: id, speed, travel time, data as of, link points, borough and link name. Definitions of these fields can be seen in Table 1. The data are updated every five minutes and contain 135 different links within the five boroughs of New York City. The dataset contains real time traffic data of links from April 2015 up to the current date and is updated regularly by NYC DOT. The data row count of each month is nearly 1,150,000. For each link, there are nearly 275 rows (travel time values) in one day. The time interval between consecutive travel time values of a link is 5 minutes.

In this study, the "DataAsOf" column, which represents the timestamp of the row, and the "TravelTime" column, which represents the travel time value of the link, are used for time series modeling. For each link, travel time values are ordered by the "DataAsOf" column, that is, by the date and time of the measurement.

2.2   LSTM For Travel Time Prediction

Our experimental system design consists of four steps. The first step is the Data Preparation step. Afterwards, the Outlier Elimination step helps us to discard out-of-the-ordinary data rows. Then, the LSTM model is applied to the dataset in the Prediction step. At the end of the workflow, the Evaluation step calculates the error rates based on the selected evaluation metric. The details of the whole process are given below.

2.2.1   Data Preparation Step

The aim of this study is predicting the travel time for a selected link (a given street section) using previous travel time data. To achieve this, a time series is constructed for each link using previous travel time information. The dataset contains travel time values of each link at 5-minute intervals. As can be seen in Fig. 1, for predicting the travel time 5 minutes later, the link's previous travel time values measured at 5-minute intervals should be used. On the other hand, for predicting the travel time 15 minutes later, previous travel time values measured at 15-minute intervals should be used. In the case of long term travel time prediction, for predicting the travel time value of a link at 8 o'clock on Monday, the data set should be generated using the travel time values at 8 o'clock on the Mondays of previous weeks.

The data preparation step also transforms the data rows into a sequential structure. Using the sequential output of this step, a time series forecasting model can easily be applied to predict the next step travel time value.
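
The data preparation described in Section 2.2.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `subsample` and `make_windows`, the window length of 4 and the synthetic travel time values are all hypothetical, and the series is assumed to have no missing 5-minute rows (real data would need gaps handled first).

```python
import numpy as np

def subsample(series, step):
    """Keep every `step`-th value, e.g. step=3 turns 5-minute data into 15-minute data."""
    return np.asarray(series, dtype=float)[::step]

def make_windows(series, n_lags):
    """Turn a 1-D series into (X, y): each row of X holds n_lags consecutive
    values and y holds the value that immediately follows them."""
    series = np.asarray(series, dtype=float)
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    X = np.stack([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

# Hypothetical travel times (seconds) for one link, one value every 5 minutes.
travel_time = np.arange(100.0, 136.0)  # 36 values = 3 hours

# Short term, 5 minutes ahead: windows over consecutive 5-minute values.
X5, y5 = make_windows(travel_time, n_lags=4)

# Short term, 15 minutes ahead: windows over every third value (15-minute spacing).
X15, y15 = make_windows(subsample(travel_time, 3), n_lags=4)

# Long term: same weekday and time across weeks. With one value every 5 minutes
# there are 7 * 24 * 12 = 2016 slots per week, so stepping by 2016 picks the
# same weekly slot (for example Monday 08:00) from a multi-week series.
SLOTS_PER_WEEK = 7 * 24 * 12
weekly = subsample(np.arange(10 * SLOTS_PER_WEEK, dtype=float), SLOTS_PER_WEEK)
```

The same windowing function serves all three cases; only the spacing of the input series changes, which mirrors the distinction between the short term and long term datasets drawn in the text.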
Figure 1: Datasets for Short and Long Term Prediction

2.2.2   Outlier Elimination Step

Traffic prediction is quite difficult because traffic is affected by numerous different factors. Some of these factors can be handled with special solutions, but others, such as weather conditions, special events and traffic accidents, can only be handled using additional information. Because we do not use any additional dataset, these special situations are eliminated using an outlier detection method.

In this study, outliers are detected using the standard deviation of the last N travel time values. In our case, N is selected as 4. If a travel time value is quite different from the previous 4 travel time values, it is considered to be an outlier in our approach. According to this moving-window approach, the outliers are eliminated from the dataset.

2.2.3   Prediction Step

In the prediction step of the methodology, an LSTM (Long Short-Term Memory) [18] neural network is selected as the prediction model for the time series. The LSTM network, which is a special version of the RNN (Recurrent Neural Network), has a chain of repeating modules in just the same way as an RNN. In LSTM networks, each module, which is called a memory cell, contains 3 different gates: an input gate, an output gate and a forget gate. This memory cell is in the hidden layer of the LSTM network. A figure of the LSTM memory cell can be seen in Fig. 2.

Figure 2: Long Short-Term Memories

The basic job of a cell is remembering the temporal state of the network. The input gate, output gate and forget gate can each be thought of as a neuron of a neural network. The input gate is responsible for deciding whether or not a new value flows into the memory. The output gate decides whether or not the memory cell state is going to have an effect on other neurons. The forget gate allows the cell to remember its previous state or forget it.

At a given time $t$, the input of the cell is $x_t$, $h_t$ denotes the output of the cell, and $W_i, W_f, W_c, W_o, U_i, U_f, U_c, U_o, V_o$ are weight matrices. $b_i, b_f, b_c, b_o$ are the bias vectors in the model equations of LSTM.

First of all, the input gate values $i_t$ and the candidate memory cell state values $\tilde{C}_t$ are calculated using Eq. 1 and Eq. 2.

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{1}$$

$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{2}$$

After that, the forget gate activation is calculated as in Eq. 3.

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{3}$$

Using $\tilde{C}_t$ (candidate state value), $i_t$ (input gate value) and $f_t$ (forget gate value), the memory cell value $C_t$ can be calculated using Eq. 4.

$$C_t = i_t * \tilde{C}_t + f_t * C_{t-1} \tag{4}$$

After the state of the memory cell is calculated, the value of the output gate $o_t$ and the output $h_t$ can be calculated using Eq. 5 and Eq. 6.

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + V_o C_t + b_o) \tag{5}$$

$$h_t = o_t * \tanh(C_t) \tag{6}$$

LSTM networks can be applied to several different problems which involve prediction on sequential data. In this study, LSTM is applied to travel time prediction, which is a time series prediction problem. The chain-like structure of the LSTM network used in this study can be seen in Fig. 3. The reason for choosing LSTM as the travel time prediction model is that it is a special kind of neural network which considers the sequential relation.
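
To make the memory cell update concrete, the following NumPy sketch evaluates Eqs. 1 to 6 for a single time step. It is an illustration only, not the network used in the experiments: the function names, the dictionary layout of the parameters, the random initialisation, the hidden size of 4 and the input values are assumptions. The output gate uses its own bias $b_o$, in line with the bias vectors $b_i, b_f, b_c, b_o$ listed in the text.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One memory-cell update following Eqs. 1-6; `p` maps names to arrays."""
    i_t = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])      # Eq. 1: input gate
    C_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # Eq. 2: candidate state
    f_t = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])      # Eq. 3: forget gate
    C_t = i_t * C_tilde + f_t * C_prev                             # Eq. 4: new cell state
    o_t = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev
                  + p["Vo"] @ C_t + p["bo"])                       # Eq. 5: output gate
    h_t = o_t * np.tanh(C_t)                                       # Eq. 6: cell output
    return h_t, C_t

def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (an arbitrary, illustrative choice)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifco":
        p["W" + g] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["U" + g] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    p["Vo"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    return p

params = init_params(n_in=1, n_hidden=4)
h, C = np.zeros(4), np.zeros(4)
for value in [0.2, 0.4, 0.1]:  # a short, made-up, scaled travel time sequence
    h, C = lstm_step(np.array([value]), h, C, params)
```

Feeding the sequence one value at a time and carrying `h` and `C` forward is what gives the chain-like structure: each step sees the current input together with the state left behind by the previous steps.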
Figure 3: LSTM Time Series Model

2.2.4   Evaluation Step

The error rates of the models are estimated using the Mean Absolute Percentage Error (MAPE). In Eq. 7, which shows the formula of MAPE, $A_t$ is the actual value, $F_t$ is the forecast value and $n$ is the number of predictions.

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{|A_t - F_t|}{|A_t|} \tag{7}$$

A comparison of all experiments can be seen in the next section, Experiments and Results.

3   Experiments and Results

In this section, we describe the details of our experiments and present the results obtained for travel time prediction.

In the first group of experiments, the LSTM network is applied to the long term travel time prediction problem. In long term travel time prediction, the aim is predicting travel time for a specific day and a specific time of the week. In this approach, the datasets are constructed using the travel time values on the same day and time of previous weeks. For instance, if we want to predict the travel time value at 2 p.m. on Wednesday, we construct the related time series dataset using the travel time values at 2 p.m. on the Wednesdays of previous weeks. Afterwards, the outlier values are eliminated from the datasets because these values originate from unusual situations such as traffic accidents, bad weather conditions, special occasions, etc. These cases cannot be handled without extra information from other resources; therefore outliers are ignored during time series construction.

The long term travel time series datasets are constructed separately for each link, day and hour. For each link, there are 2016 different test instances, which correspond to the data of one week (7 days with 288 five-minute intervals each). For each test instance, the related previous travel time values are selected for training the related model. There are 153 different links in the dataset.

In our LSTM experiments, there are a visible input layer, a visible output layer and 3 hidden layers with 4 LSTM neurons. The Adam algorithm is used for stochastic optimization [19]. The batch size is 1 in our experiments.

In order to compare the performance of the LSTM neural network on long term travel time prediction, we also build a prediction model using ARIMA. The results of these experiments can be seen in Table 2.

Table 2: Long Term Travel Time Prediction Model

Model        Error Rate (MAPE)
ARIMA        20.9%
LSTM         11.2%

According to our tests, LSTM provides satisfying results for long term travel time prediction, reaching a MAPE of 11.2% against 20.9% for ARIMA. The reason is that LSTM is well suited to prediction on sequential data.

The rest of the experiments focus on short term travel time prediction. In the short term case, we complete 2 different test cases: predicting 5 minutes later and predicting 15 minutes later.

For predicting the travel time value 5 minutes later, the training datasets are obtained using travel times at 5-minute intervals over the previous 2 hours. For each link, this experiment is done 12 times. A comparison of the LSTM and ARIMA models for predicting the travel time 5 minutes later can be seen in Table 3.

As mentioned before, there is another paper which applies LSTM to short term travel time prediction [16]. In that study, the travel time value 15 minutes later is predicted. In order to apply the same tests to our dataset, training datasets are obtained using travel times at 15-minute intervals over the previous 2 hours. For each link, this experiment is repeated 12 times. In addition to the LSTM network model, an ARIMA model is also applied for predicting the travel time 15 minutes later. The results of these tests can be seen in Table 3.

Table 3: Short Term Travel Time Prediction Model

Test case           Model      Error Rate (MAPE)
5 minutes later     ARIMA      19.4%
5 minutes later     LSTM        9.8%
15 minutes later    ARIMA      23.2%
15 minutes later    LSTM       12.7%

It is apparent from Table 3 that LSTM models provide lower error rates than ARIMA models for both cases. In addition, predicting 15 minutes later with the LSTM network gives higher
error rates than predicting 5 minutes later with the LSTM network. The reason is that the previous 5 minutes give more accurate information about the travel time value 5 minutes later.

Finally, some multi-step ahead predictions are performed using LSTM, and the obtained results are shown in Table 4.

From Table 4, it can be seen that the lowest error rates are provided by 1 step ahead predictions. As the number of steps increases, the error rates increase for both prediction models (the model for the 5-minute step interval and the model for the 15-minute step interval). What is interesting in Table 4 is that 3 step ahead prediction with the 5-minute interval model and 1 step ahead prediction with the 15-minute interval model both aim to predict 15 minutes later. The 1 step ahead prediction with the 15-minute interval model gives a lower error rate (12.7%) than the 3 step ahead prediction with the 5-minute interval model (21.4%). Because multi-step ahead prediction uses its own prediction results for predicting the next step, the errors of each step accumulate.

Table 4: Short Term Travel Time Prediction - Multi step Ahead LSTM

Step ahead    Step interval    Error Rate (MAPE)
1             5 minutes         9.8%
2             5 minutes        16.2%
3             5 minutes        21.4%
1             15 minutes       12.7%
2             15 minutes       18.4%
3             15 minutes       22.3%

4   Conclusion

The main goal of the current study was to investigate the performance of LSTM models for both short term and long term travel time prediction. As far as we know, this is the first time LSTM is applied to long term travel time prediction, and the obtained results show that it provides satisfying results for this problem. In addition, the LSTM network model is applied to short term prediction, and it is shown that the LSTM model for predicting 5 minutes later outperforms the LSTM model for predicting 15 minutes later. Multi-step ahead prediction performance is also measured for short term predictions. The evaluation results show that 1-step ahead predictions give better results than multi-step ahead predictions. Taken together, these results suggest that the LSTM network model is quite suitable for the travel time prediction problem. As future work, we are going to improve these results by involving other data resources to predict outlier situations such as bad weather conditions, traffic jams originating from social events, etc.

References

 [1] F.-Y. Wang, "Parallel control and management for intelligent
     transportation systems: Concepts, architectures, and
     applications," Intelligent Transportation Systems, IEEE
     Transactions on, vol. 11, no. 3, pp. 630-638, 2010.
 [2] D. Billings, J.S. Yang, "Application of the arima models to
     urban roadway travel time prediction-a case study," in Systems,
     Man and Cybernetics, 2006. SMC06. IEEE International
     Conference on, 2006, pp. 2529-2534.
 [3] A. Guin, "Travel time prediction using a seasonal
     autoregressive integrated moving average time series model,"
     in IEEE International Conference on Intelligent Transportation
     Systems, pp. 493-498, 2006.
 [4] J. Myung, D. K. Kim, S. Y. Kho, C. H. Park, "Travel time
     prediction using k nearest neighbor method with combined data
     from vehicle detector system and automatic toll collection
     system," Transportation Research Record: Journal of the
     Transportation Research Board, pp. 51-59, 2011.
 [5] B. Bustillos, Y. C. Chiu, "Real-time freeway-experienced travel
     time prediction using N-curve and k nearest neighbor methods,"
     Transportation Research Record: Journal of the Transportation
     Research Board, pp. 127-137, 2011.
 [6] J. S. Yang, "Travel time prediction using the GPS test vehicle
     and Kalman filtering techniques," in American Control
     Conference, pp. 2128-2133, 2005.
 [7] Y. Yuan, J. Van Lint, R. E. Wilson, V. Wageningen-Kessels,
     S. P. Hoogendoorn et al., "Real-time lagrangian traffic state
     estimator for freeways," IEEE Transactions on Intelligent
     Transportation Systems, vol. 13, no. 1, pp. 59-70, Mar 2012.
 [8] C. H. Wu, J. M. Ho, D. T. Lee, "Travel-time prediction with
     support vector regression," IEEE transactions on intelligent
     transportation systems, vol. 5, no. 4, pp. 276-281, 2004.
 [9] L. Vanajakshi, L. R. Rilett, "Support vector machine technique
     for the short term prediction of travel time," in Intelligent
     Vehicles Symposium, 2007 IEEE, 2007, pp. 600-605.
[10] C. P. I. J. Van Hinsbergen, J. van Lint, "Bayesian combination
     of travel time prediction models," Transportation Research
     Record: Journal of the Transportation Research Board,
     pp. 73-80, 2008.
[11] D. Park, L. R. Rilett, G. Han, "Spectral basis neural networks
     for real-time travel time forecasting," Journal of
     Transportation Engineering, vol. 125, no. 6, pp. 515-523, 1999.
[12] J. W. C. Van Lint, S. P. Hoogendoorn, H. J. van Zuylen,
     "Accurate freeway travel time prediction with state-space
     neural networks under missing data," Transportation Research
     Part C: Emerging Technologies, vol. 13, no. 5, pp. 347-369,
     2005.
[13] J. Van Lint, S. Hoogendoorn, H. Van Zuylen, "Freeway travel
     time prediction with state-space neural networks: modeling
     state-space dynamics with recurrent neural networks,"
     Transportation Research Record: Journal of the Transportation
     Research Board, pp. 30-39, 2002.
[14] D. Park, L. R. Rilett, G. Han, "Travel-time prediction with
     support vector regression," IEEE transactions on intelligent
     transportation systems, vol. 5, no. 4, pp. 276-281, 2004.
[15] C. Siripanpornchana, S. Panichpapiboon, P. Chaovalit,
     "Travel-time prediction with deep learning," in Region 10
     Conference (TENCON), pp. 1859-1862, 2016.
[16] Y. Duan, Y. Lv, F.Y. Wang, "Travel time prediction with LSTM
     neural network," in Intelligent Transportation Systems (ITSC),
     2016 IEEE 19th International Conference on, 2016,
     pp. 1053-1058.
[17] City of New York Department of Transportation,
     https://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa/data
[18] S. Hochreiter, J. Schmidhuber, "Long short-term memory,"
     Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[19] D. P. Kingma, J. Ba, "Adam: A method for stochastic
     optimization," arXiv preprint arXiv:1412.6980, 2014.