Use of LSTM for Short-Term and Long-Term Travel Time Prediction

Irem Islek
Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
isleki@itu.edu.tr

Sule Gunduz Oguducu
Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
sgunduz@itu.edu.tr

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Travel time prediction is an important component of intelligent transportation systems and plays a key role in daily life. Predicting the travel time of a trip is quite challenging and has been studied by many researchers. However, most studies focus on short-term travel time prediction. In this study, LSTM (Long Short-Term Memory) neural network models are constructed to predict travel time for both the long term and the short term using real-world data from New York City. The results show that LSTM provides satisfying results for long-term as well as short-term travel time prediction.

1 Introduction

Traffic is a common problem of urban life, and an ITS (Intelligent Transportation System) [1], which is an integrated system of IoT (Internet of Things) data sensors, cameras, and computers, can provide a solution to this problem. One of the most challenging parts of ITS is travel time prediction, because travel time is affected by numerous factors such as day of the week, time of day, weather conditions, and road conditions. Predicting travel time accurately for a future trip can help people plan their routes more efficiently.

In recent years, there has been increasing interest in travel time prediction, and many researchers have focused on it. In some of these studies, time series models such as traditional ARIMA [2] or seasonal ARIMA [3] are applied for prediction. These models are fitted to historical travel time data, and the next-step travel time is predicted using the fitted time series. Another common approach is the k-nearest neighbors model [4, 5]. In this model, the k most similar historical days are found, and the mean of their travel time values is taken as the prediction. In addition to these models, Kalman filtering [6, 7], Support Vector Regression [8], Support Vector Machines [9], and Bayesian combination of models [10] have also been applied to travel time prediction. These studies use current values of several features, such as speed, traffic flow, and weather conditions, to predict future travel time.

In subsequent years, neural networks came into use for travel time prediction. Several studies use artificial neural network models for this problem [11, 12, 13, 14]. In addition, some deep learning models, such as the Deep Belief Network [15], have been applied to travel time prediction. LSTM has also been used for travel time prediction [16]. In that study, an LSTM model is applied only to short-term travel time prediction, with travel time data obtained from Highways England. The authors emphasize that deep learning models which take sequence relations into account are quite promising in the travel time prediction domain.

Although research has been carried out on short-term travel time prediction using LSTM, no study exists that examines LSTM for both short-term and long-term prediction in detail. Another contribution of this paper is that, to the best of our knowledge, this is the first time LSTM is applied to the long-term travel time prediction problem. As might be expected, long-term travel time prediction is more difficult than short-term travel time prediction. In long-term travel time prediction,
the main objective is to predict the travel time value for a specific day and hour at least one week ahead. In summary, the main objectives of this study are to apply LSTM models to long-term and short-term travel time prediction using real-world data from New York City.

The paper is organized as follows. Section 2 presents the approach and details of the dataset. Section 3 describes the experiments and the results obtained. Finally, Section 4 concludes and discusses possible future work.

2 Prediction of Travel Time

In this study, we aim to predict travel time for the short term and the long term using real-world travel time data. Several different tests are carried out to investigate the behavior of LSTM on the travel time prediction problem. In this section, we first describe the dataset used and then give the details of the method we applied to solve the problem of travel time prediction.

2.1 Dataset

The dataset used in this study is obtained from the New York City Department of Transportation (NYC DOT), which provides real-time traffic information from major arterials and highways within New York City using numerous IoT sensors [17]. These IoT sensors are distributed within the five boroughs of New York City: Brooklyn, Bronx, Manhattan, Queens, and Staten Island. Using a free service, this information can be collected by users for use in application development.

In this dataset, a "link" represents a given street section. Rows of the obtained data contain these fields: id, speed, travel time, data as of, link points, borough, and link name. Definitions of these fields are given in Table 1. The data are updated every five minutes and contain 135 different links within the five boroughs of New York City. The dataset contains real-time traffic data of links from April 2015 up to the current date, and it is updated regularly by NYC DOT. Each month contains nearly 1,150,000 data rows. For each link, there are nearly 275 rows (travel time values) in one day. The time interval between consecutive travel time values of a link is 5 minutes.

Table 1: Definition of Fields

| Field Name | Definition |
|------------|------------|
| Speed | Average speed a vehicle traveled between end points on the link in the most recent interval |
| TravelTime | Time the average vehicle took to traverse the link |
| DataAsOf | Time at which the data was received from the link |
| LinkId | Unique id of the link (a given street section) |
| LinkPoints | Sequence of Lat/Long points describing the locations of the sensor links (Google-compatible polyline) |
| Borough | NYC borough (Brooklyn, Bronx, Manhattan, Queens, Staten Island) |
| LinkName | Description of the link location and end points |

In this study, the "DataAsOf" column, which holds the timestamp of the row, and the "TravelTime" column, which holds the travel time value of the link, are used for time series modeling. For each link, travel time values are ordered by the "DataAsOf" column, that is, by the date and time of the measurement.

2.2 LSTM for Travel Time Prediction

Our experimental system design consists of four steps. The first step is the Data Preparation step. Afterwards, the Outlier Elimination step discards out-of-the-ordinary data rows. Then, the LSTM model is applied to the dataset in the Prediction step. At the end of the workflow, the Evaluation step calculates the error rates based on the selected evaluation metric. The details of the whole process are given below.

2.2.1 Data Preparation Step

The aim of this study is to predict travel time for a selected link (a given street section) using previous travel time data. To achieve this, a time series is constructed for each link from its previous travel time information. The dataset contains travel time values of each link for every 5 minutes. As can be seen in Fig. 1, to predict the travel time 5 minutes later, the link's previous travel time values measured at 5-minute intervals should be used. To predict the travel time 15 minutes later, previous travel time values measured at 15-minute intervals should be used. In the case of long-term travel time prediction, to predict the travel time value of a link at 8 o'clock on Monday, the dataset should be generated using the travel time values at 8 o'clock on the Mondays of previous weeks. The data preparation step also transforms data rows into a sequential structure. Using the sequential output of this step, a time series forecasting model can readily be applied to predict the next-step travel time value.

Figure 1: Datasets for Short and Long Term Prediction

2.2.2 Outlier Elimination Step

Traffic prediction is quite difficult because traffic is affected by numerous factors. Some of these factors can be handled with special solutions, but others, such as weather conditions, special events, and traffic accidents, can only be handled using additional information. Because we do not use an additional dataset, these special situations are eliminated using an outlier detection method.

In this study, outliers are detected using the standard deviation of the last N travel time values. In our case, N is selected as 4. If a travel time value is quite different from the previous 4 travel time values, it is considered to be an outlier in our approach. According to this moving average approach, the outliers are eliminated from the dataset.

2.2.3 Prediction Step

In the prediction step of the methodology, an LSTM (Long Short-Term Memory) [18] neural network is selected as the prediction model for the time series. An LSTM network, which is a special version of an RNN (Recurrent Neural Network), has a chain of repeating modules in the same way as an RNN. In LSTM networks, each module, called a memory cell, contains 3 different gates: an input gate, an output gate, and a forget gate. This memory cell is in the hidden layer of the LSTM network. An LSTM memory cell is shown in Fig. 2.

Figure 2: Long Short-Term Memory

The basic job of a cell is to remember the temporal state of the network. The input gate, output gate, and forget gate can each be thought of as a neuron of a neural network. The input gate decides whether a new value flows into the memory. The output gate decides whether the memory cell state is going to have an effect on other neurons. The forget gate allows the cell to remember its previous state or forget it.

At a given time $t$, the input of the cell is $x_t$, $h_t$ is the output, and $W_i, W_f, W_c, W_o, U_i, U_f, U_c, U_o, V_o$ are weight matrices. $b_i, b_f, b_c, b_o$ are bias vectors in the model equations of the LSTM.

First, the input gate values $i_t$ and the candidate memory cell state values $\tilde{C}_t$ are calculated using Eq. 1 and Eq. 2.

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{1}$$

$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{2}$$

After that, the forget gate activation is given by Eq. 3.

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{3}$$

Using $\tilde{C}_t$ (candidate state value), $i_t$ (input gate value), and $f_t$ (forget gate value), the memory cell value $C_t$ can be calculated using Eq. 4.

$$C_t = i_t * \tilde{C}_t + f_t * C_{t-1} \tag{4}$$

After the state of the memory cell is calculated, the value of the output gate $o_t$ and the output $h_t$ can be calculated using Eq. 5 and Eq. 6.

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + V_o C_t + b_o) \tag{5}$$

$$h_t = o_t * \tanh(C_t) \tag{6}$$

LSTM networks can be applied to many problems that involve prediction on sequential data. In this study, LSTM is applied to travel time prediction, which is a time series prediction problem. The chain-like structure of the LSTM network used in this study is shown in Fig. 3. The reason for choosing LSTM as the travel time prediction model is that it is a special kind of neural network that considers sequential relations.

Figure 3: LSTM Time Series Model

2.2.4 Evaluation Step

The error rates of the models are estimated using Mean Absolute Percentage Error (MAPE). In Eq. 7, which gives the formula for MAPE, $A_t$ is the actual value and $F_t$ is the forecast value.

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{|A_t - F_t|}{|A_t|} \tag{7}$$

A comparison of all experiments is given in the next section, Experiments and Results.

3 Experiments and Results

In this section, we describe the details of our experiments and present the results obtained for travel time prediction.

In the first group of experiments, the LSTM network is applied to the long-term travel time prediction problem. In long-term travel time prediction, the aim is to predict travel time for a specific day and time of the week. In this approach, the datasets are constructed using travel time values on the same day and time of previous weeks. For instance, to predict the travel time value at 2 p.m. on Wednesday, we construct the related time series dataset using the travel time values at 2 p.m. on the Wednesdays of previous weeks. Afterwards, the outlier values are eliminated from the datasets because they originate from unusual situations such as traffic accidents, bad weather conditions, and special occasions. These cases cannot be handled without extra information from other resources; therefore outliers are ignored during time series construction.

The long-term travel time series datasets are constructed separately for each link, day, and hour. For each link, there are 2016 different test instances, which correspond to the data of one week. For each test instance, the related previous travel time values are selected for training the related model. There are 153 different links in the dataset.

In our LSTM experiments, there are a visible input layer, a visible output layer, and 3 hidden layers with 4 LSTM neurons. The Adam algorithm is used for stochastic optimization [19]. The batch size is 1 in our experiments.

In order to compare the performance of the LSTM neural network on long-term travel time prediction, we also build a prediction model using ARIMA. The results of these experiments are shown in Table 2.

Table 2: Long Term Travel Time Prediction Model

| Model | Error Rate (MAPE) |
|-------|-------------------|
| ARIMA | 20.9% |
| LSTM | 11.2% |

According to our tests, LSTM provides satisfying results for long-term travel time prediction. The reason is that LSTM is well suited to prediction on sequential data.

The rest of the experiments focus on short-term travel time prediction. In the short-term case, we complete 2 different test cases: predicting 5 minutes later and predicting 15 minutes later. For predicting the travel time value 5 minutes later, the training datasets are obtained using the travel times at every 5th minute of the previous 2 hours. For each link, this experiment is done 12 times. The comparison of the LSTM and ARIMA models for predicting the travel time 5 minutes later is given in Table 3.

As mentioned before, there is another paper which applies LSTM to short-term travel time prediction [16]. In that study, the travel time value 15 minutes later is predicted. In order to apply the same tests to our dataset, training datasets are obtained using the travel times at every 15th minute of the previous 2 hours. For each link, this experiment is repeated 12 times. In addition to the LSTM network model, an ARIMA model is also applied for predicting the travel time 15 minutes later. The results of these tests are shown in Table 3.

Table 3: Short Term Travel Time Prediction Model

| Test case | Model | Error Rate (MAPE) |
|-----------|-------|-------------------|
| 5 minutes later | ARIMA | 19.4% |
| 5 minutes later | LSTM | 9.8% |
| 15 minutes later | ARIMA | 23.2% |
| 15 minutes later | LSTM | 12.7% |

It is apparent from Table 3 that LSTM models provide lower error rates than ARIMA models in both cases. In addition, predicting 15 minutes later with the LSTM network gives higher error rates than predicting 5 minutes later with the LSTM network. The reason is that the previous 5 minutes give more accurate information about the travel time value 5 minutes later.

Finally, some multi-step-ahead predictions are performed using LSTM, and the results are shown in Table 4.

Table 4: Short Term Travel Time Prediction - Multi-Step Ahead LSTM

| Step ahead | Step interval | Error Rate (MAPE) |
|------------|---------------|-------------------|
| 1 | 5 minutes | 9.8% |
| 2 | 5 minutes | 16.2% |
| 3 | 5 minutes | 21.4% |
| 1 | 15 minutes | 12.7% |
| 2 | 15 minutes | 18.4% |
| 3 | 15 minutes | 22.3% |

From Table 4, it can be seen that the lowest error rates are provided by 1-step-ahead predictions. As the number of steps increases, the error rates increase for both prediction models (the model with a 5-minute step interval and the model with a 15-minute step interval). What is interesting in Table 4 is that the 3-step-ahead prediction with the 5-minute interval model and the 1-step-ahead prediction with the 15-minute interval model both aim to predict 15 minutes later, yet the 1-step-ahead prediction with the 15-minute interval model gives a lower error rate (12.7% versus 21.4%). Because multi-step-ahead prediction uses its own prediction results to predict the next step, the errors of each step accumulate.

4 Conclusion

The main goal of the current study was to investigate the performance of LSTM models for both short-term and long-term travel time prediction. As far as we know, this is the first time LSTM is applied to long-term travel time prediction, and the results show that it provides satisfying results for this problem. In addition, the LSTM network model is applied to short-term prediction, and it is shown that the LSTM model for predicting 5 minutes later outperforms the LSTM model for predicting 15 minutes later. Multi-step-ahead prediction performance is also measured for short-term predictions. The evaluation results show that 1-step-ahead predictions give better results than multi-step-ahead predictions. Taken together, these results suggest that the LSTM network model is quite suitable for the travel time prediction problem. As future work, we are going to improve these results by involving other data resources to predict outlier situations such as bad weather conditions and traffic jams originating from social events.

References

[1] F.-Y. Wang, "Parallel control and management for intelligent transportation systems: Concepts, architectures, and applications," IEEE Transactions on Intelligent Transportation Systems, vol. 11, no. 3, pp. 630-638, 2010.

[2] D. Billings, J. S. Yang, "Application of the ARIMA models to urban roadway travel time prediction-a case study," in Systems, Man and Cybernetics, 2006. SMC'06. IEEE International Conference on, 2006, pp. 2529-2534.

[3] A. Guin, "Travel time prediction using a seasonal autoregressive integrated moving average time series model," in IEEE International Conference on Intelligent Transportation Systems, pp. 493-498, 2006.

[4] J. Myung, D. K. Kim, S. Y. Kho, C. H. Park, "Travel time prediction using k nearest neighbor method with combined data from vehicle detector system and automatic toll collection system," Transportation Research Record: Journal of the Transportation Research Board, pp. 51-59, 2011.

[5] B. Bustillos, Y. C. Chiu, "Real-time freeway-experienced travel time prediction using N-curve and k nearest neighbor methods," Transportation Research Record: Journal of the Transportation Research Board, pp. 127-137, 2011.

[6] J. S. Yang, "Travel time prediction using the GPS test vehicle and Kalman filtering techniques," in American Control Conference, pp. 2128-2133, 2005.

[7] Y. Yuan, J. Van Lint, R. E. Wilson, V. Wageningen-Kessels, S. P. Hoogendoorn et al., "Real-time Lagrangian traffic state estimator for freeways," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 1, pp. 59-70, Mar 2012.

[8] C. H. Wu, J. M. Ho, D. T. Lee, "Travel-time prediction with support vector regression," IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 4, pp. 276-281, 2004.

[9] L. Vanajakshi, L. R. Rilett, "Support vector machine technique for the short term prediction of travel time," in Intelligent Vehicles Symposium, 2007 IEEE, 2007, pp. 600-605.

[10] C. P. I. J. Van Hinsbergen, J. van Lint, "Bayesian combination of travel time prediction models," Transportation Research Record: Journal of the Transportation Research Board, pp. 73-80, 2008.

[11] D. Park, L. R. Rilett, G. Han, "Spectral basis neural networks for real-time travel time forecasting," Journal of Transportation Engineering, vol. 125, no. 6, pp. 515-523, 1999.

[12] J. W. C. Van Lint, S. P. Hoogendoorn, H. J. van Zuylen, "Accurate freeway travel time prediction with state-space neural networks under missing data," Transportation Research Part C: Emerging Technologies, vol. 13, no. 5, pp. 347-369, 2005.

[13] J. Van Lint, S. Hoogendoorn, H. Van Zuylen, "Freeway travel time prediction with state-space neural networks: modeling state-space dynamics with recurrent neural networks," Transportation Research Record: Journal of the Transportation Research Board, pp. 30-39, 2002.

[14] D. Park, L. R. Rilett, G. Han, "Travel-time prediction with support vector regression," IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 4, pp. 276-281, 2004.

[15] C. Siripanpornchana, S. Panichpapiboon, P. Chaovalit, "Travel-time prediction with deep learning," in Region 10 Conference (TENCON), pp. 1859-1862, 2016.

[16] Y. Duan, Y. Lv, F. Y. Wang, "Travel time prediction with LSTM neural network," in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on, 2016, pp. 1053-1058.

[17] City of New York Department of Transportation, https://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa/data

[18] S. Hochreiter, J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

[19] D. P. Kingma, J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
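As an illustration of the data preparation, outlier elimination, and evaluation steps described in Section 2.2, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the toy readings, and the threshold `k` are hypothetical. The paper fixes the window at N = 4 but does not state how far a value must lie from the previous values to count as an outlier, so the rule used here (more than `k` standard deviations from the mean of the previous four kept values) is an assumption.

```python
from statistics import mean, pstdev

# One week of five-minute readings per link: 7 days * 24 hours * 12 per hour.
STEPS_PER_WEEK = 7 * 24 * 12  # 2016, the number of test instances per link


def subsample(values, step, offset=0):
    """Keep every `step`-th reading, starting at index `offset`."""
    return values[offset::step]


def remove_outliers(values, window=4, k=3.0):
    """Drop readings that are far from the previous `window` kept readings.

    ASSUMPTION: "far" means more than k standard deviations from their mean;
    the paper only fixes window = 4 and gives no explicit threshold.
    """
    kept = []
    for v in values:
        recent = kept[-window:]
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(v - mu) > k * sigma:
                continue  # treat as an outlier and skip it
        kept.append(v)
    return kept


def make_windows(series, lookback):
    """Turn a series into (previous `lookback` values, next value) pairs."""
    return [
        (series[i:i + lookback], series[i + lookback])
        for i in range(len(series) - lookback)
    ]


def mape(actual, forecast):
    """Mean absolute percentage error, following Eq. 7."""
    return 100.0 / len(actual) * sum(
        abs(a - f) / abs(a) for a, f in zip(actual, forecast)
    )


# Toy five-minute travel times (hypothetical values) with one obvious spike.
readings = [100, 102, 98, 101, 300, 99, 100, 101, 99, 100]
clean = remove_outliers(readings)          # the 300 reading is dropped
every_15_min = subsample(clean, step=3)    # 15-minute step interval
pairs = make_windows(clean, lookback=4)    # inputs for a next-step model
```

Under the additional assumption that a link's series has no missing readings, the long-term series for a given weekday and time could be taken as `subsample(values, STEPS_PER_WEEK, offset)`, where `offset` is the index of that weekday and time in the first week. With a window of only four values the standard deviation is often small, so a fixed `k` can discard ordinary fluctuations; in practice the threshold would need tuning against the data.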