=Paper=
{{Paper
|id=Vol-2841/BMDA_6
|storemode=property
|title=A tutorial on network-wide multi-horizon traffic forecasting with deep learning
|pdfUrl=https://ceur-ws.org/Vol-2841/BMDA_6.pdf
|volume=Vol-2841
|authors=Giovanni Buroni,Gianluca Bontempi,Karl Determe
|dblpUrl=https://dblp.org/rec/conf/edbt/BuroniBD21
}}
==A tutorial on network-wide multi-horizon traffic forecasting with deep learning==
Giovanni Buroni (Machine Learning Group, ULB, Bruxelles, Belgium, gburoni@ulb.be), Gianluca Bontempi (Machine Learning Group, ULB, Bruxelles, Belgium, gbonte@ulb.be), Karl Determe (Bruxelles Mobilite, Bruxelles, Belgium)

==ABSTRACT==
Traffic flow forecasting is fundamental to today's Intelligent Transportation Systems (ITS). It involves learning the complex dynamics of traffic in order to predict future conditions. This is particularly challenging when it comes to predicting the traffic status for multiple horizons into the future and, at the same time, for the entire transportation network. In this context deep learning models have recently shown promising results: they can inherently capture the non-linear space-temporal (ST) correlations in traffic by taking advantage of the huge volume of data available. In this study the authors present an LSTM encoder-decoder for multi-horizon traffic flow prediction. We adopted a direct approach in which the model simultaneously predicts traffic conditions for the entire Belgian motorway network at each time step. The results clearly show the superiority of this model when compared with other deep learning models. In the workshop, conference attendees will learn how to process and visualize mobility data, obtain optimal features for traffic flow forecasting, build an LSTM encoder-decoder and perform predictions in an online manner.

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23-26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

==1 INTRODUCTION==
Traffic forecasting systems are crucial in modern cities. These systems can potentially provide accurate and timely information to public and private organizations by relying on data collected in real time. These organizations can in turn take action through policy and lead to more sustainable mobility. These aspects have enormous economic, social and environmental implications. Nowadays, thanks to the development of cutting-edge technologies and advances in artificial intelligence, traffic forecasting has reached results never seen before. In particular, the advent of deep learning (DL) models has made it possible to exploit the enormous volume of mobility data to capture the complex non-linear space-temporal relationships governing road traffic. Moreover, DL models have proved particularly effective at predicting traffic for both multiple forecast horizons and multiple streets. In the ITS context, the advantages of DL over traditional machine learning models can be summarised as follows [8, 9]:
• model huge volumes of space-temporal traffic data;
• perform multi-horizon road traffic predictions on large-scale transportation networks;
• achieve greater forecasting accuracy;
• comply with the real-time requirements characterizing online forecasting systems in ITS.

In the recent literature on multi-horizon forecasting, DL methods can be categorised into (i) iterated approaches using autoregressive models or (ii) direct methods based on sequence-to-sequence models [7]. The former use one-step-ahead prediction models whose output is recursively fed back into future inputs to obtain multi-step predictions. By contrast, direct methods are trained to explicitly generate forecasts for multiple predefined horizons in a single step, and generally achieve better forecasting accuracy than iterated methods. In this paper, we present a tutorial to build and train a direct LSTM encoder-decoder model. The model performs predictions for traffic data covering the entire Belgian motorway network and is tested over a two-week period in an online fashion. The results show that the DL model obtained better predictive accuracy than an advanced seasonal persistence model and other deep learning models. The data and model code are available at https://www.kaggle.com/giobbu/.

==2 NETWORK-WIDE FORECASTING==
The aim of traffic forecasting is to predict future traffic conditions given a sequence of historical traffic observations. These observations are detected by sensors, such as GPS, radio frequency identification devices, multi-sensors, cameras and Internet technology, that monitor the traffic status of roads in real time. In our study, at each time step t, the traffic flow is monitored at S street segments of the entire transportation network. Hence, the road network is modelled as multiple parallel time series and represented by the matrix X[i,k]:

X = \begin{bmatrix} x_{1,1} & \dots & x_{1,k} & \dots & x_{1,S} \\ \vdots & & \vdots & & \vdots \\ x_{i,1} & \dots & x_{i,k} & \dots & x_{i,S} \\ \vdots & & \vdots & & \vdots \\ x_{t,1} & \dots & x_{t,k} & \dots & x_{t,S} \end{bmatrix}   (1)

where x_{i,k} is the value of the traffic flow at time i and street segment k. The prediction task can be seen as the process of learning a mapping function f from previously observed features X to future streets' features Z, f : X → Z.

==3 MULTI-HORIZON FORECASTING==
In this study, the goal is to perform multi-horizon forecasting of the traffic flow based on the current and past traffic conditions of the entire transportation network. To this end, a direct strategy is implemented for multivariate multi-horizon time-series forecasts, as discussed in Bontempi et al. [2]. At each time step t, we obtain:

f : X_t → Z_{t+1}   (2)

where Z_{t+1} is the matrix containing the multi-horizon traffic flow predictions (t+1, t+2, ..., t+h) for all S street segments, and h is the forecast horizon.
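The direct mapping of Equation (2) can be illustrated with a minimal NumPy sketch of the shapes involved. The window length w, number of segments S and horizon h below are illustrative values, not taken from the paper, and the model f is a naive stand-in:

```python
import numpy as np

S, w, h = 5, 12, 6          # segments, history window, forecast horizon (illustrative)
X_t = np.random.rand(w, S)  # past traffic flows: one row per time step, one column per segment

# A direct model f consumes the whole history at once and emits
# all h future steps for every segment in a single shot.
def f(history):
    # naive placeholder: repeat the last observed flow for every horizon
    return np.tile(history[-1], (h, 1))

Z_next = f(X_t)             # multi-horizon prediction matrix Z_{t+1}
print(Z_next.shape)         # (h, S): one row per horizon t+1 ... t+h
```

The point of the sketch is only that a direct predictor outputs the full (h, S) block in one call, rather than iterating one-step forecasts.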
==4 LSTM ENCODER-DECODER==

===4.1 Preliminaries===
4.1.1 Sequence-to-Sequence Learning. In deep learning, sequence-to-sequence (Seq2Seq) learning is about training DL models to convert sequences from one domain into sequences in another domain. Multi-horizon traffic flow forecasting at transportation network scale can be seen as a Seq2Seq learning problem, where the sequence of past traffic flow observations is used to learn to predict the sequence of future observations. The use of DL models offers several advantages [4, 9]:
• they natively support sequence input data (sequences of flow observations);
• they directly support multiple parallel input sequences for multivariate inputs (multiple street segments);
• they map input data directly to an output vector (multi-horizon predictions).

4.1.2 Long Short-Term Memory (LSTM). The Long Short-Term Memory (LSTM) network is a special kind of recurrent neural network (RNN) capable of learning short- and long-term dependencies in the input data [5]. An LSTM unit is composed of three gates: input, forget and output. These gates determine whether to let new input in (input gate), delete information because it is not important (forget gate), or let it impact the output at the current time step (output gate). LSTMs are now widely used with sequence data for learning the temporal dependencies of space-temporal data.

4.1.3 Encoder-Decoder architecture. Encoder-decoder models are a type of architecture particularly effective for Seq2Seq learning problems [9]. They work as follows:
• Encoder module: the input sequence is fed to this module, which generates an internal state that is passed as the "context" to the decoder.
• Decoder module: it is trained to predict the target sequence, given the decoder's input sequence and the memory state vectors from the encoder. The encoder's state gives the decoder information about what it is supposed to generate.

===4.2 Direct LSTM encoder-decoder architecture===
We first frame our prediction problem by dividing the time-dependent input features into two categories, as shown in Figure 1: (i) the observed inputs (traffic flow of the streets), which can only be retrieved at time step t and are unknown beforehand, and (ii) the time-based covariate inputs (features such as day-of-the-week at time t), which can be predetermined.

Figure 1: Feature engineering for multivariate multi-horizon time-series forecasts with LSTM encoder-decoder.

The Direct LSTM encoder-decoder (D_lstm_ED) architecture is shown in Figure 2. The encoder module takes as input all past features (both the observed roads' traffic flows and the known temporal features) and encodes them into a latent space. This encoded "context" is then fed as the internal state to the decoder module, together with the known future covariates as inputs. After being trained, the model forecasts the future (multiple) traffic flow values for the whole road network at each time step.

Figure 2: Direct LSTM encoder-decoder architecture.
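An encoder-decoder of this kind can be sketched in TensorFlow/Keras. This is a minimal illustration, not the paper's exact configuration: the window length, horizon, number of segments S, number of covariates C and the 64 hidden units are all assumed, illustrative values.

```python
from tensorflow.keras import layers, Model

w, h = 12, 6     # history window and forecast horizon (illustrative)
S, C = 50, 4     # street segments and time-based covariates (illustrative)
units = 64       # LSTM hidden size (illustrative)

# Encoder: past observed flows concatenated with past covariates -> latent "context".
enc_in = layers.Input(shape=(w, S + C))
_, state_h, state_c = layers.LSTM(units, return_state=True)(enc_in)

# Decoder: known future covariates, initialised with the encoder's states.
dec_in = layers.Input(shape=(h, C))
dec_out = layers.LSTM(units, return_sequences=True)(
    dec_in, initial_state=[state_h, state_c])

# One traffic-flow value per street segment at each forecast horizon.
preds = layers.TimeDistributed(layers.Dense(S))(dec_out)

model = Model([enc_in, dec_in], preds)
model.compile(optimizer="adam", loss="mse")
print(model.output_shape)   # (None, h, S)
```

The decoder emits the whole (h, S) prediction block in a single forward pass, which is what makes this a direct rather than iterated architecture.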
==5 EXPERIMENTAL SETTINGS==
5.0.1 OBU Data. Since 2016, all owners of Belgian lorries with a Maximum Authorized Mass in excess of 3.5 tonnes must pay a kilometre charge. Every road user who is not exempt from the toll must install an On Board Unit (OBU) recording the distance that a lorry travels on Belgian public roads. Because of their value as a mobility indicator, the OBU data are made available to Bruxelles Mobilite, the public administration responsible for equipment, infrastructure and mobility issues in the Bruxelles-Capital Region. Each truck device sends a message approximately every 30 seconds (from 3 a.m. to 2.59 a.m. of the following day). Each OBU record contains an anonymous identifier (an ID resetting every day at 3 a.m.), the timestamp, the GPS position (latitude, longitude), the speed (engine) and the direction (compass). Moreover, the OBU data include vehicle characteristics: weight category (MAM), country code and the European emission standards classification of the engine (EURO value). The large volume and the streaming nature of the OBU data required the set-up of a big data platform for efficient collection, storage and analysis [3].

5.0.2 Data Processing & Time-based Covariates. Before obtaining the matrix of Figure 1, the OBU data are processed. Firstly, we consider a two-month period of OBU data, from the 1st of January to the 28th of February, 2019, and the major motorways of Belgium retrieved from OpenStreetMap^1 with the osmnx module in Python. Then, we filter the records with respect to their street segment location and resample with a 30-minute interval. Trucks are considered on the road network if their location falls within the polygon areas representing the street segments^2. We took into consideration street segments with an average traffic flow higher than 10 vehicles/half-hour. The number of street segments (and consequently the number of time series analyzed) amounts to 5187, with 2832 observations (Fig. 5). Finally, starting from the Timestamp variable, we create the following covariates:
• a sine and cosine transformation of the hour of the day. This ensures that, for example, hours 0 and 23 are close to each other, accounting for the cyclical nature of the variable;
• the day of the week, since each day of the week shows a particular traffic flow pattern;
• a working/week-end day indicator, since there is a clear difference in traffic flow between working days and week-end days.
These features are deterministic and therefore known in advance for future traffic flow predictions.

5.0.3 Data Preparation for DL models. In order for the DL model to learn, the sequence of observations must be transformed into the form {n_samples, n_timesteps, n_features}. Therefore, we transform X into a three-dimensional tensor T_t representing the traffic flow data, of size t x w x M, where t denotes the total number of samples, w the length of the sequence of observations and M the total number of features:

T_t = {X_1, X_2, ..., X_t}   (3)

where X_i is the sample at time i. Thus, X_t is the matrix:

X_t = \begin{bmatrix} x_{t-w,1} & \dots & x_{t-w,k} & \dots & x_{t-w,M} \\ \vdots & & \vdots & & \vdots \\ x_{t-w+l,1} & \dots & x_{t-w+l,k} & \dots & x_{t-w+l,M} \\ \vdots & & \vdots & & \vdots \\ x_{t,1} & \dots & x_{t,k} & \dots & x_{t,M} \end{bmatrix}   (4)

The shape of each X_t is what the model expects as input for each sample.
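The transformation into the {n_samples, n_timesteps, n_features} form described above can be sketched with NumPy: a sliding window of length w over the parallel series yields one (n_timesteps, n_features) sample per time step, and stacking them gives the three-dimensional tensor. The array sizes are illustrative:

```python
import numpy as np

M, T, w = 3, 10, 4   # features, total time steps, window length (illustrative)
X = np.arange(T * M, dtype=float).reshape(T, M)   # parallel series, one column per feature

# Slide a window of length w over time: each sample is the w most
# recent rows of X ending at step i.
samples = np.stack([X[i - w + 1 : i + 1] for i in range(w - 1, T)])
print(samples.shape)   # (n_samples, n_timesteps, n_features) = (7, 4, 3)
```

In practice libraries such as `tf.keras.utils.timeseries_dataset_from_array` perform the same windowing without materialising all samples at once.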
5.0.4 Seasonal Persistence and other DL models. As a baseline against which to compare our DL model, we consider the Seasonal Window Persistence Model (SW). Within a sliding window, observations at the same time and same day in the previous three weekly seasons are collected, and the mean of those observations is returned as the persisted forecast. As an example, if the data are hourly and the forecasting target is 9 a.m. on Monday, then with a window size of 1 the observation of last Monday at 9 a.m. is returned as the forecast; a window of size 2 means returning the average of the observations of the last two Mondays at the same hour. Moreover, we compare our model with other DL models: two LSTM models that perform predictions with a direct approach (D_lstm) and an iterated approach (A_lstm) respectively, and an LSTM-based seq2seq architecture with an iterated approach (A_lstm_ED) in which the decoder's predictions are reinjected into the decoder's input.

5.0.5 Forecasting Evaluation and Metrics. In order to fully exploit the properties unique to real-time data, we adopt the so-called interleaved-test-then-train evaluation scheme [1]. Each observation is used to test the model before it is used for training, and from this the forecasting accuracy is incrementally updated. Therefore, the model is always tested on instances it has not yet seen. To measure the forecasting accuracy of the competing approaches we use the well-known Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). For multi-horizon forecasts these metrics are defined as follows:

RMSE = \sqrt{ \frac{ \sum_{i=1}^{t} \sum_{h=1}^{H} e_{i,h}^2 }{nH} }   (5)

MAE = \frac{ \sum_{i=1}^{t} \sum_{h=1}^{H} |e_{i,h}| }{nH}   (6)

where e_{i,h} is the forecast error for period i and forecast horizon h.

5.0.6 Space-Temporal Resolution and Model Setting. We forecast the traffic flow (vehicles/half-hour) of each street segment in the Belgian motorway network up to 6 hours ahead, i.e. forecast horizons from h = 1 to h = 12. From the whole data set we hold out the last two weeks (672 observations) for testing our models. The data must be scaled to values between 0 and 1 before they can be used to train the DL model. The predictive models are initially fitted on the training set. Then, the models are incrementally updated as new data from the test set are added, as described in the previous section.

Model       avgRMSE   avgMAE
Baseline    11.67     7.25
D_lstm      11.28     7.03
A_lstm      11.30     7.12
D_lstm_ED   10.66     6.63
A_lstm_ED   11.08     6.93

Table 1: Average values of RMSE and MAE.

5.0.7 Results. The obtained results are summarised in Table 1, which shows the average RMSE and MAE of each model over the testing set. The Direct LSTM encoder-decoder presents the best forecasting accuracy for both metrics. Further, since traffic over the road network may be extremely heterogeneous in terms of volatility, we estimated normalised metrics against the baseline model [6]. For example, NRMSE is the ratio between the RMSE of the predictor and the baseline RMSE; a ratio below 1 denotes that the considered predictor is more accurate than the baseline model. In Fig. 3-4, the average value of the normalised metrics is computed for each forecast horizon separately. These figures highlight once again the superiority of the Direct LSTM encoder-decoder over the other DL models, especially for long-range forecast horizons.
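The seasonal window persistence baseline described above can be sketched as follows. Here we assume half-hourly data, so one weekly season spans 7 x 48 = 336 slots; the function name and toy series are illustrative, not taken from the paper's code:

```python
import numpy as np

SEASON = 336   # half-hour slots per week (7 days x 48 slots)

def sw_persistence(series, t, window):
    """Forecast slot t as the mean of the observations at the same
    weekly slot in the previous `window` seasons."""
    past = [series[t - k * SEASON] for k in range(1, window + 1)]
    return np.mean(past)

flow = np.arange(3 * SEASON, dtype=float)   # toy series covering three weeks
t = 2 * SEASON + 10                         # some slot in week 3
print(sw_persistence(flow, t, 1))           # → 346.0 (value one week earlier)
print(sw_persistence(flow, t, 2))           # → 178.0 (mean of the two previous weeks)
```

With a window of 3 this reproduces the three-season averaging used as the SW baseline.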
Figure 3: NRMSE metric.

Figure 4: NMAE metric.

==6 OUTLINE==
In the workshop, conference attendees will learn how to perform traffic flow predictions at transportation network scale by employing the Direct LSTM encoder-decoder model (Fig. 5). A Jupyter Notebook will be provided on Kaggle^3 so attendees can copy it, readily run the code and interact. The demonstration will be organised as follows:
(1) retrieve transportation networks from OpenStreetMap^4 with OSMNX^5 in Python;
(2) process OBU data with Geopandas^6 at different time granularities and road networks;
(3) create a dataframe in Pandas^7 for traffic flow predictions and visualize it with Folium^8;
(4) add temporal features and prepare the data in Tensorflow^9;
(5) build and train a Direct LSTM encoder-decoder model in Tensorflow;
(6) perform online predictions and visualise the results with Matplotlib^10;
(7) compare the results with the Seasonal Persistence model and other DL models.

Figure 5: Demonstration outline: from raw data to traffic flow online predictions.

==ACKNOWLEDGMENTS==
The authors acknowledge the support of the Programme Operationnel FEDER 2014-2020 de la Region de Bruxelles Capitale and icity.brussels^11 for the MOBIAID Project^12. The authors are also grateful to Bruxelles Mobilite for having provided the OBU data necessary for the work.

==REFERENCES==
[1] Albert Bifet and Richard Kirkby. 2009. Data stream mining: a practical approach. (2009).
[2] Gianluca Bontempi, Souhaib Ben Taieb, and Yann-Aël Le Borgne. 2012. Machine learning strategies for time series forecasting. In European Business Intelligence Summer School. Springer, 62-77.
[3] Giovanni Buroni, Yann-Ael Le Borgne, Gianluca Bontempi, and Karl Determe. 2018. On-Board-Unit Data: A Big Data Platform for Scalable Storage and Processing. In 2018 4th International Conference on Cloud Computing Technologies and Applications (Cloudtech). IEEE, 1-5.
[4] Felix A. Gers, Douglas Eck, and Jürgen Schmidhuber. 2002. Applying LSTM to time series predictable through time-window approaches. In Neural Nets WIRN Vietri-01. Springer, 193-200.
[5] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735-1780.
[6] Rob J. Hyndman and Anne B. Koehler. 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22, 4 (2006), 679-688.
[7] Bryan Lim, Sercan O. Arik, Nicolas Loeff, and Tomas Pfister. 2019. Temporal fusion transformers for interpretable multi-horizon time series forecasting. arXiv preprint arXiv:1912.09363 (2019).
[8] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. 2014. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems 16, 2 (2014), 865-873.
[9] Senzhang Wang, Jiannong Cao, and Philip Yu. 2020. Deep learning for spatio-temporal data mining: A survey. IEEE Transactions on Knowledge and Data Engineering (2020).

Footnotes:
1 https://www.openstreetmap.org
2 OBU data processing is available at https://www.kaggle.com/giobbu/obu-data-preprocessing
3 https://www.kaggle.com
4 https://www.openstreetmap.org/
5 https://github.com/gboeing/osmnx
6 https://geopandas.org
7 https://pandas.pydata.org
8 https://python-visualization.github.io/folium/
9 https://www.tensorflow.org
10 https://matplotlib.org
11 https://icity.brussels
12 https://mlg.ulb.ac.be/