=Paper= {{Paper |id=Vol-2841/BMDA_6 |storemode=property |title=A tutorial on network-wide multi-horizon traffic forecasting with deep learning |pdfUrl=https://ceur-ws.org/Vol-2841/BMDA_6.pdf |volume=Vol-2841 |authors=Giovanni Buroni,Gianluca Bontempi,Karl Determe |dblpUrl=https://dblp.org/rec/conf/edbt/BuroniBD21 }} ==A tutorial on network-wide multi-horizon traffic forecasting with deep learning== https://ceur-ws.org/Vol-2841/BMDA_6.pdf
A tutorial on network-wide multi-horizon traffic forecasting with deep learning

Giovanni Buroni (Machine Learning Group, ULB, Bruxelles, Belgium) gburoni@ulb.be
Gianluca Bontempi (Machine Learning Group, ULB, Bruxelles, Belgium) gbonte@ulb.be
Karl Determe (Bruxelles Mobilite, Bruxelles, Belgium)

ABSTRACT
Traffic flow forecasting is fundamental to today's Intelligent Transportation Systems (ITS). It involves learning the complex dynamics of traffic in order to predict future conditions. This is particularly challenging when the traffic status must be predicted for multiple horizons into the future and, at the same time, for the entire transportation network. In this context deep learning models have recently shown promising results: they can inherently capture the non-linear space-temporal (ST) correlations in traffic by taking advantage of the huge volume of data available.
In this study the authors present an LSTM encoder-decoder for multi-horizon traffic flow prediction. We adopt a direct approach in which the model simultaneously predicts traffic conditions for the entire Belgian motorway network at each time step. The results clearly show the superiority of this model when compared with other deep learning models. In the workshop, conference attendees will learn how to process and visualize mobility data, obtain optimal features for traffic flow forecasting, build an LSTM encoder-decoder, and perform predictions in an online manner.

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23-26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
Traffic forecasting systems are crucial in modern cities. They can provide accurate and timely information to public and private organizations by relying on data collected in real time. These organizations can in turn take action through policy and lead to more sustainable mobility. These aspects have enormous economic, social and environmental implications. Nowadays, thanks to the development of cutting-edge technologies and advances in artificial intelligence, traffic forecasting has reached results never seen before. In particular, the advent of deep learning (DL) models has made it possible to exploit the enormous volume of mobility data and to capture the complex non-linear space-temporal relationships governing road traffic. Moreover, DL models have proved particularly effective at predicting traffic for both multiple forecast horizons and multiple streets. In the ITS context, the advantages of DL over traditional machine learning models can be summarised as follows [8, 9]:
• modelling huge volumes of space-temporal traffic data;
• performing multi-horizon road traffic predictions on large-scale transportation networks;
• achieving greater forecasting accuracy;
• complying with the real-time requirements characterizing online forecasting systems in ITS.
In the recent literature on multi-horizon forecasting, DL methods can be categorised into (i) iterated approaches using autoregressive models and (ii) direct methods based on sequence-to-sequence models [7]. The former use one-step-ahead prediction models whose output is recursively fed back into future inputs to obtain multi-step predictions. Direct methods, on the contrary, are trained to explicitly generate forecasts for multiple predefined horizons in a single step, and generally achieve better forecasting accuracy than iterated methods. In this paper we present a tutorial on building and training a Direct LSTM encoder-decoder model. The model produces predictions for traffic data covering the entire Belgian motorway network and is tested over a two-week period in an online fashion. The results show that the DL model obtains better predictive accuracy than an advanced seasonal persistence model and other deep learning models. The data and model code are available at https://www.kaggle.com/giobbu/.

2 NETWORK-WIDE FORECASTING
The aim of traffic forecasting is to predict future traffic conditions given a sequence of historical traffic observations. These observations are detected by sensors, such as GPS, radio frequency identification devices, multi-sensors, cameras and Internet technology, that monitor the traffic status of roads in real time. In our study, at each time step t, the traffic flow is monitored at S street segments of the entire transportation network. Hence, the road network is modelled as multiple parallel time series and represented by the matrix X[i,k]:

\[
X = \begin{bmatrix}
x_{1,1} & \dots & x_{1,k} & \dots & x_{1,S} \\
\vdots  &       & \vdots  &       & \vdots  \\
x_{i,1} & \dots & x_{i,k} & \dots & x_{i,S} \\
\vdots  &       & \vdots  &       & \vdots  \\
x_{t,1} & \dots & x_{t,k} & \dots & x_{t,S}
\end{bmatrix} \tag{1}
\]

where x_{i,k} is the value of traffic flow at time i and street segment k.
The prediction task can be seen as the process of learning a mapping function f from the previously observed features X to the future streets' features Z, f : X → Z.

3 MULTI-HORIZON FORECASTING
In this study, the goal is to perform multi-horizon forecasting of the traffic flow based on the current and past traffic conditions of the entire transportation network. To this end, a direct strategy is implemented for multivariate multi-horizon time-series forecasts, as discussed in Bontempi et al. [2]. At each time step t, we obtain:

\[ f : X_t \rightarrow Z_{t+1} \tag{2} \]

where Z_{t+1} is the matrix containing the multi-horizon traffic flow predictions (t + 1, t + 2, ..., t + h) for all S street segments, and h is the forecast horizon.
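The direct mapping f : X_t → Z_{t+1} can be made concrete with a small sketch. The snippet below is a minimal illustration rather than the tutorial's actual code: it builds (past-window, multi-horizon-target) pairs from a matrix of S parallel flow series, with hypothetical function name and sizes.

```python
import numpy as np

def make_direct_samples(flows, w, h):
    """Build (past window, multi-horizon target) pairs for a direct
    forecasting model from a (T, S) matrix of parallel flow series."""
    X, Z = [], []
    for t in range(w, flows.shape[0] - h + 1):
        X.append(flows[t - w:t])   # last w observations, all S segments
        Z.append(flows[t:t + h])   # next h observations, all S segments
    return np.stack(X), np.stack(Z)

# Toy network: T = 100 half-hour steps, S = 3 street segments.
flows = np.random.rand(100, 3)
X, Z = make_direct_samples(flows, w=12, h=6)
print(X.shape, Z.shape)  # (83, 12, 3) (83, 6, 3)
```

A direct model maps each entry of X to the corresponding entry of Z in a single step, instead of iterating one-step-ahead predictions.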
Figure 1: Feature engineering for multivariate multi-horizon time-series forecasts with LSTM encoder-decoder.

4 LSTM ENCODER-DECODER
4.1 Preliminaries
4.1.1 Sequence-to-Sequence Learning. In deep learning, sequence-to-sequence learning (Seq2Seq) is about training DL models to convert sequences from one domain into sequences in another domain. Multi-horizon traffic flow forecasting at transportation network scale can be seen as a Seq2Seq learning problem, where the sequence of past traffic flow observations is used to predict the sequence of future observations. The use of DL models offers several advantages [4, 9]:
• they natively support sequence input data (sequences of flow observations);
• they directly support multiple parallel input sequences for multivariate inputs (multiple street segments);
• they map input data directly to an output vector (multi-horizon predictions).

4.1.2 Long Short-Term Memory (LSTM). The Long Short-Term Memory (LSTM) network is a special kind of recurrent neural network (RNN) capable of learning short and long-term dependencies of the input data [5]. An LSTM unit is composed of three gates: input, forget and output. These gates determine whether to let new input in (input gate), to delete information because it is not important (forget gate), or to let the information impact the output at the current time step (output gate). LSTMs are now widely used with sequence data for learning the temporal dependencies of space-temporal data.

4.1.3 Encoder-Decoder architecture. Encoder-decoder models are a type of architecture particularly effective for Seq2Seq learning problems [9]. Here is how they work:
• Encoder Module: the input sequence is fed to the module, which generates an internal state that is passed as the "context" to the decoder.
• Decoder Module: it is trained to predict the target sequence, given the decoder's input sequence and the memory state vectors from the encoder. The encoder's state allows the decoder to obtain information about what it is supposed to generate.

4.2 Direct LSTM encoder-decoder Architecture
We first frame our prediction problem by dividing the time-dependent input features into two categories, as shown in Figure 1: (i) the observed inputs (traffic flows of the streets), which can only be retrieved at time step t and are unknown beforehand, and (ii) the time-based covariate inputs (features such as day-of-the-week at time t), which can be predetermined.
The Direct LSTM encoder-decoder (D_lstm_ED) architecture is shown in Figure 2. The encoder module takes as input all past features (both the observed roads' traffic flows and the known temporal features) and encodes them into a latent space. This encoded "context" is then fed as internal state to the decoder module, together with the known future covariates as inputs. After being trained, the model forecasts the future (multiple) traffic flow values for the whole road network at each time step.

Figure 2: Direct LSTM encoder-decoder architecture.

5 EXPERIMENTAL SETTINGS
5.0.1 OBU Data. Since 2016, all owners of Belgian lorries with a Maximum Authorized Mass in excess of 3.5 tonnes must pay a kilometre charge. Every road user who is not exempt from the toll must install an On Board Unit (OBU) recording the distance that a lorry travels on Belgian public roads. Because of their value as a mobility indicator, the OBU data are made available to Bruxelles Mobilite, the public administration responsible for equipment, infrastructure and mobility issues in the Bruxelles-Capital Region. Each truck device sends a message approximately every 30 seconds (from 3 a.m. to 2.59 a.m. of the following day). Each OBU record contains an anonymous identifier (ID, resetting every day at 3 a.m.), the timestamp, the GPS position (latitude, longitude), the speed (engine) and the direction (compass). Moreover, OBU data include vehicle characteristics: weight category (MAM), country code and the European emission standards classification of the engine (EURO value). The large volume and the streaming nature of the OBU data required setting up a big data platform for efficient collection, storage and analysis [3].

5.0.2 Data Processing & Time-based Covariates. Before obtaining the matrix of Figure 1, the OBU data are processed.
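As an illustration of this step, here is a hedged pandas sketch that aggregates raw OBU pings into a vehicles/half-hour count per street segment. The field names and values are hypothetical; the authors' actual pipeline is in their Kaggle notebooks.

```python
import pandas as pd

# Hypothetical OBU messages: one row per ~30-second device ping.
obu = pd.DataFrame({
    "obu_id": ["a", "a", "b", "b", "b"],
    "timestamp": pd.to_datetime([
        "2019-01-01 03:00:10", "2019-01-01 03:10:40",
        "2019-01-01 03:05:00", "2019-01-01 03:40:00",
        "2019-01-01 04:10:00",
    ]),
    "segment": ["S1", "S1", "S1", "S2", "S2"],  # matched street segment
})

# Count distinct vehicles per street segment in each 30-minute bin,
# mirroring the vehicles/half-hour traffic flow used in the paper.
flow = (
    obu.groupby(["segment", pd.Grouper(key="timestamp", freq="30min")])["obu_id"]
       .nunique()
       .unstack("segment", fill_value=0)
)
print(flow)
```

Each column of `flow` is then one of the parallel time series composing the matrix of Figure 1.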
First, we consider a two-month period of OBU data, from the 1st of January to the 28th of February 2019, together with the major Belgian motorways retrieved from OpenStreetMap¹ with the osmnx module in Python. Then, we filter the records with respect to their street segment location and resample them at 30-minute intervals. Trucks are considered on the road network if their location falls within the polygon areas representing the street segments². We took into consideration street segments with an average traffic flow higher than 10 vehicles/half-hour. The number of street segments (and consequently the number of time series analyzed) amounts to 5187, with 2832 observations (Fig. 5). Finally, starting from the Timestamp variable, we create the following covariates:
• a sine and cosine transformation of the hour of the day. This ensures that, for example, hour 0 and hour 23 are close to each other, thus accounting for the cyclical nature of the variable;
• the day of the week. Each day of the week shows a particular traffic flow pattern;
• the working/week-end day indicator. There is a clear difference in traffic flow between working days and week-end days.
These features are deterministic and are therefore known in advance for future traffic flow predictions.

¹ https://www.openstreetmap.org
² OBU data processing is available at https://www.kaggle.com/giobbu/obu-data-preprocessing

5.0.3 Data Preparation for DL models. In order for the DL model to learn, the sequence of observations must be transformed into the form {n_samples, n_timesteps, n_features}. Therefore, we transform X into a three-dimensional tensor T_t representing the traffic flow data, of shape t × w × M, where t denotes the total number of samples, w refers to the length of the observation sequences and M is the total number of features:

\[ T_t = \{ X_1, X_2, \dots, X_t \} \tag{3} \]

where X_i is the sample at time i. Thus, X_t is the matrix:

\[
X_t = \begin{bmatrix}
x_{t-w,1}   & \dots & x_{t-w,k}   & \dots & x_{t-w,M}   \\
\vdots      &       & \vdots      &       & \vdots      \\
x_{t-w+l,1} & \dots & x_{t-w+l,k} & \dots & x_{t-w+l,M} \\
\vdots      &       & \vdots      &       & \vdots      \\
x_{t,1}     & \dots & x_{t,k}     & \dots & x_{t,M}
\end{bmatrix} \tag{4}
\]

The shape of each X_t is what the model expects as input for each sample.

5.0.4 Seasonal Persistence and other DL models. As a baseline for our DL model, we consider a Seasonal Window Persistence model (SW). Within a sliding window, the observations at the same time and same day in the previous three weekly seasons are collected, and the mean of those observations is returned as the persisted forecast. As an example, if the data are hourly and the forecasting target is 9 a.m. on Monday, then with a window of size 1 the observation of last Monday at 9 a.m. is returned as the forecast; a window of size 2 means returning the average of the observations of the last two Mondays at the same hour.
Moreover, we compare our model with other DL models: two LSTM models that perform predictions with a direct approach (D_lstm) and an iterated approach (A_lstm) respectively, and an LSTM-based Seq2Seq architecture with an iterated approach (A_lstm_ED), where the decoder's predictions are reinjected into the decoder's input.

5.0.5 Forecasting Evaluation and Metrics. In order to fully exploit the properties unique to real-time data, we adopt the so-called interleaved-test-then-train evaluation scheme [1]. Each observation is used to test the model before it is used for training, and from this the forecasting accuracy is incrementally updated. Therefore, the model is always tested on instances it has not seen. To measure the forecasting accuracy of the competing approaches we use the well-known Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). For multi-horizon forecasts these metrics are defined as follows:

\[ RMSE_i = \sqrt{\frac{\sum_{i=1}^{t} \sum_{h=1}^{H} (e_{i,h})^2}{nH}} \tag{5} \]

\[ MAE_i = \frac{\sum_{i=1}^{t} \sum_{h=1}^{H} |e_{i,h}|}{nH} \tag{6} \]

where e_{i,h} is the error of the forecast for period i and forecast horizon h.

5.0.6 Space-Temporal Resolution and Model Setting. We forecast the traffic flow (vehicles/half-hour) of each street segment in the Belgian motorway network up to 6 hours ahead, i.e. forecast horizons from h = 1 to h = 12. From the whole data set we hold out the last two weeks (672 observations) for testing our models. The data must be scaled to values between 0 and 1 before they can be used to train the DL model. The predictive models are initially fitted on the training set. Then, the models are incrementally updated while subsequently adding new data from the test set, as described in the previous section.

    Model        avgRMSE   avgMAE
    Baseline       11.67     7.25
    D_lstm         11.28     7.03
    A_lstm         11.30     7.12
    D_lstm_ED      10.66     6.63
    A_lstm_ED      11.08     6.93

Table 1: Average values of RMSE and MAE.

5.0.7 Results. The obtained results are summarised in Table 1, which shows the average RMSE and MAE over the testing set for all models. The Direct LSTM encoder-decoder presents the best forecasting accuracy on both metrics. Further, since traffic over the road network may be extremely heterogeneous in terms of volatility, we estimated the normalised metrics against the baseline model [6]. For example, NRMSE is the ratio between the RMSE of the predictor and the baseline RMSE; a ratio below 1 denotes that the considered predictor is more accurate than the baseline model. In Figs. 3 and 4, the average value of the normalised metrics is computed for each forecast horizon separately. These figures highlight once again the superiority of the Direct LSTM encoder-decoder over the other DL models, especially for long-range forecast horizons.

Figure 3: NRMSE Metric.
Figure 4: NMAE Metric.

6 OUTLINE
In the workshop, conference attendees will learn how to perform traffic flow predictions at transportation network scale by employing the Direct LSTM encoder-decoder model (Fig. 5).

Figure 5: Demonstration Outline: from raw data to traffic flow online predictions.
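The online prediction stage of Fig. 5 follows the interleaved test-then-train scheme of Section 5.0.5: each incoming sample is first used to test the model and only then to update it. Below is a minimal sketch, with a toy running-mean model standing in for the LSTM; all names are hypothetical.

```python
import numpy as np

def interleaved_test_then_train(model, inputs, targets):
    """Test on each sample before training on it; return incremental MAE."""
    errors = []
    for x, z in zip(inputs, targets):
        pred = model.predict(x)            # test first...
        errors.append(float(abs(pred - z)))
        model.update(x, z)                 # ...then train on the same sample
    return float(np.mean(errors))

class RunningMeanModel:
    """Toy stand-in for the LSTM: predicts the mean of targets seen so far."""
    def __init__(self):
        self.seen = []
    def predict(self, x):
        return float(np.mean(self.seen)) if self.seen else 0.0
    def update(self, x, z):
        self.seen.append(z)

stream_x = np.arange(10, dtype=float)   # toy input stream
stream_z = stream_x + 1.0               # toy targets
mae = interleaved_test_then_train(RunningMeanModel(), stream_x, stream_z)
print(round(mae, 2))  # 2.8
```

With this scheme the model is always evaluated on data it has never seen, even though every observation eventually contributes to training.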


A Jupyter Notebook will be provided on Kaggle³ so that attendees can copy it, readily run the code and interact with it. The demonstration will be organised as follows:
   (1) retrieve transportation networks from OpenStreetMap⁴ with OSMNX⁵ in Python;
   (2) process OBU data with GeoPandas⁶ at different time granularities and road networks;
   (3) create a dataframe in Pandas⁷ for traffic flow predictions and visualize it with Folium⁸;
   (4) add temporal features and prepare the data in Tensorflow⁹;
   (5) build and train a Direct LSTM encoder-decoder model in Tensorflow;
   (6) perform online predictions and visualise the results with Matplotlib¹⁰;
   (7) compare the results with the Seasonal Persistence model and other DL models.

ACKNOWLEDGMENTS
The authors acknowledge the support of Programme Operationnel FEDER 2014-2020 de la Region de Bruxelles Capitale and icity.brussels¹¹ for the MOBIAID Project¹². The authors are also grateful to Bruxelles Mobilite for having provided the OBU data necessary for this work.

³ https://www.kaggle.com
⁴ https://www.openstreetmap.org/
⁵ https://github.com/gboeing/osmnx
⁶ https://geopandas.org
⁷ https://pandas.pydata.org
⁸ https://python-visualization.github.io/folium/
⁹ https://www.tensorflow.org
¹⁰ https://matplotlib.org
¹¹ https://icity.brussels
¹² https://mlg.ulb.ac.be/

REFERENCES
[1] Albert Bifet and Richard Kirkby. 2009. Data stream mining: a practical approach. (2009).
[2] Gianluca Bontempi, Souhaib Ben Taieb, and Yann-Aël Le Borgne. 2012. Machine learning strategies for time series forecasting. In European Business Intelligence Summer School. Springer, 62-77.
[3] Giovanni Buroni, Yann-Ael Le Borgne, Gianluca Bontempi, and Karl Determe. 2018. On-Board-Unit Data: A Big Data Platform for Scalable Storage and Processing. In 2018 4th International Conference on Cloud Computing Technologies and Applications (Cloudtech). IEEE, 1-5.
[4] Felix A Gers, Douglas Eck, and Jürgen Schmidhuber. 2002. Applying LSTM to time series predictable through time-window approaches. In Neural Nets WIRN Vietri-01. Springer, 193-200.
[5] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735-1780.
[6] Rob J Hyndman and Anne B Koehler. 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22, 4 (2006), 679-688.
[7] Bryan Lim, Sercan O Arik, Nicolas Loeff, and Tomas Pfister. 2019. Temporal fusion transformers for interpretable multi-horizon time series forecasting. arXiv preprint arXiv:1912.09363 (2019).
[8] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. 2014. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems 16, 2 (2014), 865-873.
[9] Senzhang Wang, Jiannong Cao, and Philip Yu. 2020. Deep learning for spatio-temporal data mining: A survey. IEEE Transactions on Knowledge and Data Engineering (2020).