=Paper= {{Paper |id=Vol-1392/paper-12 |storemode=property |title=Car-Traffic Forecasting: A Representation Learning Approach |pdfUrl=https://ceur-ws.org/Vol-1392/paper-12.pdf |volume=Vol-1392 |dblpUrl=https://dblp.org/rec/conf/icml/ZiatCBD15 }} ==Car-Traffic Forecasting: A Representation Learning Approach== https://ceur-ws.org/Vol-1392/paper-12.pdf
                Car-traffic forecasting: A representation learning approach


Ali Ziat, Gabriella Contardo, Nicolas Baskiotis, Ludovic Denoyer               firstname.lastname@lip6.fr
Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris, France



                          Abstract

We address the problem of learning over multiple inter-dependent temporal sequences where dependencies are modeled by a graph. We propose a model that is able to simultaneously fill in missing values and predict future ones. This approach is based on representation learning techniques, where temporal data are represented in a latent vector space. Information completion (missing values) and prediction are then performed on this latent representation. In particular, the model allows us to perform both tasks within a unique formalism, whereas they are most often addressed separately using different methods. The model has been tested on a concrete application: car-traffic forecasting, where each time series characterizes a particular road and where the graph structure corresponds to the road map of the city.

1. Introduction

Traffic data has particular characteristics that cannot be fully handled by classical sequential and temporal models: it contains multiple missing values, and one has to consider simultaneously multiple sources that can be related, by spatial proximity for example.

We propose a novel method that aims at integrating these aspects in one single model. The proposed approach is based on representation learning techniques, aiming at projecting the observations into a continuous latent space, each sequence being modeled at each time-step by a point in this space. It has many advantages w.r.t. existing techniques: it is able to simultaneously learn how to fill in missing values and to predict the future of the observed temporal data, avoiding the use of two different models, and it naturally allows one to deal with information sources that are organized in a graph structure. Moreover, the model is based on continuous optimization schemes, allowing fast optimization over large-scale datasets.

2. Context and Model

2.1. Notations and Tasks

Let us consider a set of n temporal sequences x_1, ..., x_n such that x_i^(t) ∈ X is the value of the i-th sequence at time t, each sequence being defined by x_i = (x_i^(1), ..., x_i^(T)). In the case where X is R^m, this context corresponds to multiple multivariate time series. The sequences contain missing values, so we also define a mask m_i^(t) such that m_i^(t) = 1 if the value x_i^(t) is observed - and thus available for training the system - and m_i^(t) = 0 if x_i^(t) is missing - and thus has to be predicted by the model. In addition, we consider that there exists a set of relations between the sequences corresponding to external information, like spatial proximity for example when X is discrete. The sequences are thus organized in a graph G = {e_i,j} such that e_i,j = 1 means that x_i and x_j are related, and e_i,j = 0 otherwise.

2.2. Model

The RepresentAtIoN-baSed TempOral Relational Model (RAINSTORM) is a loss-based model, described through a continuous differentiable loss function that is optimized using classical optimization techniques.

Let us define L(θ, γ, z) as the loss function to minimize, where z is the set of all vectors z_i^(t) for i ∈ [1..n] and t ∈ [1..T], T being the size of the observed time window, i.e. the history of the time series. We define L as:

    L(θ, γ, z) = (1/O) Σ_{i=1..n} Σ_{t=1..T} m_i^(t) ∆(f_θ(z_i^(t)), x_i^(t))           (term 1)
               + λ_dyn Σ_{i=1..n} Σ_{t=1..T-1} ||z_i^(t+1) − h_γ(z_i^(t))||²            (term 2)
               + λ_struct Σ_{(i,j) ∈ [1..n]²} Σ_{t=1..T} e_i,j ||z_i^(t) − z_j^(t)||²   (term 3)
                                                                                            (1)

where O is the number of observed values, i.e. values such that m_i^(t) = 1.

Proceedings of the 2nd International Workshop on Mining Urban Data, Lille, France, 2015. Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.
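To make the three terms of Eq. (1) concrete, the loss can be sketched in a few lines of NumPy. This is a minimal univariate sketch under assumed parametrizations - a linear decoding function f_θ(z) = ⟨a, z⟩ + b, linear dynamics h_γ(z) = Cz, and ∆ taken as the squared error - not the authors' actual implementation; f_θ and h_γ may be any differentiable functions.

```python
import numpy as np

def rainstorm_loss(z, x, m, e, A, b, C, lam_dyn=0.1, lam_struct=0.1):
    """Eq. (1) for univariate series (X = R).

    z: (n, T, N) latent vectors z_i^(t); x, m: (n, T) values and mask;
    e: (n, n) adjacency matrix of G; A: (N,), b: scalar (linear f_theta);
    C: (N, N) (linear h_gamma)."""
    O = m.sum()                              # number of observed values
    decoded = z @ A + b                      # f_theta(z_i^(t)) for all i, t -> (n, T)
    term1 = np.sum(m * (decoded - x) ** 2) / O
    pred = z[:, :-1, :] @ C                  # h_gamma(z_i^(t)) for t = 1..T-1
    term2 = np.sum((z[:, 1:, :] - pred) ** 2)
    # term 3: sum over pairs (i, j) of e_i,j * ||z_i^(t) - z_j^(t)||^2
    diff = z[:, None, :, :] - z[None, :, :, :]       # (n, n, T, N)
    term3 = np.sum(e[:, :, None, None] * diff ** 2)
    return term1 + lam_dyn * term2 + lam_struct * term3
```

Since every term is differentiable in z, A, b and C, this expression can be minimized jointly over the representations and the parameters with any gradient-based optimizer, which is the optimization scheme described below.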





This loss function contains three terms, each associated with one of the constraints presented previously:

  • Term 1 aims at simultaneously learning z and a function f_θ - called the decoding function - such that, from z_i^(t), f_θ can be used to predict the value x_i^(t). The function f_θ is defined as f_θ : R^N → X. ∆ is used to measure the error of predicting f_θ(z_i^(t)) instead of x_i^(t), with m_i^(t) playing the role of a mask restricting this term to the observed values.

  • Term 2 aims at finding values z_i^(.) and a dynamic model h_γ such that, when applied to z_i^(t), h_γ allows us to predict the representation of the next state of time series i, i.e. z_i^(t+1). h_γ is the dynamic function which models the dynamics of each series directly in the latent space: h_γ : R^N → R^N. The parameters γ are learned to minimize the mean square error between the prediction h_γ(z_i^(t)) and z_i^(t+1).

  • Finally, term 3 corresponds to a structural regularity over the graph structure that encourages the model to learn closer representations for related time series. This forces the model to learn representations that reflect the structure of the considered graph.

λ_dyn and λ_struct are manually defined coefficients that weight the importance of the different elements in the loss function.

The learning problem aims at minimizing the loss function L(θ, γ, z) simultaneously over θ, γ and z. By restricting f_θ and h_γ to be continuous differentiable functions, we can use gradient-descent based optimization approaches.

3. Traffic Forecasting and Experiments

Experiments have been carried out on two datasets from the cities of Beijing and Warsaw. The datasets are provided by (Zheng, 2011) and (ICDM) and are not described here for the sake of space.

3.1. Models

We compare the RAINSTORM approach to the following baseline models, some being used for data completion and others for prediction. For the completion problem, we consider the following models:

MF: This corresponds to the classical matrix factorization framework for data completion.

MF-Geo: This method, matrix factorization with geographic context, is the one named TSE (traffic speed estimation) in (Shang et al., 2014).

For the prediction task, we consider:

NeuralNetwork: This is the classical baseline method used in traffic forecasting, based on a neural network architecture, described for instance in (Dougherty & Cobbett, 1997).

SAE: This is the method described in (Lv et al., 2014).

We also compare RAINSTORM with a heuristic model able to perform both completion and prediction, which we call RoadMean: this model predicts and fills in missing values with the mean of the observed values of the sequence.

3.2. Experiments and Results

We compare our model with the baseline approaches on the two tasks of completion and prediction. Results are reported in Table 1 and Table 2.

Table 1. Prediction at T+1: comparison between the described baseline models and the RAINSTORM model for different sizes N of the latent space, in root mean square error (RMSE).

     N    Model            Beijing         Warsaw
                           Volume     Volume    Speed
     -    RoadMean          5.51       5.09     11.02
     -    NeuralNetwork     4.77       4.27      8.05
     -    SAE               4.75       4.27      7.85
     5    RAINSTORM         4.82       4.28      7.74
    10    RAINSTORM         4.78       4.20      7.21
    20    RAINSTORM         4.54       4.21      7.19
    50    RAINSTORM         4.66       4.20      7.60

Table 2. Completion for 50% missing data: comparison between the described baseline models and the RAINSTORM model for different sizes N of the latent space, in root mean square error (RMSE).

     N    Model            Beijing         Warsaw
                           Volume     Volume    Speed
     -    RoadMean          5.55       5.00     11.10
     -    MF                3.58       3.16      6.80
     -    MF-Geo            3.24       2.99      6.49
     5    RAINSTORM         2.99       3.12      6.49
    10    RAINSTORM         3.03       3.00      6.24
    20    RAINSTORM         3.22       2.94      6.23
    50    RAINSTORM         2.97       2.93      6.70

4. Conclusion

We have presented a new way to learn over incomplete multiple sources of temporal relational data. The RAINSTORM approach is based on representation learning techniques and aims at integrating in a latent space the observed information, the dynamics of the sequences of

data, and their relations. In comparison to baseline models that have been developed for prediction only or for completion only, our approach shows interesting performance and is able to simultaneously complete missing values and predict the future evolution of the data.

References
Dougherty, Mark S and Cobbett, Mark R. Short-term inter-
  urban traffic forecasts using neural networks. Interna-
  tional journal of forecasting, 13(1):21–31, 1997.

ICDM. IEEE ICDM contest: TomTom traffic prediction for
  intelligent GPS navigation.

Lv, Yisheng, Duan, Yanjie, Kang, Wenwen, Li, Zhengxi,
  and Wang, F.-Y. Traffic flow prediction with big data: A
  deep learning approach. 2014.

Shang, Jingbo, Zheng, Yu, Tong, Wenzhu, Chang, Eric, and
  Yu, Yong. Inferring gas consumption and pollution emis-
  sion of vehicles throughout a city. In Proceedings of the
  20th ACM SIGKDD international conference on Knowl-
  edge discovery and data mining, 2014.
Zheng, Yu. T-drive trajectory data sample, August 2011.
  URL http://research.microsoft.com/apps/pubs/default.aspx?id=152883.



