<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Proceedings of the 2nd International Workshop on Mining Urban Data</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Car-traffic forecasting: A representation learning approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ali Ziat</string-name>
<xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Contardo</string-name>
<xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Baskiotis</string-name>
<xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ludovic Denoyer</string-name>
<xref ref-type="aff" rid="aff2">1</xref>
        </contrib>
<aff id="aff2">
          <label>1</label>
          <institution>Sorbonne Universités</institution>
          ,
          <addr-line>UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
<year>2015</year>
      </pub-date>
      <abstract>
<p>We address the problem of learning over multiple inter-dependent temporal sequences where dependencies are modeled by a graph. We propose a model that is able to simultaneously fill in missing values and predict future ones. This approach is based on representation learning techniques, where temporal data are represented in a latent vector space. Information completion (missing values) and prediction are then performed on this latent representation. In particular, the model allows us to perform both tasks using a unique formalism, whereas they are most often addressed separately, using different methods. The model has been tested on a concrete application: car-traffic forecasting, where each time series characterizes a particular road and where the graph structure corresponds to the road map of the city.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Traffic data has particular characteristics that cannot be fully handled by classical sequential and temporal models: it contains multiple missing values, and one has to simultaneously consider multiple sources that can be related, by spatial proximity for example.</p>
      <p>We propose a novel method that aims at integrating these aspects in a single model.</p>
<p>The proposed approach is based on representation learning techniques that project the observations into a continuous latent space, each sequence being modeled at each time step by a point in this space. It has many advantages w.r.t. existing techniques: it is able to simultaneously learn how to fill in missing values and how to predict the future of the observed temporal data, avoiding the use of two different models, and it naturally allows one to deal with information sources that are organized in a graph structure.</p>
<p>Moreover, the model is based on continuous optimization schemes, allowing fast optimization over large-scale datasets.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Context and Model</title>
      <sec id="sec-2-1">
        <title>2.1. Notations and Tasks</title>
<p>Let us consider a set of n temporal sequences x1, .., xn such that xi(t) ∈ X is the value of the i-th sequence at time t, the i-th sequence being defined as xi = (xi(1), .., xi(T)). In the case where X is Rm, this context corresponds to multiple multivariate time series. The sequences contain missing values, so we also define a mask mi(t) such that mi(t) = 1 if the value xi(t) is observed - and thus available for training the system - and mi(t) = 0 if xi(t) is missing - and thus has to be predicted by the model. In addition, we consider that there exists a set of relations between the sequences corresponding to external information, such as spatial proximity. The sequences are thus organized in a graph G = {ei,j} such that ei,j = 1 means that xi and xj are related, and ei,j = 0 otherwise.</p>
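        <p>To make the notation concrete, the following sketch (not from the paper; array names and sizes are purely illustrative) shows one way such masked, graph-structured series could be stored:</p>
        <preformat>
import numpy as np

# Hypothetical layout for n sequences of length T with values in R^m.
n, T, m = 4, 100, 1
x = np.zeros((n, T, m))    # x[i, t] holds x_i(t); content is arbitrary where missing
mask = np.ones((n, T))     # mask[i, t] = m_i(t): 1 if x_i(t) is observed, else 0
e = np.zeros((n, n))       # e[i, j] = e_{i,j}: 1 if sequences i and j are related

# Example: mark x_2(10) as missing, and relate roads 0 and 1.
mask[2, 10] = 0
e[0, 1] = e[1, 0] = 1
        </preformat>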
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Model</title>
        <p>The RepresentAtIoN-baSed TempOral Relational Model (RAINSTORM) is a loss-based model, described through a continuous, differentiable loss function that is optimized using classical optimization techniques. Let us define L(θ, γ, z) as the loss function to minimize, where z is the set of all the vectors zi(t) for i ∈ [1..n] and t ∈ [1..T], T being the size of the observed time window, i.e. the history of the time series. We define L as:</p>
        <p>L(θ, γ, z) = (1/O) Σi=1..n Σt=1..T mi(t) Δ(fθ(zi(t)), xi(t)) (term 1)
+ λdyn Σi=1..n Σt=1..T−1 ||hγ(zi(t)) − zi(t+1)||² (term 2)
+ λstruct Σi,j∈[1..n]² Σt=1..T ei,j ||zi(t) − zj(t)||² (term 3) (1)</p>
        <p>where O is the number of observed values, i.e. values such that mi(t) = 1.</p>
        <p>This loss function contains three terms, each one associated with one of the constraints presented previously:
• Term 1 aims at simultaneously learning z and a function fθ - called the decoding function - such that, from zi(t), fθ can be used to predict the value xi(t). The function fθ : RN → X maps latent vectors to observations. Δ measures the error made by predicting fθ(zi(t)) instead of xi(t), mi(t) playing the role of a mask that restricts the computation of this term to the observed values.
• Term 2 aims at finding values zi(.) and a dynamic model hγ such that, when applied to zi(t), hγ allows us to predict the representation of the next state of time series i, i.e. zi(t+1). hγ : RN → RN is the dynamic function, which models the dynamicity of each series directly in the latent space. The parameters γ will be learned to minimize the mean square error between the prediction hγ(zi(t)) and zi(t+1).
• At last, term 3 corresponds to a structural regularity over the graph structure that encourages the model to learn closer representations for time series that are related. This forces the model to learn representations that reflect the structure of the considered graph.
λdyn and λstruct are manually defined coefficients that weight the importance of the different terms in the loss function.</p>
        <p>The learning problem aims at minimizing the loss function L(θ, γ, z) simultaneously over θ, γ and z. By restricting fθ and hγ to be continuous, differentiable functions, we can use gradient-descent-based optimization approaches.</p>
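        <p>As a minimal sketch of this optimization, the loss can be expressed with any automatic-differentiation toolkit. The example below uses PyTorch, takes Δ to be the squared error, and assumes linear parameterizations of fθ and hγ; these specific choices are illustrative and not prescribed by the model.</p>
        <preformat>
import torch

n, T, m, N = 4, 100, 1, 10        # illustrative sizes; N is the latent dimension
lam_dyn, lam_struct = 1.0, 1.0    # manually defined coefficients

x = torch.randn(n, T, m)          # observed series (placeholder data)
mask = torch.ones(n, T)           # m_i(t)
e = torch.zeros(n, n)             # graph structure e_{i,j}

z = torch.randn(n, T, N, requires_grad=True)  # one latent vector per (i, t)
f = torch.nn.Linear(N, m)         # decoding function f_theta (assumed linear)
h = torch.nn.Linear(N, N)         # dynamic function h_gamma (assumed linear)

def rainstorm_loss():
    O = mask.sum()                                             # observed count
    term1 = (mask.unsqueeze(-1) * (f(z) - x) ** 2).sum() / O   # decoding error
    term2 = ((h(z[:, :-1]) - z[:, 1:]) ** 2).sum()             # latent dynamics
    diff = z.unsqueeze(0) - z.unsqueeze(1)                     # z_i(t) - z_j(t)
    term3 = (e[:, :, None, None] * diff ** 2).sum()            # structural term
    return term1 + lam_dyn * term2 + lam_struct * term3

# Joint gradient descent over z, theta and gamma.
opt = torch.optim.SGD([z, *f.parameters(), *h.parameters()], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = rainstorm_loss()
    loss.backward()
    opt.step()
        </preformat>
        <p>Under this scheme, a missing value would be read off as fθ(zi(t)), and a forecast one step beyond the history as fθ(hγ(zi(T))), since hγ advances the latent state and fθ decodes it.</p>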
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Traffic Forecasting and Experiments</title>
      <p>Experiments have been performed on two datasets, from the cities of Beijing and Warsaw. The datasets are provided by (Zheng, 2011) and (ICDM) respectively, and are not described here for the sake of space.</p>
      <sec id="sec-3-1">
        <title>3.1. Models</title>
        <p>We propose to compare the RAINSTORM approach to the following baseline models, some used for data completion and others for prediction. For the completion problem we consider the following models:</p>
        <p>MF: This corresponds to the classical matrix factorization framework for data completion.</p>
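        <p>For reference, a minimal masked matrix-factorization sketch for completion follows; it is illustrative only, as the exact MF variant used is not specified here.</p>
        <preformat>
import numpy as np

def mf_complete(X, mask, k=10, lr=0.01, steps=500):
    """Complete an (n_roads x T) matrix X by a rank-k factorization U @ V.

    Only entries with mask == 1 contribute to the squared-error loss.
    """
    n, T = X.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(k, T))
    for _ in range(steps):
        err = mask * (U @ V - X)   # residual on observed entries only
        U -= lr * (err @ V.T)      # gradient step on U
        V -= lr * (U.T @ err)      # gradient step on V
    return U @ V                   # completed matrix
        </preformat>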
        <p>MF-Geo: This method, matrix factorization with geographic context, is the one named TSE (traffic speed estimation) in (Shang et al., 2014).</p>
        <p>For the prediction problem we consider the following models:</p>
        <p>NeuralNetwork: This is the classical model used in traffic forecasting, based on a neural network architecture, described for instance in (Dougherty &amp; Cobbett, 1997).</p>
        <p>SAE: This is the method described in (Lv et al., 2014).</p>
        <p>We also compare RAINSTORM with a model based on a heuristic able to perform both completion and prediction, which we call RoadMean and which can be described as follows: this model predicts and fills in missing values with the mean of the observed values of the sequence (see the sketch below).</p>
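        <p>A sketch of the RoadMean heuristic, under the masked-array layout assumed earlier:</p>
        <preformat>
import numpy as np

def road_mean_fill(x, mask):
    """Fill missing entries of each sequence with its observed mean.

    x: (n, T, m) values; mask: (n, T), 1 where observed. The same
    per-road mean also serves as the prediction for future steps.
    """
    filled = x.copy()
    for i in range(x.shape[0]):
        obs = mask[i] == 1
        road_mean = x[i, obs].mean(axis=0)   # mean over observed time steps
        filled[i, ~obs] = road_mean          # completion of missing values
    return filled
        </preformat>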
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiments and Results</title>
        <p>We compare our model with the baseline approaches on the two tasks of completion and prediction. Results are reported in Table 1, which gives the root mean square error (RMSE) of the described baseline models and of the RAINSTORM model for different sizes N of the latent space.</p>
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <caption>
            <p>RMSE of the described baseline models and of the RAINSTORM model (for different sizes N of the latent space) on the prediction task (top) and the completion task (bottom).</p>
          </caption>
          <table>
            <thead>
              <tr><th>Task</th><th>Model</th><th>N</th><th>Beijing Volume</th><th>Warsaw Volume</th><th>Warsaw Speed</th></tr>
            </thead>
            <tbody>
              <tr><td rowspan="7">Prediction</td><td>RoadMean</td><td>-</td><td>5.51</td><td>5.09</td><td>11.02</td></tr>
              <tr><td>NeuralNetwork</td><td>-</td><td>4.77</td><td>4.27</td><td>8.05</td></tr>
              <tr><td>SAE</td><td>-</td><td>4.75</td><td>4.27</td><td>7.85</td></tr>
              <tr><td>RAINSTORM</td><td>5</td><td>4.82</td><td>4.28</td><td>7.74</td></tr>
              <tr><td>RAINSTORM</td><td>10</td><td>4.78</td><td>4.20</td><td>7.21</td></tr>
              <tr><td>RAINSTORM</td><td>20</td><td>4.54</td><td>4.21</td><td>7.19</td></tr>
              <tr><td>RAINSTORM</td><td>50</td><td>4.66</td><td>4.20</td><td>7.60</td></tr>
              <tr><td rowspan="7">Completion</td><td>RoadMean</td><td>-</td><td>5.55</td><td>5.00</td><td>11.10</td></tr>
              <tr><td>MF</td><td>-</td><td>3.58</td><td>3.16</td><td>6.80</td></tr>
              <tr><td>MF-Geo</td><td>-</td><td>3.24</td><td>2.99</td><td>6.49</td></tr>
              <tr><td>RAINSTORM</td><td>5</td><td>2.99</td><td>3.12</td><td>6.49</td></tr>
              <tr><td>RAINSTORM</td><td>10</td><td>3.03</td><td>3.00</td><td>6.24</td></tr>
              <tr><td>RAINSTORM</td><td>20</td><td>3.22</td><td>2.94</td><td>6.23</td></tr>
              <tr><td>RAINSTORM</td><td>50</td><td>2.97</td><td>2.93</td><td>6.70</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We have presented a new way to learn over multiple incomplete sources of temporal relational data. The RAINSTORM approach is based on representation learning techniques and aims at integrating in a latent space the observed information, the dynamicity of the sequences of data, and their relations. In comparison to baseline models that have been developed for prediction only or for completion only, our approach shows interesting performance and is able to simultaneously complete missing values and predict the future evolution of the data.</p>
    </sec>
  </body>
  <back>
<ref-list>
      <title>References</title>
      <ref id="ref1">
        <mixed-citation>Dougherty, Mark S and Cobbett, Mark R. Short-term inter-urban traffic forecasts using neural networks. International Journal of Forecasting, 1997.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>ICDM. IEEE ICDM contest: TomTom traffic prediction for intelligent GPS navigation.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Lv, Yisheng, Duan, Yanjie, Kang, Wenwen, Li, Zhengxi, and Wang, F-Y. Traffic flow prediction with big data: A deep learning approach. 2014.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>Shang, Jingbo, Zheng, Yu, Tong, Wenzhu, Chang, Eric, and Yu, Yong. Inferring gas consumption and pollution emission of vehicles throughout a city. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Zheng, Yu. T-Drive trajectory data sample, August 2011.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>