<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Traffic Forecasting Using PaddlePaddle</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>PaddlePaddle GPU</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Krasovskii Institute of Mathematics and Mechanics</institution>
          ,
          <addr-line>Yekaterinburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ural Federal University</institution>
          ,
          <addr-line>Yekaterinburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>102</fpage>
      <lpage>111</lpage>
      <abstract>
        <p>The traffic forecasting problem is considered. A new traffic prediction algorithm is designed. The algorithm, based on an original deep neural network model, is implemented with the PaddlePaddle deep learning framework and uses a long short-term memory layer to improve prediction accuracy. All experiments have been performed on the Ural Federal University cluster with Nvidia Tesla K20 GPUs.</p>
      </abstract>
      <kwd-group>
        <kwd>forecasting</kwd>
        <kwd>deep learning</kwd>
        <kwd>LSTM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper, we describe the problem as it was stated and present our
approach and results.</p>
      <sec id="sec-1-1">
        <title>Existing Solutions Overview</title>
        <p>Among the parametric methods, one of the most successful is ARIMA
(autoregressive integrated moving average), which generated a whole class of
methods (subset ARIMA, seasonal ARIMA, ARIMA with exogenous factors, ARIMA with
Kohonen maps, vector ARIMA). All these methods are based on the assumption that
the variance and mean of the time series are stationary. The ARIMA method shows
better accuracy than its predecessors in predicting short-term traffic changes
on highways.</p>
        <p>Parametric models have a number of advantages. First, such models are easy
to build and understand. Second, the solution is simpler and requires little
computation time. However, due to the nonlinearity and stochastic nature of
traffic, parametric models cannot fully capture the peculiarities of such data
and have a large prediction error in comparison with nonparametric models.</p>
        <p>
          Recently, intelligent transportation systems (ITS) have started to utilize
fully connected architectures of deep learning models for predicting short-term
traffic flow. Researchers in this field have built a deep neural network (DNN)
to capture the spatio-temporal features of the transport stream and developed a
multi-task architecture for forecasting stationary and dynamic road traffic [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
          ]. Other researchers suggested using a stacked autoencoder (SAE) model for
predicting short-term traffic flow [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. These approaches allowed one to predict the future transport flow
fairly accurately; however, they did not use the local topology of the road
network or long-term data on the transport flow, which significantly reduced
their predictive capabilities.
        </p>
        <p>
          A graph-based neural network model was also developed and showed an
improvement in predicting long-term dependencies while taking into account
spatial data features [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. However, such a model gave low accuracy in forecasting short-term
traffic.
        </p>
        <p>
          In recent studies, a model was developed that combines the architectures of
a convolutional neural network and an LSTM (long short-term memory) recurrent
neural network, showing a slight improvement in accuracy with regard to spatial
features [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The convolutional layer processed spatial features, and several LSTM
layers processed short-term variations and the frequency of the transport
stream.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>Data Samples Representation</title>
        <p>A city can be viewed as a set of connected roads; each road at any given time has
a numerical congestion characteristic <italic>X</italic><sub>u_i,t</sub> ∈ {0, 1, 2, 3, 4}, i.e. a number which
represents how "severe" the congestion on the current road is (see Table 1).</p>
        <p>It may look like the traffic characteristic has been simplified too much, but in
this case we find it more suitable than a real physical quantity like average
speed, for the following reasons:</p>
        <list list-type="bullet">
          <list-item>
            <p>The traffic forecasting results in this particular case are targeted at human
use (road users themselves). We find a short-scale congestion characteristic
much more intuitive for people, because it is easy to understand and, most
importantly, easy to compare the current road condition to "normal" traffic or
to what it was like before.</p>
          </list-item>
          <list-item>
            <p>The congestion characteristic incorporates road parameters such as speed
limits and road quality. For example, an average speed of 40 km/h can be
considered good in a busy downtown or on a field road, but it is absolutely
inadequate for a highway. So in the first case the congestion value can be
defined as 1 and in the second as 3, even though the average speed is the same.
Users therefore do not need to take any additional parameters into
consideration; they can tell right away how "good" or "bad" the traffic on a
particular road is.</p>
          </list-item>
          <list-item>
            <p>The collected data samples are usually not evenly distributed over time,
which can introduce instability into the system. For example, if speed data is
acquired through drivers' cellphones, the amount of collected data is
proportional to the number of drivers who decided to drive through a particular
road. By coarsening the data, we get rid of its fluctuations and make it easy
to interpolate in the case of insufficient data.</p>
          </list-item>
        </list>
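        <p>As an illustration of how such a coarse scale can absorb road-specific parameters, here is a hypothetical Python sketch. The thresholds and the normalization by a per-road free-flow speed are invented for this example and are not taken from the paper:</p>

```python
# Hypothetical mapping from average speed to a congestion level in {1, 2, 3, 4}.
# Normalizing by the road's own free-flow speed lets the same physical speed
# map to different congestion levels on different roads, as described above.
def congestion_level(avg_speed_kmh, free_flow_kmh):
    """Return a congestion level: 1 = fluent ... 4 = extremely congested."""
    if not free_flow_kmh > 0:
        raise ValueError("free-flow speed must be positive")
    ratio = avg_speed_kmh / free_flow_kmh
    if ratio >= 0.8:
        return 1  # fluent
    if ratio >= 0.5:
        return 2  # slow
    if ratio >= 0.25:
        return 3  # congested
    return 4      # extremely congested

# The same 40 km/h reads differently depending on the road:
print(congestion_level(40, 50))   # downtown street -> 1
print(congestion_level(40, 110))  # highway -> 3
```

This reproduces the example in the text: 40 km/h yields congestion value 1 downtown but 3 on a highway.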
        <p>In addition to the collected time-dependent traffic data, we also consider
road connectivity information, which is represented by the oriented graph G(V, A),
where V is the road set and A is a set of ordered pairs of vertices
<italic>u</italic><sub>i</sub>, <italic>u</italic><sub>j</sub> ∈ V denoting intersections of roads.</p>
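        <p>A minimal Python sketch of this connectivity structure; the road names are invented for illustration:</p>

```python
# G(V, A): V is the set of roads, A the set of ordered pairs (u_i, u_j)
# meaning traffic can pass from road u_i to road u_j at an intersection.
V = {"elm_st", "oak_ave", "main_hwy"}
A = {("elm_st", "oak_ave"), ("oak_ave", "main_hwy"), ("main_hwy", "elm_st")}

def successors(u, arcs):
    """Roads directly reachable from road u, i.e. its out-neighbours in G."""
    return sorted(v for (w, v) in arcs if w == u)

print(successors("elm_st", A))  # ['oak_ave']
```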
      </sec>
      <sec id="sec-1-3">
        <title>Metric</title>
        <p>In order to be able to compare different prediction results and reduce the task
to a minimization problem, a representative metric must be chosen. In this case,
the results were evaluated by the RMSE (root-mean-square error). The RMSE is a
very common choice for many minimization problems. While its main advantages are
continuity and differentiability, we also find it very intuitive at representing
how "good" the result is. Simply analyzing the structure of the problem, we can
determine a few things about the RMSE: in the worst-case scenario, when the
prediction and the target are as far away from each other as possible,
RMSE = 3 (since <italic>X</italic><sub>actual,i</sub> ∈ {1, 2, 3, 4}
and <italic>X</italic><sub>model,i</sub> ∈ {1, 2, 3, 4}); in the best-case scenario, RMSE = 0.
Now the forecasting problem can be reduced to the minimization problem of
finding the m traffic states of node <italic>u</italic><sub>i</sub> in V using the n previous states:
RMSE = √( (1/m) Σ<sub>t</sub> (X̂<sub>u_i,t</sub> − X<sub>u_i,t</sub>)² ),
where <italic>X</italic><sub>u_i,t</sub> is the observed value of node <italic>u</italic><sub>i</sub> at instant t,
while X̂<sub>u_i,t</sub> is the predicted value of node <italic>u</italic><sub>i</sub> at instant t.</p>
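        <p>The worst-case and best-case bounds above are easy to verify with a direct computation (a small Python sketch, not the authors' evaluation code):</p>

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and observed congestion levels."""
    assert len(predicted) == len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))

# Worst case: predictions and targets as far apart as possible in {1, 2, 3, 4}.
print(rmse([1, 1, 1], [4, 4, 4]))  # 3.0
# Best case: a perfect prediction.
print(rmse([2, 3, 4], [2, 3, 4]))  # 0.0
```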
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Initial Data Analysis</title>
      <p>In the course of our work, we had only one data source for all the experiments,
but its spatial resolution was sufficient to conduct a number of independent tests
(by splitting it into several non-overlapping training and testing samples). The size
of the whole provided dataset relates to the size of the prediction as 400 to 1.</p>
      <sec id="sec-2-1">
        <title>Data Format</title>
        <p>The data is aggregated into 5-minute intervals, from 00:00 a.m. on March 1st to
8:00 a.m. on May 25th, 2016. Every measurement is denoted by one of four states, as
described earlier. A traffic intensity map is shown in Fig. 1. Our task was to
predict the traffic in the following 2 hours, from 8:05 a.m. to 10:00 a.m. on May 25th.</p>
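        <p>As a quick sanity check on these figures (a Python sketch; the arithmetic follows directly from the dates and the 5-minute interval stated above):</p>

```python
from datetime import datetime

INTERVAL_MIN = 5

# The prediction horizon is 2 hours (8:05-10:00 a.m.), i.e. m output steps:
m = 2 * 60 // INTERVAL_MIN
print(m)  # 24

# Length of the observed series, from 00:00 March 1 to 08:00 May 25, 2016:
span = datetime(2016, 5, 25, 8) - datetime(2016, 3, 1, 0)
n_samples = int(span.total_seconds() // 60 // INTERVAL_MIN)
print(n_samples)  # 24576 five-minute measurements (about 85 days)
```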
      </sec>
      <sec id="sec-2-2">
        <title>Data Analysis</title>
        <p>The initial data contain several anomalous regions: periodic absences of
data (white regions) from 5:00 a.m. on Saturday to 5:00 a.m. on Monday (Fig. 1),
stochastic anomalies, and nonuniformity of values (Fig. 2). Small anomalies
were approximated by neighboring values, but large regions were simply removed from
the training dataset.</p>
        <sec id="sec-2-2-1">
          <title>Data Preprocessing</title>
          <p>[Figs. 1, 2: traffic intensity maps over March and April 2016; legend:
no data, fluent, slow, congested, extremely congested.]</p>
          <p>The initial data contain both useful data for training the neural network (traffic
congestion values) and filler values (zeros) denoting the instants when no data
is available. If such data is fed to the neural network input during
training without preprocessing, a good result is not to be expected, since
blocks of missing data will disrupt the learning process.</p>
          <p>In order to improve the quality of traffic forecasting, all the data gaps should
be eliminated. We can split this task into two stages: the elimination of large
periodic groups of gaps, and the elimination of relatively isolated gaps in random
places. In the case of periodic blocks, we simply cut these blocks out of the
original data and concatenate the remaining parts in such a way that there are
no gaps in the timestamps of the day. The random data gaps are somewhat more
difficult to handle because they can arise at arbitrary places and have an
arbitrary length in time. The processing consists in interpolating such intervals with
averaged values from several of the closest surrounding points of known data. At the
top of Fig. 4, a part of the initial data is shown; at the bottom, the same data
after preprocessing, with interpolated values marked in red.</p>
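          <p>The second stage (filling isolated gaps with averaged values from surrounding points) can be sketched as follows; the window size and the toy series are invented for illustration, with 0 marking a missing sample:</p>

```python
# Fill small isolated gaps (zeros) with the rounded average of the nearest
# known neighbours. Large periodic blocks would already have been cut out
# and the remaining parts concatenated before this step.
def fill_small_gaps(series, window=2):
    filled = list(series)
    for i, x in enumerate(series):
        if x == 0:  # missing sample
            lo = max(0, i - window)
            hi = min(len(series), i + window + 1)
            neighbours = [series[j] for j in range(lo, hi) if series[j] != 0]
            if neighbours:
                filled[i] = round(sum(neighbours) / len(neighbours))
    return filled

print(fill_small_gaps([2, 2, 0, 4, 4]))  # [2, 2, 3, 4, 4]
```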
          <p>After the preprocessing was applied, the initial data shrank from
approximately 85 days to 61 (due to 24 days of missing data).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Implementation</title>
      <sec id="sec-3-1">
        <title>Design of the Algorithm</title>
        <p>The proposed algorithm is based on a deep recurrent neural network with a long
short-term memory layer. As shown in Fig. 5, the model consists of 5 layers: the input
data layer (n neurons), a fully connected layer (k neurons), an LSTM layer (k neurons),
a fully connected layer (4 neurons), and the output data layer (4 neurons). The main
idea of the algorithm is that the intensity values of the neighboring nodes affect the
current node and, therefore, one should consider those values to predict the traffic
intensity of the current node.</p>
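        <p>For concreteness, below is an illustrative NumPy forward pass with the same layer shapes as Fig. 5 (input n → FC k → LSTM k → FC 4 → output 4). This is not the authors' PaddlePaddle implementation: the sizes n and k, the random weights, and the tanh activations are placeholder assumptions for this sketch only:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 32, 64          # example sizes; the paper leaves n and k configurable

def fc(x, w, b):
    """Fully connected layer with tanh activation (activation assumed)."""
    return np.tanh(w @ x + b)

def lstm_step(x, h, c, W):
    """One LSTM step over a k-dimensional input (standard gate equations)."""
    z = W @ np.concatenate([x, h])            # shape (4k,)
    i, f, g, o = np.split(z, 4)
    i, f, o = map(lambda v: 1 / (1 + np.exp(-v)), (i, f, o))  # sigmoid gates
    c_new = f * c + i * np.tanh(g)
    return o * np.tanh(c_new), c_new

x = rng.standard_normal(n)                    # congestion values of neighbours
h = c = np.zeros(k)
W1 = rng.standard_normal((k, n)); b1 = np.zeros(k)   # input FC: n -> k
Wl = rng.standard_normal((4 * k, 2 * k))             # LSTM weights: k -> k
W2 = rng.standard_normal((4, k)); b2 = np.zeros(4)   # output FC: k -> 4

h, c = lstm_step(fc(x, W1, b1), h, c, Wl)
y = fc(h, W2, b2)          # one 4-dimensional output per predicted instant
print(y.shape)             # (4,)
```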
        <p>
          At each time instant, the neural network input is fed with the traffic
intensity values from the neighboring roads, or from the entire graph (if
computing capabilities are sufficient), at the previous point of time. Training
(and prediction) is conducted for the current road at m time points after the
time point from which the data are fed to the input. All m points of time are
predicted in parallel, as can be seen in Fig. 5. The final layer of the neural
network outputs a set of m values corresponding to each predicted instant. For
the implementation of the neural network, the PaddlePaddle [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] framework has been used.
Table 2 shows different model configurations. There are 3 models denoted LSTM
with a subscript giving the mean radius of neighboring nodes; the model
LSTM<sub>v</sub> uses all the graph nodes for training and prediction. Here n is
the number of input values (for each node to be predicted), k is the number of
hidden neurons (for each node to be predicted), epoch is the number of training
epochs, and learning rate is the optimizer's learning rate.
During training, we used the sliding-window method to predict the next m values.
        </p>
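        <p>The sliding-window construction can be sketched as follows (a hypothetical helper, with n and m as defined above): each training sample pairs n consecutive past values with the m values that follow them.</p>

```python
# Build (past, future) training samples by sliding a window over the series.
def sliding_windows(series, n, m):
    samples = []
    for start in range(len(series) - n - m + 1):
        past = series[start:start + n]          # n observed values (input)
        future = series[start + n:start + n + m]  # next m values (target)
        samples.append((past, future))
    return samples

data = [1, 2, 3, 4, 3, 2, 1, 2]
for past, future in sliding_windows(data, n=4, m=2):
    print(past, "->", future)
# first sample: [1, 2, 3, 4] -> [3, 2]
```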
        <p>Even though our approach is designed and tested using PaddlePaddle, the
reader should keep in mind that it is just one of the many implementations of
ANN (artificial neural network) algorithms, and all the described methods can be
adapted to any other ANN implementation without any effect on the output result
whatsoever.</p>
        <sec id="sec-3-1-1">
          <title>Conclusion</title>
          <p>A new neural network architecture and a new preprocessing algorithm for
short-term traffic forecasting were proposed. Experiments with different types of
neural network layers showed that simple fully connected layers with one LSTM layer
yield the best result for the task. The constructed implementation allows the
task to be easily scaled in the number of road-graph nodes by limiting the radius
of neighboring nodes. The PaddlePaddle framework allowed us to utilize in the
implementation the power of modern high-performance GPU solutions without
modifying the source code.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. ASC Student Supercomputer Challenge. http://www.asc-events.org/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Huang</surname> <given-names>W.</given-names></string-name>,
          <string-name><surname>Song</surname> <given-names>G.</given-names></string-name>,
          <string-name><surname>Hong</surname> <given-names>H.</given-names></string-name>, and
          <string-name><surname>Xie</surname> <given-names>K.</given-names></string-name>:
          <article-title>Deep architecture for traffic flow prediction: deep belief networks with multitask learning</article-title>,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>.
          Vol. <volume>15</volume>, no. <issue>5</issue>,
          p. <fpage>2191</fpage>-<lpage>2201</lpage>
          (<year>2014</year>).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Hinton</surname> <given-names>G. E.</given-names></string-name>,
          <string-name><surname>Osindero</surname> <given-names>S.</given-names></string-name>, and
          <string-name><surname>Teh</surname> <given-names>Y.-W.</given-names></string-name>:
          <article-title>A fast learning algorithm for deep belief nets</article-title>,
          <source>Neural Computation</source>.
          Vol. <volume>18</volume>, no. <issue>7</issue>,
          p. <fpage>1527</fpage>-<lpage>1554</lpage>
          (<year>2006</year>).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Lv</surname> <given-names>Y.</given-names></string-name>,
          <string-name><surname>Duan</surname> <given-names>Y.</given-names></string-name>,
          <string-name><surname>Kang</surname> <given-names>W.</given-names></string-name>,
          <string-name><surname>Li</surname> <given-names>Z.</given-names></string-name>, and
          <string-name><surname>Wang</surname> <given-names>F.-Y.</given-names></string-name>:
          <article-title>Traffic flow prediction with big data: a deep learning approach</article-title>,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>.
          Vol. <volume>16</volume>, no. <issue>2</issue>,
          p. <fpage>865</fpage>-<lpage>873</lpage>
          (<year>2015</year>).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Shahsavari</surname> <given-names>B.</given-names></string-name>:
          <article-title>Short-term traffic forecasting: modeling and learning spatio-temporal relations in transportation networks using graph neural networks</article-title>.
          University of California, Berkeley (<year>2015</year>).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Wu</surname> <given-names>Y.</given-names></string-name>,
          <string-name><surname>Tan</surname> <given-names>H.</given-names></string-name>:
          <article-title>Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework</article-title>.
          https://arxiv.org/pdf/1612.01022
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <article-title>PaddlePaddle: parallel distributed deep learning platform</article-title>.
          http://doc.paddlepaddle.org/release_doc/0.9.0/doc/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>