<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dynamic Gated Spatial Temporal Graph Neural Networks for Traffic Forecasting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ziyan Gui</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Changhui Liu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Li Xiong</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zuoquan Xie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liang Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Central China Normal University</institution>
          ,
          <addr-line>Wuhan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Southwestern University of Finance and Economics</institution>
          ,
          <addr-line>Chengdu</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Wuhan Institute of Technology</institution>
          ,
          <addr-line>Wuhan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <fpage>187</fpage>
      <lpage>193</lpage>
      <abstract>
        <p>Traffic forecasting is crucial to intelligent transportation systems, and very challenging due to the uncertainty and complexity of spatial-temporal dependencies in real-world traffic networks. Many existing approaches use a pre-defined graph to model spatial correlations, but they fail to capture the latent spatial evolution. Dynamic graph-based methods have been proposed to address this issue; however, they model spatial and temporal dependencies separately, without internal connection. In this paper, we propose a novel Dynamic gated Spatial Temporal Graph Neural Network (DSTGNN) for traffic forecasting, which can capture time-varying spatial correlations and temporal dependencies jointly. Besides, we apply a gate mechanism to the residual connections between extracted spatial and temporal features. Experimental results on two real-world traffic datasets demonstrate the effectiveness of DSTGNN and show that it can compete with state-of-the-art baselines.</p>
      </abstract>
      <kwd-group>
        <kwd>Traffic forecasting</kwd>
        <kwd>DSTGNN</kwd>
        <kwd>Spatial-Temporal correlation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As a core component of Intelligent Transportation System (ITS)[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], real-time and accurate traffic forecasting is crucial for road resource planning and public traffic safety. The key to traffic forecasting is to capture dynamic and uncertain spatial-temporal dependencies from historical data.
      </p>
      <p>
        In early deep learning approaches, convolutional neural networks (CNNs) were used to extract spatial correlations and recurrent neural networks (RNNs) to model temporal dependencies. However, CNNs are only suitable for capturing spatial features in grid data and perform poorly in non-Euclidean space. Recently, graph neural networks (GNNs) [
        <xref ref-type="bibr" rid="ref1">1,12,13</xref>
        ] have generalized convolution to graph-structured data and can extract the intrinsic spatial topological information of graphs.
      </p>
      <p>
        To model both temporal and spatial dependencies, many early approaches [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] combined GNNs with RNN-based sequence models and achieved improvements. However, most of them model spatial correlations based on a predefined static graph and cannot capture spatial dynamics. With the advent of self-attention in the Transformer [13], spatial and temporal attention are adopted by these methods [
        <xref ref-type="bibr" rid="ref11 ref6 ref9">6,9,11</xref>
        ] to model dynamic spatial-temporal correlations, yielding notable improvements. However, spatial-temporal correlations are not captured interactively, resulting in irrelevant or redundant information being learned.
      </p>
      <p>In this paper, we propose an end-to-end model named Dynamic gated Spatial Temporal Graph Neural Network (DSTGNN), which models spatial-temporal dependencies jointly to address the above issues. Specifically, we stack multiple proposed dynamic gated spatial-temporal (DGST) blocks to extract spatial-temporal features from historical traffic data. Each DGST block consists of a dynamic graph convolution module for extracting dynamic and static spatial correlations, a temporal attention module for modeling time dynamics, and a gated residual connection for interaction between the extracted spatial and temporal features. The contributions of this paper can be summarized as follows:
• We propose the dynamic gated spatial-temporal block that jointly models the spatial and temporal dependencies of traffic data, and then makes predictions in a non-autoregressive way.
• Experiments on two real-world traffic datasets are conducted to evaluate the effectiveness of our model, and the results show that our model can compete with the state-of-the-art methods.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Graph Convolutional Network</title>
      <p>
        As an efficient variant of CNNs on graph-structured data, graph convolutional networks (GCNs) have been applied in various areas and achieved state-of-the-art results. In the spectral perspective [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], GCNs need to compute the eigen-decomposition of the Laplacian matrix, which leads to a huge consumption of computational resources. Subsequently, methods [12,14] based on Chebyshev polynomial approximation were proposed to improve computing efficiency. In addition, GCNs based on the spatial perspective [15,16] not only avoid the eigen-decomposition of the Laplacian matrix but also learn vertex representations inductively. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposes STGCN, which combines GCN with a standard 1D temporal CNN to tackle traffic time-series prediction. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposes Graph WaveNet, which learns static adjacency matrices for spatial-temporal modeling but fails to capture dynamic spatial correlations.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Attention-based Traffic Forecasting</title>
      <p>
        The attention mechanism has been extensively utilized in many domains such as natural language processing and graph learning (GAT [16]). Recently, researchers have applied attention mechanisms to model spatial-temporal dependencies for traffic forecasting. GMAN [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposes the ST-Attention block, which adds a transform attention layer between the encoder and decoder and models dynamic spatial-temporal correlations in both the encoder and decoder. For learning spatial-temporal features, ASTGCN [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] employs attention mechanisms in the spatial and temporal dimensions, respectively. STTN [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] builds a general dynamic graph neural network to model the spatial dependencies that change over time. LSGCN [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] combines GCN with graph attention for long- and short-term traffic prediction.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. Methodology</title>
    </sec>
    <sec id="sec-6">
      <title>3.1. Problem Definition</title>
      <p>In this study, a traffic network is denoted as a directed weighted graph G = (V, E, A), where V is a set of N = |V| nodes representing sensors in the traffic network; E is a set of edges indicating the connectivity among the nodes; and A ∈ ℝ^{N×N} is the weighted adjacency matrix of graph G, representing the proximity measured by the Euclidean distances between sensors via a Gaussian kernel.</p>
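      <p>As a minimal sketch of this construction, A can be built from pairwise Euclidean distances passed through a Gaussian kernel; the sensor coordinates and bandwidth below are illustrative assumptions, since the paper does not specify them:

```python
import numpy as np

# Hypothetical 2-D sensor coordinates (N = 3 sensors); real datasets would
# supply measured positions or road-network distances instead.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
sigma = 2.0  # kernel bandwidth (assumed hyperparameter)

# Pairwise Euclidean distances: dist[i, j] = ||coords[i] - coords[j]||
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Gaussian kernel: nearby sensors get weights near 1, distant ones near 0
A = np.exp(-(dist ** 2) / (sigma ** 2))
```
      </p>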
      <p>At each time step t, the traffic conditions can be represented as a graph signal X_t ∈ ℝ^{N×C} on graph G, where C is the number of observed traffic conditions, such as traffic speed, traffic density and so on.</p>
      <p>Given a traffic network graph G and the traffic conditions of the historical P time steps [X_{t−P+1}, …, X_t] ∈ ℝ^{P×N×C} observed by the N nodes, we aim to learn a function f to forecast the traffic conditions of the next F time steps over all nodes. The process can be formulated as:</p>
      <p>[X̂_{t+1}, …, X̂_{t+F}] = f(X_{t−P+1}, …, X_t; G), (1)
where [X̂_{t+1}, …, X̂_{t+F}] ∈ ℝ^{F×N×C}.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2. Overall Architecture</title>
      <p>The overall architecture of the proposed DSTGNN is shown in Figure 1; it consists of three main components: an input layer, multiple stacked dynamic gated spatial-temporal (DGST) blocks, and an output layer. We first use a two-layer fully-connected network to project the traffic data into a high-dimensional space before it enters the model; the stacked DGST blocks then jointly extract spatial and temporal dependencies from the input. Finally, the output layer transforms the features carrying spatial-temporal information from the high-dimensional space back to traffic speed. Besides, DSTGNN predicts future traffic conditions in a multi-step manner. The detailed modules are introduced below.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3. Dynamic Graph Convolution Module</title>
      <p>The DGC module in the lth DGST block takes X_ST^(l−1) ∈ ℝ^{P×N×d} as input (for the DGC module in the first DGST block, X_ST^(0) = X′_S). We first view the temporal dimension of the input as a batch dimension, and then transform it to the query, key and value subspaces by three different linear projections, obtaining Q_S, K_S, V_S ∈ ℝ^{P×N×d}, formally:</p>
      <p>Q_S = X_ST^(l−1) W_q, K_S = X_ST^(l−1) W_k, V_S = X_ST^(l−1) W_v, (2)
where W_q, W_k, W_v ∈ ℝ^{d×d} are all learnable weight matrices and d is the dimension of the feature space.</p>
      <p>Next, we use ReLU(·) to sparsify the dense adjacency matrix derived from the dot product:
Ã = ReLU(Q_S K_S^T) + I_N, (3)
where Ã ∈ ℝ^{N×N} is the sparse spatial correlation matrix and I_N is an identity matrix added to enhance self-connections. Furthermore, to capture both dynamic and static spatial correlations, we combine Ã with the predefined adjacency matrix A by element-wise product with the broadcasting mechanism. A softmax(·) function is then adopted to normalize the combined adjacency matrix to avoid gradient vanishing or explosion. The final graph convolution operation can be expressed formally as:</p>
      <p>H_S^(l) = softmax(Ã ⊙ A) V_S,
where H_S^(l) ∈ ℝ^{P×N×d} denotes the output of the DGC module in the lth DGST block, and ⊙ stands for the element-wise product.</p>
    </sec>
    <sec id="sec-9">
      <title>3.4. Gated Residual Connection</title>
      <p>As shown in Figure 1, we apply a gate function to regulate the flow of residual information, drawing inspiration from the gate mechanism in the Gated Recurrent Unit (GRU). This gate regulates how much the previous residual information can influence the following module. Specifically, a gated residual shortcut path is added to the DGC module, fusing the spatial correlations of its output H_S^(l) with the input's spatial-temporal features X_ST^(l−1) ∈ ℝ^{P×N×d} in the lth DGST block (with X_ST^(0) = X′_S), which can be formulated as:</p>
      <p>X_T^(l) = g_s ⊙ H_S^(l) + X_ST^(l−1) W_res, (4)
g_s = σ(X_ST^(l−1) U_g), (5)
where X_T^(l) ∈ ℝ^{P×N×d} is the following temporal attention module's input, g_s ∈ ℝ^{P×N×d} denotes the gate, W_res ∈ ℝ^{d×d} denotes the linear projection weight, U_g ∈ ℝ^{d×d} denotes the state-to-state weight matrix, ⊙ is the element-wise product, and σ(·) stands for the sigmoid non-linear activation function.</p>
      <p>Similar to the DGC module, we also add a gated residual shortcut path to the following temporal attention module, which combines the spatial correlations of its input X_T^(l) with the temporal dependencies of its output H_T^(l), building an interaction between the extracted spatial and temporal information from which the spatial-temporal representations X_ST^(l) ∈ ℝ^{P×N×d} are learned. This process can be written as:</p>
      <p>X_ST^(l) = g_t ⊙ H_T^(l) + X_T^(l) W_res, (6)
g_t = σ(X_T^(l) U_g), (7)
where H_T^(l) ∈ ℝ^{P×N×d} is the output of the following temporal attention module, and g_t is the gate.</p>
    </sec>
    <sec id="sec-10">
      <title>3.5. Temporal Attention Module</title>
      <p>The temporal attention module takes the output X_T^(l) ∈ ℝ^{P×N×d} of the gated residual connection in the DGC module as input and views the spatial dimension of X_T^(l) as a batch dimension. It then uses a Transformer-based encoder to model temporal dependencies from X_T^(l) with spatial correlations. The module mainly consists of attention aggregation and FFN refinement, with the former highlighting relevant temporal cues and the latter updating the refined features. The output tensor H_T^(l) ∈ ℝ^{P×N×d} of the temporal attention module can be computed as follows:
Q_T = X_T^(l) W′_q, K_T = X_T^(l) W′_k, V_T = X_T^(l) W′_v,
V′_T = LN(softmax(Q_T K_T^T / √d) V_T + X_T^(l)),
H_T^(l) = LN(FFN(V′_T) + V′_T),
where W′_q, W′_k, W′_v ∈ ℝ^{d×d} are all weight matrices, LN indicates layer normalization, and FFN is the feed-forward network. The above calculation can also be extended in a multi-head manner.</p>
    </sec>
    <sec id="sec-11">
      <title>3.6. Loss Function</title>
      <p>The output layer takes the last DGST block's output X_ST^(l) ∈ ℝ^{P×N×d} as input, then transforms it to the final prediction results Ŷ ∈ ℝ^{F×N×C}. The proposed DSTGNN can be trained in an end-to-end style via backpropagation by minimizing the mean absolute error between predicted values and ground truths:</p>
      <p>L(Θ) = (1 / (F × N)) Σ_{t=1}^{F} Σ_{n=1}^{N} |Ŷ_{t,n} − Y_{t,n}|, (8)
where Θ denotes all the learnable parameters in our model, and Y denotes the ground truth.</p>
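      <p>The training objective is the standard mean absolute error; a minimal sketch:

```python
import numpy as np

def mae_loss(Y_hat, Y):
    """Mean absolute error averaged over all predicted entries."""
    return np.abs(Y_hat - Y).mean()
```
      </p>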
    </sec>
    <sec id="sec-12">
      <title>4. Experiments</title>
    </sec>
    <sec id="sec-13">
      <title>4.1. Experimental Settings</title>
    </sec>
    <sec id="sec-14">
      <title>4.1.1. Datasets</title>
      <p>
        We evaluate our DSTGNN on two public real-world traffic datasets, PeMSD7 and PEMS-BAY, detailed below. PeMSD7 contains two months of traffic data collected from 228 sensors during the weekdays of May and June 2012 in California's District 7. PEMS-BAY contains six months of traffic data from 325 sensors in the Bay Area from January 1st to May 31st, 2017. Following STGCN [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], PeMSD7 uses the first 34 days as the training set and the remaining days as the validation and test sets. As with DCRNN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
PEMS-BAY is divided into three sets: training (70%), validation (10%), and test (20%).
      </p>
    </sec>
    <sec id="sec-15">
      <title>4.1.2. Evaluation Metric</title>
      <p>Metrics. To evaluate the performance of our model, we employ three metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE).</p>
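      <p>The three metrics can be computed as follows (a minimal sketch; MAPE is reported in percent and assumes non-zero ground truths):

```python
import numpy as np

def evaluate(y_hat, y):
    """Return (MAE, RMSE, MAPE) for predictions y_hat against ground truth y."""
    err = y_hat - y
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    mape = (np.abs(err) / np.abs(y)).mean() * 100.0  # percent; y must be non-zero
    return mae, rmse, mape
```
      </p>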
      <p>
        Baselines. Our DSTGNN is compared to the following baselines: ARIMA[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], FC-LSTM[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
DCRNN[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], STGCN[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], GMAN[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], ASTGCN[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], STTN[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Graph WaveNet[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], LSGCN[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-16">
      <title>4.2. Experimental Results and Analysis</title>
      <p>
        Table 1 shows the experimental performance of DSTGNN and the baselines for 15-, 30- and 60-minute-ahead prediction on the PeMSD7 and PEMS-BAY datasets. The comparison results show that DSTGNN can compete with state-of-the-art methods in both long-term and short-term prediction on both datasets, while outperforming the predefined graph-based spatial-temporal models, namely STGCN [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and DCRNN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In terms of short-term prediction (&lt;= 30 min), DSTGNN outperforms STTN [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and GMAN [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] on PEMS-BAY but falls short of Graph WaveNet [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. It is superior to Graph WaveNet in long-term prediction (60 min), competitive with STTN, and weaker than GMAN. On PeMSD7, DSTGNN displays similar results. This indicates that DSTGNN performs better in long-term forecasting due to its use of gated residual connections in the DGST block, which sufficiently incorporate related spatial-temporal information and alleviate the accumulation of errors over time.
      </p>
    </sec>
    <sec id="sec-17">
      <title>5. Conclusion</title>
      <p>In this paper, we propose a novel framework named dynamic gated spatial temporal graph neural network (DSTGNN) for long- and short-term traffic forecasting. In DSTGNN, we adopt the DGC module to precisely integrate both static and dynamic spatial correlations and at the same time use the temporal attention module to capture evolution cues in time series. Besides, we add the gated residual connection to the proposed DGST block to fuse the extracted spatial and temporal features. Experiments on two real traffic datasets verify the effectiveness of DSTGNN in modeling spatial-temporal correlations from time series. In the future, we will consider applying DSTGNN to more general spatial-temporal structural graph sequence forecasting tasks, such as preference prediction in recommendation systems.</p>
    </sec>
    <sec id="sec-18">
      <title>6. Acknowledgements</title>
      <p>This work was supported by Wuhan Institute of Technology under Grant No. CX2021277.</p>
    </sec>
    <sec id="sec-19">
      <title>7. References</title>
      <p>[12] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in Neural Information Processing Systems, 29, 2016.</p>
      <p>[13] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.</p>
      <p>[14] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.</p>
      <p>[15] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 30, 2017.</p>
      <p>[16] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bruna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaremba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Szlam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lecun</surname>
          </string-name>
          .
          <article-title>Spectral networks and locally connected networks on graphs</article-title>
          .
          <source>Computer Science</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Rongzhou</given-names>
            <surname>Huang</surname>
          </string-name>
          , Chuyin Huang, Yubao Liu, Genan Dai, and
          <string-name>
            <given-names>Weiyang</given-names>
            <surname>Kong</surname>
          </string-name>
          .
          <article-title>Lsgcn: Long short-term traffic prediction with graph convolutional networks</article-title>
          .
          <source>In IJCAI</source>
          , pages
          <fpage>2355</fpage>
          -
          <lpage>2361</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Yaguang</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rose</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Cyrus</given-names>
            <surname>Shahabi</surname>
          </string-name>
          , and Yan Liu.
          <article-title>Diffusion convolutional recurrent neural network: Data-driven traffic forecasting</article-title>
          .
          <source>arXiv preprint arXiv:1707.01926</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Usue</given-names>
            <surname>Mori</surname>
          </string-name>
          , Alexander Mendiburu, Maite Álvarez, and Jose A. Lozano.
          <article-title>A review of travel time estimation and forecasting for advanced traveller information systems</article-title>
          .
          <source>Transportmetrica A: Transport Science</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):
          <fpage>119</fpage>
          -
          <lpage>157</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Bing</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Haoteng</given-names>
            <surname>Yin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zhanxing</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting</article-title>
          .
          <source>arXiv preprint arXiv:1709.04875</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Chuanpan</given-names>
            <surname>Zheng</surname>
          </string-name>
          , Xiaoliang Fan, Cheng Wang, and
          <string-name>
            <given-names>Jianzhong</given-names>
            <surname>Qi</surname>
          </string-name>
          .
          <article-title>Gman: A graph multi-attention network for traffic prediction</article-title>
          .
          <source>In Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>34</volume>
          , pages
          <fpage>1234</fpage>
          -
          <lpage>1241</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Spyros</given-names>
            <surname>Makridakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michele</given-names>
            <surname>Hibon</surname>
          </string-name>
          .
          <article-title>ARMA models and the Box-Jenkins methodology</article-title>
          .
          <source>Journal of forecasting</source>
          ,
          <volume>16</volume>
          (
          <issue>3</issue>
          ):
          <fpage>147</fpage>
          -
          <lpage>163</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , Oriol Vinyals, and Quoc V Le.
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <volume>27</volume>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Shengnan</given-names>
            <surname>Guo</surname>
          </string-name>
          , Youfang Lin, Ning Feng, Chao Song, and
          <string-name>
            <given-names>Huaiyu</given-names>
            <surname>Wan</surname>
          </string-name>
          .
          <article-title>Attention based spatialtemporal graph convolutional networks for traffic flow forecasting</article-title>
          .
          <source>In Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>33</volume>
          , pages
          <fpage>922</fpage>
          -
          <lpage>929</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Zonghan</given-names>
            <surname>Wu</surname>
          </string-name>
          , Shirui Pan,
          <string-name>
            <given-names>Guodong</given-names>
            <surname>Long</surname>
          </string-name>
          , Jing Jiang, and Chengqi Zhang.
          <article-title>Graph wavenet for deep spatial-temporal graph modeling</article-title>
          .
          <source>arXiv preprint arXiv:1906.00121</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Mingxing</given-names>
            <surname>Xu</surname>
          </string-name>
          , Wenrui Dai, Chunmiao Liu, Xing Gao,
          <string-name>
            <given-names>Weiyao</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Guo-Jun</given-names>
            <surname>Qi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hongkai</given-names>
            <surname>Xiong</surname>
          </string-name>
          .
          <article-title>Spatial- temporal transformer networks for traffic flow forecasting</article-title>
          .
          <source>arXiv preprint arXiv:2001.02908</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>