<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Traffic flow prediction for vehicle emission calculation based on graph convolutional networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peng Jiang</string-name>
          <email>jiangpenghz@163.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Bychkov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Liu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tianjiao Li</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexei Hmelnov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Science and Technology Cooperation, Westlake University</institution>
          ,
          <addr-line>No.18, Shilong Mountain Street, Xihu District, Hangzhou</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Matrosov Institute for System Dynamics and Control Theory of Siberian Branch of Russian Academy of Sciences</institution>
          ,
          <addr-line>134 Lermontov st. Irkutsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Automation (Artificial Intelligence), Hangzhou Dianzi University</institution>
          ,
          <addr-line>No.1158, Number Two Street, Jianggan District, Hangzhou</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>"the Belt and Road" Institute for Information Technology, Hangzhou Dianzi University</institution>
          ,
          <addr-line>No.115, Wenyi Road, Xihu District, Hangzhou</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Monitoring the distribution of vehicle exhaust emissions within a city is challenging because it is affected by many complex factors, such as spatial-temporal correlation and other environmental conditions. In addition, the technology for monitoring vehicle exhaust emissions directly with sensors is still at an early stage, and direct monitoring is hard to implement over a large area. Thus, we use existing environmental theory to estimate the distribution of vehicle exhaust emissions in cities from traffic volume. The problem addressed in this paper is how to use data from sparse monitoring stations together with the inherent traffic network to infer the spatial-temporal distribution of traffic volume. To solve this problem, we propose a graph convolutional network model that extracts the characteristics of traffic data and other features. We conduct extensive experiments on real traffic data sets. The experimental results show that the proposed method performs better than the existing methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the rapid growth of vehicle ownership in China, large quantities of NOx, CO, HC, PMx and
other harmful gases emitted by vehicles have aggravated urban air pollution, resulting in the
deterioration of air quality and increasingly frequent haze. Controlling vehicle exhaust
pollution first requires monitoring it effectively, so we need some means of quantifying
vehicle exhaust emissions. However, it is difficult to measure vehicle emissions directly
over a large area, so we calculate them with the COPERT model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
which needs only the urban context data and the traffic status of each road section to
calculate the vehicle emissions. Urban context data can be obtained through statistics, while
traffic information must be obtained through real-time monitoring by stations, which cannot be
deployed on all road segments.
      </p>
      <p>
        To determine the optimal locations of new monitoring stations, one must maximize
the inference performance of the traffic volume distribution model on the resulting monitoring
network. This is a reasonable and practical goal because the layout of monitoring stations is
very sparse, so it is important to accurately infer the traffic volume distribution on the
unobserved road segments from the data monitored by the existing stations.
However, without monitoring data on the unobserved road segments, it is difficult to
know on which road segments stations should be placed to maximize the inference accuracy.
To approximately achieve this, Hsieh et al. propose a two-stage framework for the deployment of
air quality monitoring stations, which uses an inference model to estimate the distribution of
the air quality index (AQI) and then obtains the locations of K new stations through a location
selection model that minimizes the assessment uncertainty [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, this approach cannot
be directly applied to our problem, since dividing the traffic network into grids
overlooks spatial correlation.
      </p>
      <p>To achieve the above purpose, we use a graph convolutional neural network. It gives
the trained model higher prediction accuracy and, at the same time, lower uncertainty.</p>
      <p>The contributions of this paper are summarized as follows:
        <list list-type="simple">
          <list-item><label>(i)</label><p>The proposed approach is able not only to forecast the spatial-temporal distribution of traffic volume but also to provide a basis for selecting the locations of new stations and maximizing the reliability of traffic inference.</p></list-item>
          <list-item><label>(ii)</label><p>We rely entirely on graph convolution to learn the spatial-temporal correlation of structured time series.</p></list-item>
          <list-item><label>(iii)</label><p>We conduct extensive experiments on two real-world data sets. The MAE (mean absolute error) and RMSE (root mean square error) of the inference model are 49.82 and 71.74 respectively, which outperforms the baseline methods.</p></list-item>
        </list>
      </p>
      <p>The rest of this paper is organized as follows. Section 2 introduces the data and
features, the theory of graph convolutional neural networks and the problem description.
Section 3 describes the structure of the spatial-temporal graph convolutional neural
network in detail. Section 4 presents the experimental results. Finally, Section 5 summarizes
the paper and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data and methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Data description</title>
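        <p>The data used in this paper come from an urban computing competition. The data
set contains 35 roads with traffic flow records. Of these, 27 roads are used to train the prediction
model and the other 8 roads are used to test its performance. The data consist of the following
data sets:
          <list list-type="simple">
            <list-item><label>(i)</label><p>Road network features</p></list-item>
            <list-item><label>(ii)</label><p>Point of interest (POI) features</p></list-item>
            <list-item><label>(iii)</label><p>Speed pattern features</p></list-item>
            <list-item><label>(iv)</label><p>Weather features</p></list-item>
            <list-item><label>(v)</label><p>Time features</p></list-item>
            <list-item><label>(vi)</label><p>Volume records</p></list-item>
          </list>
        </p>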
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Graph convolution</title>
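        <p>Given an undirected graph <inline-formula><tex-math>\mathcal{G} = (\mathcal{V}, \mathcal{E}, A)</tex-math></inline-formula> with <inline-formula><tex-math>N</tex-math></inline-formula> vertices <inline-formula><tex-math>i \in \mathcal{V}</tex-math></inline-formula>, where <inline-formula><tex-math>\mathcal{E}</tex-math></inline-formula> is the edge set and
<inline-formula><tex-math>A \in \mathbb{R}^{N \times N}</tex-math></inline-formula> denotes the binary adjacency matrix, Defferrard et al. built a graph convolution
defined as:
          <disp-formula><label>(1)</label><tex-math>g_\theta *_{\mathcal{G}} x \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L}) x</tex-math></disp-formula>
where <inline-formula><tex-math>x \in \mathbb{R}^{N}</tex-math></inline-formula> is the signal on the graph, <inline-formula><tex-math>*_{\mathcal{G}}</tex-math></inline-formula> is the convolution operator, <inline-formula><tex-math>g_\theta</tex-math></inline-formula> denotes the spectral
filter, <inline-formula><tex-math>\tilde{L} = \frac{2}{\lambda_{\max}} L - I_N</tex-math></inline-formula>, <inline-formula><tex-math>L = I_N - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}</tex-math></inline-formula>, <inline-formula><tex-math>D_{ii} = \sum_j A_{ij}</tex-math></inline-formula>, <inline-formula><tex-math>\lambda_{\max}</tex-math></inline-formula> denotes the largest eigenvalue
of <inline-formula><tex-math>L</tex-math></inline-formula> and <inline-formula><tex-math>\theta_k</tex-math></inline-formula> is the Chebyshev coefficient [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The Chebyshev polynomials <inline-formula><tex-math>T_k(x)</tex-math></inline-formula> are recursively
defined as <inline-formula><tex-math>T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)</tex-math></inline-formula> with <inline-formula><tex-math>T_1(x) = x</tex-math></inline-formula> and <inline-formula><tex-math>T_0(x) = 1</tex-math></inline-formula>.
        </p>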
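        <p>The recurrence for the Chebyshev polynomials can be checked numerically. The following is a minimal sketch in plain Python; the function name <monospace>chebyshev_values</monospace> and the test point 0.3 are illustrative choices, not part of the original method. It compares the recurrence with the closed form cos(k arccos x), which holds on [-1, 1].</p>
        <preformat><![CDATA[
```python
import math


def chebyshev_values(k_max, x):
    """Return [T_0(x), ..., T_{k_max}(x)] from the three-term recurrence."""
    values = [1.0, x]  # T_0(x) = 1 and T_1(x) = x
    for _ in range(2, k_max + 1):
        # T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x)
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: k_max + 1]


x = 0.3
recursive = chebyshev_values(4, x)
# Closed form valid on [-1, 1]: T_k(x) = cos(k * arccos(x)).
closed_form = [math.cos(k * math.acos(x)) for k in range(5)]
max_gap = max(abs(a - b) for a, b in zip(recursive, closed_form))
print(recursive)
print(max_gap)
```
]]></preformat>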
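        <p>
          Kipf et al. proposed a first-order approximate graph convolution operation [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], which simplified
this model by limiting <inline-formula><tex-math>K</tex-math></inline-formula> to 1 and approximating <inline-formula><tex-math>\lambda_{\max}</tex-math></inline-formula> by 2, which allows us to rewrite the
convolution in the following way:
          <disp-formula><label>(2)</label><tex-math>g_\theta *_{\mathcal{G}} x \approx \theta_0 x + \theta_1 \left( \frac{2}{\lambda_{\max}} L - I_N \right) x</tex-math></disp-formula>
          <disp-formula><label>(3)</label><tex-math>\approx \theta_0 x - \theta_1 \left( D^{-\frac{1}{2}} A D^{-\frac{1}{2}} \right) x</tex-math></disp-formula>
        </p>
        <p>Then we constrain the number of parameters: let <inline-formula><tex-math>\theta = \theta_0 = -\theta_1</tex-math></inline-formula> and further apply a
normalization trick to the convolution matrix,
which gives the following form of the matrix of the convolution operation:
          <disp-formula><label>(4)</label><tex-math>g_\theta *_{\mathcal{G}} x \approx \theta \left( I_N + D^{-\frac{1}{2}} A D^{-\frac{1}{2}} \right) x = \theta \left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} \right) x</tex-math></disp-formula>
where <inline-formula><tex-math>\tilde{A} = A + I_N</tex-math></inline-formula> and <inline-formula><tex-math>\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}</tex-math></inline-formula>.</p>
        <p>The above definition of graph convolution is extended to data with <inline-formula><tex-math>C_{in}</tex-math></inline-formula> input channels, i.e.,
<inline-formula><tex-math>X \in \mathbb{R}^{N \times C_{in}}</tex-math></inline-formula> (each vertex is a <inline-formula><tex-math>C_{in}</tex-math></inline-formula>-dimensional feature vector), and the propagation rule of this
simplified model is given by:
          <disp-formula><label>(5)</label><tex-math>H^{(l+1)} = \sigma \left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right)</tex-math></disp-formula>
where <inline-formula><tex-math>H^{(l)}</tex-math></inline-formula> is the output and <inline-formula><tex-math>W^{(l)}</tex-math></inline-formula> is the trainable weight matrix of the <inline-formula><tex-math>l</tex-math></inline-formula>th layer, <inline-formula><tex-math>H^{(0)} = X</tex-math></inline-formula> and
<inline-formula><tex-math>\sigma(\cdot)</tex-math></inline-formula> is an activation function.</p>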
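        <p>As an illustration of the renormalized propagation rule, the sketch below applies one layer to a three-vertex path graph using nested Python lists. The helper names, the toy matrices and the choice of ReLU as the activation function are assumptions made for this example only; the text above does not fix the activation.</p>
        <preformat><![CDATA[
```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def renormalized_adjacency(adj):
    """Return D~^{-1/2} (A + I_N) D~^{-1/2} for a binary adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]  # D~_ii = sum_j A~_ij
    return [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]


def gcn_layer(adj, h, w):
    """One propagation step with ReLU (assumed activation)."""
    z = matmul(matmul(renormalized_adjacency(adj), h), w)
    return [[max(0.0, value) for value in row] for row in z]


# Path graph 0 - 1 - 2 with two input channels and two output channels.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -1.0], [1.0, 0.5]]  # arbitrary weights standing in for trained ones
A_norm = renormalized_adjacency(A)
H1 = gcn_layer(A, X, W)
print(A_norm)
print(H1)
```
]]></preformat>
        <p>Adding the identity before normalizing keeps every degree positive, so the division is always defined even for isolated vertices.</p>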
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Methodology</title>
        <p>Our ultimate purpose is to monitor the spatial-temporal distribution of vehicle exhaust emissions
in an urban traffic network. However, existing measurement technology cannot easily monitor
emissions directly on a large scale. Fortunately, given the spatial-temporal distribution of traffic
conditions and the traffic network data, the distribution of emissions can be calculated by the
existing COPERT model, so our goal becomes traffic volume monitoring. Since the traffic
monitoring stations cannot cover the whole city, we can obtain traffic data for only some road
segments. Therefore, from the urban context data and the traffic speed and volume acquired by
established monitoring stations, we infer the traffic volume of any road in the city at any
time stamp. The spatial-temporal distribution of traffic volume can then be used
to estimate the distribution of vehicle exhaust emissions according to the COPERT model.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Inference of the model</title>
      <p>In this section, we describe in detail the structure of the proposed model (called STGC-LD), which
includes a spatial-temporal learning block, two attribute extraction blocks and a label distribution
learning block, as shown in Figure 1. The spatial-temporal learning block learns
the spatial correlations and temporal dependencies from traffic travel speed. The first attribute
block processes external factors (e.g. time of day and weather), while
the second attribute block extracts structural features of the traffic network. These blocks are
all connected by residual connections, which makes it easier to add and remove them. Finally,
the label distribution learning block not only estimates the spatial-temporal distribution of traffic volume
within the city but also reveals the confidence of its inference.</p>
      <sec id="sec-3-1">
        <title>3.1. Spatial-temporal learning block</title>
        <p>There is a certain correlation between traffic volume and travel speed, and nearby roads with
similar travel speed are very likely to follow the same volume patterns. Accordingly, we design a
spatial-temporal learning block, containing one spatial graph convolution (SGC) layer and one
temporal graph convolution (TGC) layer, to extract the spatial-temporal properties of travel speed,
which is a 3-dimensional structured time series.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1. SGC for extracting spatial features.</title>
        <sec id="sec-3-2-1">
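          <p>We process the adjacency matrix as:
            <disp-formula><label>(6)</label><tex-math>\hat{A} = (\tilde{A} \odot W_{embed})</tex-math></disp-formula>
where <inline-formula><tex-math>W_{embed}</tex-math></inline-formula> is the learnable matrix that can be adjusted to affect the degree of closeness, and
<inline-formula><tex-math>\odot</tex-math></inline-formula> denotes the element-wise matrix product. Then we put <inline-formula><tex-math>\hat{A}</tex-math></inline-formula> and <inline-formula><tex-math>\hat{D}_{ii} = \sum_j \hat{A}_{ij}</tex-math></inline-formula> into the graph
convolutional network, and get the adaptive graph convolutional network as:
            <disp-formula><label>(7)</label><tex-math>H^{(l+1)} = \sigma \left( \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right)</tex-math></disp-formula>
The above formula can adjust the weights of edges adaptively based on the graph structure and
the attributes of each vertex, and learn the influence of different adjacent vertices.</p>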
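          <p>The adaptive adjacency of Eqs. (6) and (7) can be sketched as follows. The weight values in <monospace>W_embed</monospace> are hypothetical stand-ins for learned parameters, and the sketch assumes they are non-negative so that every degree stays positive; neither assumption comes from the text above.</p>
          <preformat><![CDATA[
```python
import math


def adaptive_adjacency(adj, w_embed):
    """Eq. (6): element-wise product of A~ = A + I_N with a weight matrix."""
    n = len(adj)
    return [[(adj[i][j] + (1.0 if i == j else 0.0)) * w_embed[i][j]
             for j in range(n)] for i in range(n)]


def normalize(a_hat):
    """D_hat^{-1/2} A_hat D_hat^{-1/2}, with D_hat_ii = sum_j A_hat_ij."""
    degree = [sum(row) for row in a_hat]  # assumed strictly positive
    n = len(a_hat)
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Hypothetical learned weights: edge 1-2 is treated as "closer" than edge 0-1.
W_embed = [[1.0, 0.2, 0.7], [0.2, 1.0, 2.0], [0.7, 2.0, 1.0]]
A_hat = adaptive_adjacency(A, W_embed)
A_hat_norm = normalize(A_hat)
print(A_hat)
print(A_hat_norm)
```
]]></preformat>
          <p>Because the product is element-wise, a weight attached to a missing edge (such as the entry 0.7 for the pair 0-2) has no effect: the graph structure is preserved and only existing edges are rescaled.</p>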
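          <p>We denote the travel speed by <inline-formula><tex-math>Att_{speed} \in \mathbb{R}^{t \times n \times C_{speed}}</tex-math></inline-formula> and the adjacency matrix of the traffic network
by <inline-formula><tex-math>A_S \in \mathbb{R}^{n \times n}</tex-math></inline-formula>, where <inline-formula><tex-math>t</tex-math></inline-formula>, <inline-formula><tex-math>n</tex-math></inline-formula>, <inline-formula><tex-math>C_{speed}</tex-math></inline-formula> are the number of time steps, the number of road segments
in the traffic network and the dimension of the speed feature, respectively. The graph convolution
described above can only process two-dimensional data, but travel speed is a 3-dimensional
tensor. Hence, we share parameters along the time axis, that is, we apply the same convolution at
each time stamp. After a convolution operation, the output <inline-formula><tex-math>Z_S \in \mathbb{R}^{t \times n \times C_{out}}</tex-math></inline-formula> is defined as:
            <disp-formula><label>(8)</label><tex-math>Z_S^{i} = \hat{D}_S^{-\frac{1}{2}} \hat{A}_S \hat{D}_S^{-\frac{1}{2}} Att^{i}_{speed} W_S, \quad i \in \{1, 2, \ldots, t\}</tex-math></disp-formula>
where <inline-formula><tex-math>Att^{i}_{speed} \in \mathbb{R}^{n \times C_{speed}}</tex-math></inline-formula> and <inline-formula><tex-math>W_S \in \mathbb{R}^{C_{speed} \times C_{out}}</tex-math></inline-formula> is a kernel of the spatial graph convolution.</p>
          <p>3.1.2. TGC for extracting temporal features. Although models based on recurrent
neural networks are widely used in time series analysis, their application to traffic forecasting
still suffers from the complexity of gate mechanisms, time-consuming iterations and slow response
to dynamic changes. Such networks cannot model very long-range temporal dependencies
(e.g. period and trend), and training becomes harder as depth increases. In this paper,
graph convolution is employed to encode the temporal correlation directly, avoiding the explicit
smoothing regularization in the loss function. First, we construct an affinity graph for
the time series. Since the traffic volume does not change abruptly along the time axis and follows
a strong periodicity, we connect neighboring and periodic timestamps in the time series of each
road section to construct the time affinity graph. For a time stamp node <inline-formula><tex-math>T_i</tex-math></inline-formula> of a time series, the
time neighbors of the point can be expressed as
            <disp-formula><label>(9)</label><tex-math>\{T_{i-p \cdot P_{week}}, \ldots, T_{i-P_{week}}, T_{i-p \cdot P_{day}}, \ldots, T_{i-P_{day}}, \ldots, T_{i-p}, \ldots, T_{i-1}, T_i, T_{i+1}, \ldots, T_{i+p}, T_{i+P_{day}}, \ldots, T_{i+p \cdot P_{day}}, T_{i+P_{week}}, \ldots, T_{i+p \cdot P_{week}}\}</tex-math></disp-formula>
where <inline-formula><tex-math>p</tex-math></inline-formula> is a hyperparameter, and <inline-formula><tex-math>P_{day}</tex-math></inline-formula> and <inline-formula><tex-math>P_{week}</tex-math></inline-formula> represent the periods of one day and one week
respectively. Besides, we set the temporal edge weights to 1.</p>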
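          <p>One possible reading of the time neighbor set in Eq. (9) is sketched below: each time stamp is linked to itself and to the time stamps at offsets k, k P_day and k P_week for k = 1, ..., p in both directions. This reading, the 0-based indices, and the period lengths (17 hourly slots per day, hence 119 per week) are assumptions made for the example.</p>
          <preformat><![CDATA[
```python
def temporal_neighbors(i, t, p, p_day, p_week):
    """Indices of the time stamps linked to T_i under the assumed reading of Eq. (9)."""
    neighbors = {i}  # Eq. (9) lists T_i itself
    for period in (1, p_day, p_week):
        for k in range(1, p + 1):
            for j in (i - k * period, i + k * period):
                if j in range(t):  # drop offsets that fall outside the series
                    neighbors.add(j)
    return sorted(neighbors)


def temporal_adjacency(t, p, p_day, p_week):
    """Binary t x t matrix A_T with every temporal edge weight set to 1."""
    adj = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in temporal_neighbors(i, t, p, p_day, p_week):
            adj[i][j] = 1
    return adj


# Assumed layout: 17 days of 17 hourly slots, so one day is 17 steps, one week 119.
T_STEPS, P, P_DAY, P_WEEK = 17 * 17, 3, 17, 7 * 17
example = temporal_neighbors(150, T_STEPS, P, P_DAY, P_WEEK)
A_T = temporal_adjacency(T_STEPS, P, P_DAY, P_WEEK)
print(example)
```
]]></preformat>
          <p>Time stamps near the start or end of the series simply have fewer neighbors, because offsets outside the observed range are discarded.</p>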
          <p>We transpose the output of the SGC to <inline-formula><tex-math>Q = Z_S^{T} \in \mathbb{R}^{n \times t \times C_{out}}</tex-math></inline-formula>, and denote the temporal adjacency
matrix by <inline-formula><tex-math>A_T \in \mathbb{R}^{t \times t}</tex-math></inline-formula>. Then, we share parameters across space, that is, we apply the same
convolution to the time series of each vertex. After the convolution operation, the features
are mapped as follows:
            <disp-formula><label>(10)</label><tex-math>Z_T^{i} = \hat{D}_T^{-\frac{1}{2}} \hat{A}_T \hat{D}_T^{-\frac{1}{2}} Q^{i} W_T, \quad i \in \{1, 2, \ldots, n\}</tex-math></disp-formula>
where <inline-formula><tex-math>W_T \in \mathbb{R}^{C_{out} \times C_{out}}</tex-math></inline-formula> is a kernel of the temporal graph convolution.</p>
          <p>
            3.1.3. Spatial-temporal learning. In order to extract the spatial correlations and temporal
dependencies of structured sequences of data simultaneously, we design a spatial-temporal
learning block which stacks an SGC layer and a TGC layer. We keep the block shallow because
too many convolution layers could drive the features of interconnected vertices to the same values [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. Moreover, layer
normalization is added to the spatial-temporal learning block to prevent overfitting. The
output of this block is denoted as <inline-formula><tex-math>X_{ST}</tex-math></inline-formula>.
          </p>
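          <p>The parameter sharing in Eqs. (8) and (10) amounts to applying one two-dimensional graph convolution to every slice of the tensor. The following sketch, under the assumption that both adjacency matrices already contain self-loops, stacks the two steps on toy data; the sizes, weights and helper names are illustrative, and layer normalization and the activation are omitted for brevity.</p>
          <preformat><![CDATA[
```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sym_normalize(adj):
    """D^{-1/2} A D^{-1/2}; adj is assumed to already contain self-loops."""
    degree = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]


def sgc(speed, adj_s, w_s):
    """Eq. (8): the same spatial convolution on each of the t time steps."""
    norm = sym_normalize(adj_s)
    return [matmul(matmul(norm, x_i), w_s) for x_i in speed]  # t x n x c_out


def tgc(z_s, adj_t, w_t):
    """Eq. (10): transpose to n x t x c_out, then convolve each vertex's series."""
    t, n = len(z_s), len(z_s[0])
    q = [[z_s[j][v] for j in range(t)] for v in range(n)]
    norm = sym_normalize(adj_t)
    return [matmul(matmul(norm, q_v), w_t) for q_v in q]  # n x t x c_out


# Toy sizes: t = 3 time steps, n = 2 road segments, 1 speed channel, 2 outputs.
speed = [[[1.0], [1.0]] for _ in range(3)]
A_S = [[1, 1], [1, 1]]
A_T = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
W_S = [[1.0, 2.0]]
W_T = [[1.0, 0.0], [0.0, 1.0]]
Z_T = tgc(sgc(speed, A_S, W_S), A_T, W_T)
print(Z_T)
```
]]></preformat>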
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.2. Attribute block 1</title>
        <p>
          In this block, we preprocess and integrate the weather features and time features. The time
range 6:00-23:00 is divided into 17 timeslots of one hour each, namely
<inline-formula><tex-math>TimeAtt \in \{1, 2, \ldots, 17\}</tex-math></inline-formula>. Since the dimension of TimeAtt is large, one-hot coding would lead
to a high computing cost, so we adopt the embedding method to transform these categorical
features into low-dimensional vectors. Specifically, the embedding method multiplies each
one-hot encoded categorical value in <inline-formula><tex-math>\mathbb{R}^{1 \times C}</tex-math></inline-formula> by a learnable parameter matrix <inline-formula><tex-math>W \in \mathbb{R}^{C \times O}</tex-math></inline-formula>. Usually we have
<inline-formula><tex-math>O \ll C</tex-math></inline-formula>, so the embedding method can effectively reduce the dimension of input features
and make model calculation more efficient. Furthermore, a significant property of the embedding
method is that categorical values with similar semantic meaning are usually very close in
the embedding space [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The output of this block is denoted as <inline-formula><tex-math>X_{Att}</tex-math></inline-formula>.
        </p>
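        <p>The embedding step can be illustrated with a small sketch: multiplying a one-hot row vector by W selects a single row of W, so the product never has to be formed explicitly. The embedding width of 4 and the random initial weights are assumptions for this example; in the model W would be learned during training.</p>
        <preformat><![CDATA[
```python
import random


def one_hot(index, size):
    """Row vector in R^{1 x C} with a single 1 at position index."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def embed_by_matmul(index, weight):
    """Multiply the one-hot row vector by W in R^{C x O}."""
    row = one_hot(index, len(weight))
    width = len(weight[0])
    return [sum(row[c] * weight[c][o] for c in range(len(weight)))
            for o in range(width)]


def embed_by_lookup(index, weight):
    """Equivalent shortcut: select one row of W."""
    return list(weight[index])


C, O = 17, 4  # 17 timeslots; the embedding width 4 is an arbitrary choice
rng = random.Random(0)
W = [[rng.uniform(-1.0, 1.0) for _ in range(O)] for _ in range(C)]
time_att = 9  # a timeslot in {1, ..., 17}
slow = embed_by_matmul(time_att - 1, W)
fast = embed_by_lookup(time_att - 1, W)
print(slow)
print(fast)
```
]]></preformat>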
      </sec>
      <sec id="sec-3-4">
        <title>3.3. Attribute block 2</title>
        <p>
          Traffic network attributes mainly include the road network structure, road section features, POI
features, etc. We use the embedding method to process the number of lanes, road grade
and other categorical road network features, and normalize the road length, POI features and
so on. Then, the preprocessed features are concatenated and fed into the SGC to extract spatial
correlation. In our model, we connect blocks by residual connections to make them easier to add and
remove. He et al. have shown that training neural networks with residual connections is easier
and more robust [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The output of this block is denoted as <inline-formula><tex-math>X_{Net}</tex-math></inline-formula>.
        </p>
      </sec>
      <sec id="sec-3-5">
        <title>3.4. Label distribution learning block</title>
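        <p>
          We adopt label distribution learning (LDL) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] on a single model. The input of LDL is set to <inline-formula><tex-math>X \in \mathbb{R}^{t \times n \times d}</tex-math></inline-formula>,
where <inline-formula><tex-math>t</tex-math></inline-formula>, <inline-formula><tex-math>n</tex-math></inline-formula>, <inline-formula><tex-math>d</tex-math></inline-formula> are the number of time steps, the number of road segments and the number of
feature dimensions of each road respectively. The task of LDL is to estimate the traffic volume
distribution vector <inline-formula><tex-math>y(v, j) = \{y_0, y_1, \ldots, y_{q_{\max}}\} \in \mathbb{R}^{q_{\max}+1}</tex-math></inline-formula> of a road <inline-formula><tex-math>v</tex-math></inline-formula> at a timestamp <inline-formula><tex-math>j</tex-math></inline-formula>, where
<inline-formula><tex-math>q_{\max}</tex-math></inline-formula> is determined by the maximum average traffic volume per lane in the training data. In
this problem, we quantify the real volume value from an existing station as a normal distribution
vector whose expectation is the real value and whose variance is a hyperparameter. Then, the model
is learned by minimizing the symmetric Kullback-Leibler divergence between the estimated and the
observed label distributions:
          <disp-formula><label>(11)</label><tex-math>\mathrm{Loss}_{L} = \min \frac{1}{t} \sum_{j=1}^{t} \frac{1}{|\mathcal{L}|} \sum_{v \in \mathcal{L}} \sum_{i=0}^{q_{\max}} KL(v, j)[i]</tex-math></disp-formula>
          <disp-formula><label>(12)</label><tex-math>KL(v, j)[i] = y(v, j)[i] \log \frac{y(v, j)[i]}{\hat{y}(v, j)[i]} + \hat{y}(v, j)[i] \log \frac{\hat{y}(v, j)[i]}{y(v, j)[i]}</tex-math></disp-formula>
where <inline-formula><tex-math>\mathcal{L}</tex-math></inline-formula> is the set of observed roads and <inline-formula><tex-math>\hat{y}(\cdot, \cdot)</tex-math></inline-formula> is the estimated label distribution. If we need to
know the specific value of traffic volume, we can compute the expectation of the probability
distribution vector, namely:
          <disp-formula><label>(13)</label><tex-math>\sum_{i=0}^{q_{\max}} i \, \hat{y}(v, j)[i]</tex-math></disp-formula>
        </p>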
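        <p>The three ingredients of this block (a discretized normal label, the symmetric Kullback-Leibler divergence and the expectation used as point estimate) can be sketched as below. Renormalizing the discretized Gaussian so that it sums to 1 and adding a small constant <monospace>eps</monospace> inside the logarithms are assumptions of this sketch, as are the toy values of q_max and the volumes.</p>
        <preformat><![CDATA[
```python
import math


def gaussian_label(value, q_max, variance):
    """Discretize N(value, variance) on the bins 0..q_max and renormalize."""
    weights = [math.exp(-((i - value) ** 2) / (2.0 * variance))
               for i in range(q_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def symmetric_kl(y, y_hat, eps=1e-12):
    """Sum over bins of y log(y / y_hat) + y_hat log(y_hat / y)."""
    return sum(a * math.log((a + eps) / (b + eps))
               + b * math.log((b + eps) / (a + eps))
               for a, b in zip(y, y_hat))


def expected_volume(y_hat):
    """Point estimate: sum_i i * y_hat[i]."""
    return sum(i * prob for i, prob in enumerate(y_hat))


y_true = gaussian_label(8.0, 20, 2.0)   # observed volume 8, variance 2
y_close = gaussian_label(9.0, 20, 2.0)  # a near-miss prediction
y_far = gaussian_label(15.0, 20, 2.0)   # a poor prediction
print(symmetric_kl(y_true, y_close))
print(symmetric_kl(y_true, y_far))
print(expected_volume(y_true))
```
]]></preformat>
        <p>The spread of the predicted distribution gives the confidence of the inference, while its expectation recovers a single volume value.</p>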
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Inferring performance comparison</title>
        <p>To demonstrate the effectiveness of the proposed inference model and deployment model, we
compare them with several existing approaches using the real traffic data described in
Section 2.1. The parameters of all the models are fine-tuned through grid search. Each of the
following experiments is repeated 50 times to obtain the average results.</p>
        <p>4.1.1. Training data usage. The traffic network contains 793 road segments, of which 35
are equipped with loop detectors (i.e. have volume values), while the volumes of the remaining roads
are unknown. The traffic volumes were collected every hour from March 16 to April 1, 2016 (17
days in total) and sampled each day from 6:00 to 23:00. In the experiment, we randomly
divide the set of 35 road sections into two subsets of 27 and 8 roads. The former contains
<inline-formula><tex-math>27 \times 17 \times 17</tex-math></inline-formula> instances, which are used as the training set, and the latter contains <inline-formula><tex-math>8 \times 17 \times 17</tex-math></inline-formula> instances,
which are used as the testing set. All the experiments were repeated 50 times, and the training
and testing sets were randomly shuffled in each repetition.</p>
        <p>4.1.2. Model settings. For the inference model, each fully connected layer has 64 channels. The
temporal neighbor parameter is set to 3, and the variance of the normal distribution in LDL is
set to 2. Besides, we set the initial learning rate to <inline-formula><tex-math>10^{-3}</tex-math></inline-formula> with a decay rate of 0.9 after every 40
epochs.</p>
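        <p>For concreteness, the data split sizes and the learning rate schedule can be written out as follows. Reading "a decay rate of 0.9 after every 40 epochs" as a staircase schedule is an assumption of this sketch, and the function name is illustrative.</p>
        <preformat><![CDATA[
```python
def learning_rate(epoch, initial=1e-3, decay=0.9, step=40):
    """Staircase decay: multiply the rate by `decay` once every `step` epochs."""
    return initial * decay ** (epoch // step)


# roads x days x hourly timeslots per day
train_instances = 27 * 17 * 17
test_instances = 8 * 17 * 17

print(train_instances, test_instances)
for epoch in (0, 39, 40, 80, 120):
    print(epoch, learning_rate(epoch))
```
]]></preformat>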
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Competitors</title>
        <list list-type="simple">
          <list-item><label>(i)</label><p>Gradient Boosting Decision Tree (GBDT). For this baseline, we neglect the spatial and
temporal correlation of the data and simply treat all historical observed data from all stations as
the training data to build a supervised learning model.</p></list-item>
          <list-item><label>(ii)</label><p>Support Vector Regression (SVR). SVR is an important application branch of the Support
Vector Machine (SVM), and it is used here for the regression of traffic volume. The experimental
setup of SVR is consistent with that of GBDT.</p></list-item>
          <list-item><label>(iii)</label><p>Spatial-Temporal Semi-Supervised Learning (ST-SSL) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. This method constructs the
spatial-temporal affinity graph and determines the spatial and temporal edge weights
respectively. Finally, the change rate of the spatial neighbor and the value of the temporal
neighbor are smoothed.</p></list-item>
          <list-item><label>(iv)</label><p>Graph Convolutional Recurrent Neural Network (GC-GRU). Following [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
we first use one graph convolution layer for feature extraction and feed the new features into
a GRU for temporal correlation analysis.</p></list-item>
          <list-item><label>(v)</label><p>STGC-Regression (STGC-R). In order to verify the effectiveness of LDL, we set the output of
the proposed network structure to a single node, while the other structures remain unchanged.
The corresponding loss function is changed to the loss function of the regression task,
namely
            <disp-formula><tex-math>\mathrm{Loss} = \min \frac{1}{t} \sum_{j=1}^{t} \frac{1}{|\mathcal{L}|} \sum_{v \in \mathcal{L}} \left( y(v, j) - \hat{y}(v, j) \right)^{2}</tex-math></disp-formula>
          </p></list-item>
        </list>
        <p>Figure 2 panels: (a) mean absolute errors; (b) root mean square errors.</p>
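        <p>The two error measures reported in the comparison can be computed as in the sketch below. The volume values are made-up numbers used only to exercise the functions; they are not results from the experiments.</p>
        <preformat><![CDATA[
```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error (square root of the mean squared error)."""
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up hourly volumes for four (road, time stamp) pairs.
observed = [100.0, 150.0, 200.0, 250.0]
inferred = [110.0, 140.0, 230.0, 250.0]
print(mae(observed, inferred))
print(rmse(observed, inferred))
```
]]></preformat>
        <p>RMSE penalizes the single large error (30) more heavily than MAE does, which is why the two measures are usually reported together.</p>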
      </sec>
      <sec id="sec-4-3">
        <title>4.3. The obtained errors for di erent inference models</title>
        <p>The experimental results show that the proposed algorithm performs better than
the other algorithms, as shown in Figure 2. The supervised learning algorithms GBDT and
SVR perform worse than the other four semi-supervised learning algorithms, since the scarcity
of training samples makes it difficult to train a supervised model with good generalization
performance. In addition, we use the same network structure with a regression output, and find
that its performance is not as good as that of LDL, which indicates that LDL can better overcome
the poor prediction performance of the regression method caused by insufficient labeled
samples.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Evaluating inference models with various time spans</title>
        <p>We experimented with data of various time spans, ranging from 1 day to 17 days. The estimation
results of all methods are shown in Table 1 and Table 2, which show that the prediction accuracy
of the proposed algorithm is always better than that of the other algorithms. As the time span of
the input data increases, the inference performance of each algorithm decreases gradually,
but the performance of the proposed algorithm is more stable. This is because it uses graph
convolution to extract time features, which is well suited to the periodicity of long time series
data and improves the inference performance of the model for long structured sequence data.</p>
        <p>4.5. The influence of different experimental settings</p>
        <list list-type="simple">
          <list-item><label>(i)</label><p>We use one-hot coding and embedding respectively to process categorical features (TimeAtt, POI
features and so on). We have found that the embedding method is better
than the one-hot method.</p></list-item>
          <list-item><label>(ii)</label><p>Effect of layer normalization: after introducing layer normalization, the performance
improves.</p></list-item>
        </list>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we propose a spatio-temporal semi-supervised graph convolutional network model.
The model can predict the temporal and spatial distribution of traffic flow on road sections
without monitors by using urban environmental data and observation data from existing sites.
We have carried out experiments on real traffic data, and the results of the proposed model are
better than those of the other comparison methods, indicating that our method is more suitable
for the inference of traffic volume. In future work we will further optimize the network
structure and parameters to get better results. In addition, the proposed model can also be
used in some practical applications, such as urban population monitoring.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported in part by the Leading Talents of Science and Technology Innovation
in Zhejiang Province 10 Thousands Plan under Grant 2018R52040, in part by the National
Key Research and Development Program of China under Grant 2016YFC0201400, in part by
the Provincial Key Research and Development Program of Zhejiang Province under Grant
2017C03019, and in part by the International Science and Technology Cooperation Program
of Zhejiang Province for Joint Research in High-tech Industry under Grant 2016C54007.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Shang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tong</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            <given-names>E</given-names>
          </string-name>
          and
          <string-name>
            <surname>Yu</surname>
            <given-names>Y 2014</given-names>
          </string-name>
          <article-title>Inferring gas consumption and pollution emission of vehicles throughout a city</article-title>
          <source>In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM 1027-1036</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Hsieh</surname>
            <given-names>H P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            <given-names>S D</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zheng</surname>
            <given-names>Y 2015</given-names>
          </string-name>
          <article-title>Inferring air quality for station location recommendation based on urban big data</article-title>
          <source>In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM 437-446</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Defferrard</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bresson</surname>
            <given-names>X</given-names>
          </string-name>
          and
          <string-name>
            <surname>Vandergheynst</surname>
            <given-names>P 2016</given-names>
          </string-name>
          <article-title>Convolutional neural networks on graphs with fast localized spectral filtering</article-title>
          <source>In Advances in neural information processing systems 3844-3852</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Kipf</surname>
            <given-names>T N</given-names>
          </string-name>
          and
          <string-name>
            <surname>Welling</surname>
            <given-names>M 2016</given-names>
          </string-name>
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          .
          <source>arXiv preprint arXiv:1609.02907</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Li</surname>
            <given-names>Q</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            <given-names>Z</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wu</surname>
            <given-names>X M 2018</given-names>
          </string-name>
          <article-title>Deeper insights into graph convolutional networks for semi-supervised learning</article-title>
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Gal</surname>
            <given-names>Y</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ghahramani</surname>
            <given-names>Z 2016</given-names>
          </string-name>
          <article-title>A theoretically grounded application of dropout in recurrent neural networks</article-title>
          <source>In Advances in neural information processing systems 1019-1027</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>He</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>X</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            <given-names>S</given-names>
          </string-name>
          and
          <string-name>
            <surname>Sun</surname>
            <given-names>J 2016</given-names>
          </string-name>
          <article-title>Deep residual learning for image recognition</article-title>
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition 770-778</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Gao</surname>
            <given-names>B B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xing</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            <given-names>C W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            <given-names>J</given-names>
          </string-name>
          and
          <string-name>
            <surname>Geng</surname>
            <given-names>X 2017</given-names>
          </string-name>
          <article-title>Deep label distribution learning with label ambiguity</article-title>
          <source>IEEE Transactions on Image Processing</source>
          <volume>26</volume>
          (
          <issue>6</issue>
          )
          <fpage>2825</fpage>
          -
          <lpage>2838</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Meng</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yi</surname>
            <given-names>X</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            <given-names>J</given-names>
          </string-name>
          and
          <string-name>
            <surname>Zheng</surname>
            <given-names>Y 2017</given-names>
          </string-name>
          <article-title>City-wide traffic volume inference with loop detector data and taxi trajectories</article-title>
          <source>In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM 1</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Cui</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henrickson</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ke</surname>
            <given-names>R</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wang</surname>
            <given-names>Y 2018</given-names>
          </string-name>
          <article-title>Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting</article-title>
          <source>arXiv preprint arXiv:1802.07007</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>