
flow prediction for vehicle emission calculation based on graph convolutional networks

Peng Jiang

jiangpenghz@163.com 0

Igor Bychkov

Jun Liu

2 3

Tianjiao Li

Alexei Hmelnov

1

0 Department of Science and Technology Cooperation, Westlake University, No.18, Shilong Mountain Street, Xihu District, Hangzhou, China
1 Matrosov Institute for System Dynamics and Control Theory of Siberian Branch of Russian Academy of Sciences, 134 Lermontov st., Irkutsk, Russia
2 School of Automation (Artificial Intelligence), Hangzhou Dianzi University, No.1158, Number Two Street, Jianggan District, Hangzhou, China
3 "the Belt and Road" Institute for Information Technology, Hangzhou Dianzi University, No.115, Wenyi Road, Xihu District, Hangzhou, China

Monitoring the distribution of vehicle exhaust emissions within a city is a challenging problem, since it is affected by many complex factors such as spatial-temporal correlation and other environmental conditions. In addition, the technology for monitoring vehicle exhaust emissions directly with sensors is still at an early stage, and direct monitoring is hard to implement over a large area. We therefore use existing environmental theory to estimate the distribution of vehicle exhaust emissions in cities from traffic volume. The problem addressed in this paper is how to use the data from sparse monitoring stations, together with the inherent traffic network, to infer the spatial-temporal distribution of traffic volume. To solve it, we propose a graph convolutional network model that extracts the characteristics of traffic data and other features. Experiments on real traffic data sets show that the proposed method performs better than the existing methods.

1. Introduction

With the rapid growth of vehicle ownership in China, large amounts of NOx, CO, HC, PMx and other harmful pollutants emitted by vehicles have aggravated urban air pollution, resulting in deteriorating air quality and increasingly frequent haze. Effective monitoring is a precondition of vehicle exhaust pollution control, so we need some means of quantifying vehicle exhaust emissions. However, it is difficult to measure vehicle emissions directly over a large area, so we calculate them with the COPERT model [1], which only requires the urban context data and the traffic status of each road section. Urban context data can be obtained through statistics, while traffic information must be obtained through real-time monitoring by stations, which cannot be deployed on all road segments.

To determine the optimal locations of new monitoring stations, one needs to maximize the inference performance of the traffic volume distribution model on the resulting monitoring network. This is a reasonable and practical idea: because the layout of monitoring stations is very sparse, it is important to accurately infer the traffic volume distribution on the unobserved road segments from the data monitored by the existing stations. However, without monitoring data on the unobserved road segments, it is difficult to know on which road segments new stations should be placed to maximize the inference accuracy. To achieve this approximately, Hsieh et al. propose a two-stage framework for the deployment of air quality monitoring stations, which uses an inference model to estimate the distribution of the air quality index (AQI) and then obtains the locations of K new stations through a location selection model that minimizes the assessment uncertainty [2]. However, this approach cannot be applied directly to our problem, since dividing the traffic network into grids overlooks spatial correlation.

To this end, we use a graph convolutional neural network. It yields a trained model with higher prediction accuracy and, at the same time, smaller uncertainty.

The contributions of this paper are summarized as follows: (i) The proposed approach is able not only to forecast the spatial-temporal distribution of traffic volume but also to provide a basis for selecting the locations of new stations and maximizing the reliability of traffic inference. (ii) We rely entirely on graph convolution to learn the spatial-temporal correlation of structured time series. (iii) We conduct extensive experiments on two real-world data sets. The MAE (mean absolute error) and RMSE (root mean square error) of the inference model are 49.82 and 71.74 respectively, which outperforms the baseline methods.

The rest of this paper is organized as follows. Section 2 introduces the data and features, the theory of graph convolutional neural networks and the problem description. Section 3 describes the structure of the spatial-temporal graph convolutional neural network in detail. Section 4 presents the experimental results. Finally, Section 5 summarizes the paper and outlines future work.

2. Data and methodology

2.1. Data description

The data used in this paper come from an urban computing competition. The data set contains 35 roads with traffic flow records, of which 27 roads are used to train the prediction model and the other 8 roads are used to test its performance. The data consist of the following data sets:
(i) Road network features
(ii) Point of interest (POI) features
(iii) Speed pattern features
(iv) Weather features
(v) Time features
(vi) Volume records

2.2. Graph convolution

Given an undirected graph $G = (V, E, A)$ with $N$ vertices $v_i \in V$, where $E$ is the edge set and $A \in \mathbb{R}^{N \times N}$ denotes the binary adjacency matrix, Defferrard et al. built a graph convolution defined as:

$$g_\theta *_G x = \sum_{k=0}^{K} \theta_k T_k(\tilde{L})\, x \qquad (1)$$

where $x \in \mathbb{R}^{N}$ is the signal on the graph, $*_G$ is the convolution operator, $g_\theta$ denotes the spectral filter, $\tilde{L} = \frac{2}{\lambda_{max}} L - I_N$, $L = I_N - D^{-1/2} A D^{-1/2}$, $D_{ii} = \sum_j A_{ij}$, $\lambda_{max}$ denotes the largest eigenvalue of $L$ and $\theta_k$ is the Chebyshev coefficient [3]. The Chebyshev polynomials $T_k(x)$ are recursively defined as $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$ with $T_1(x) = x$ and $T_0(x) = 1$.
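As an illustration of Eq. (1), the following NumPy sketch evaluates the Chebyshev filter on a small graph using the recursion for $T_k$. The function names `scaled_laplacian` and `chebyshev_graph_filter`, the coefficient list `theta` and the 3-vertex path graph are illustrative assumptions rather than part of the cited method; the sum runs over $k = 0, \ldots, K$ with $K$ equal to `len(theta) - 1`.

```python
import numpy as np


def scaled_laplacian(A):
    """Return L_tilde = (2 / lambda_max) * L - I for a symmetric adjacency matrix A."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    # D^{-1/2}; isolated vertices (degree 0) keep a zero scaling factor.
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    return (2.0 / lam_max) * L - np.eye(n)


def chebyshev_graph_filter(A, x, theta):
    """Compute sum_k theta[k] * T_k(L_tilde) @ x with the Chebyshev recursion."""
    L_tilde = scaled_laplacian(A)
    t_prev = x                              # T_0(L_tilde) x = x
    out = theta[0] * t_prev
    if len(theta) > 1:
        t_curr = L_tilde @ x                # T_1(L_tilde) x = L_tilde x
        out = out + theta[1] * t_curr
        for k in range(2, len(theta)):
            # T_k = 2 * L_tilde * T_{k-1} - T_{k-2}, applied to the signal
            t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
            out = out + theta[k] * t_curr
    return out


# Hypothetical example: a path graph with three vertices.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])
y = chebyshev_graph_filter(A, x, theta=[0.5, 0.3, 0.2])
```

Applying the recursion to the signal rather than to the matrix keeps the cost at $K$ matrix-vector products, which is the usual reason for preferring this parameterization over an explicit eigendecomposition.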

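Kipf et al. proposed a first-order approximation of this graph convolution [4], which simplifies the model by limiting $K$ to 1 and approximating $\lambda_{max}$ by 2. The convolution can then be rewritten as:

$$g_\theta *_G x \approx \theta_0 x + \theta_1 \left(\frac{2}{\lambda_{max}} L - I_N\right) x \approx \theta_0 x - \theta_1 \left(D^{-1/2} A D^{-1/2}\right) x \qquad (2)$$

Then we constrain the number of parameters by letting

$$\theta = \theta_0 = -\theta_1 \qquad (3)$$

and further apply a renormalization trick to the convolution matrix, which replaces $I_N + D^{-1/2} A D^{-1/2}$ by $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ and gives the following form of the convolution operation:

$$g_\theta *_G x \approx \theta \left(I_N + D^{-1/2} A D^{-1/2}\right) x \;\rightarrow\; \theta \left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}\right) x \qquad (4)$$

where $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.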

The above definition of graph convolution is extended to data with $C_{in}$ input channels, i.e., $X \in \mathbb{R}^{N \times C_{in}}$ (each vertex carries a $C_{in}$-dimensional feature vector), and the propagation rule of this simplified model is given by:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) \qquad (5)$$

where $H^{(l)}$ is the output and $W^{(l)}$ is the trainable weight matrix of the $l$th layer, $H^{(0)} = X$ and $\sigma(\cdot)$ is an activation function.
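The propagation rule of Eq. (5) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the helper names, the ReLU activation, the toy path graph and the constant weight matrix `W0` are all assumptions made for the example.

```python
import numpy as np


def renormalized_adjacency(A):
    """Return D_tilde^{-1/2} (A + I) D_tilde^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_tilde = A_tilde.sum(axis=1)            # at least 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(d_tilde)
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def gcn_layer(A, H, W):
    """One propagation step of Eq. (5); ReLU is an assumed choice for sigma."""
    return np.maximum(renormalized_adjacency(A) @ H @ W, 0.0)


# Hypothetical example: path graph with 3 vertices, C_in = 3, C_out = 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
X = np.eye(3)                                # H^(0) = X, one-hot vertex features
W0 = np.full((3, 2), 0.5)                    # illustrative weights, not trained
H1 = gcn_layer(A, X, W0)                     # shape (3, 2)
```

Each row of `H1` mixes the features of a vertex with those of its neighbors, weighted by the symmetric normalization, which is what lets stacked layers propagate information along the road graph.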

2.3. Methodology

Our ultimate purpose is to monitor the spatial-temporal distribution of vehicle exhaust emissions in an urban traffic network. However, existing measurement technology makes it difficult to monitor emissions directly on a large scale. Fortunately, given the spatial-temporal distribution of traffic conditions and the traffic network data, the distribution of emissions can be calculated by the existing COPERT model, so our goal turns to volume monitoring. Since traffic monitoring stations cannot cover the whole city, traffic data are available only for some road segments. Therefore, from the urban context data and the traffic speed and volume acquired by the established monitoring stations, we infer the traffic volume of any road in the city at any timestamp. The resulting spatial-temporal distribution of traffic volume can then be used to estimate the distribution of vehicle exhaust emissions according to the COPERT model.

3. Inference of the model

In this section, we describe in detail the structure of the proposed model (called STGC-LD), which includes a spatial-temporal learning block, two attribute extraction blocks and a label distribution learning block, as shown in Figure 1. The spatial-temporal learning block learns the spatial correlations and temporal dependencies from traffic travel speed. The first attribute block processes external factors (e.g. time of day and weather), while the second attribute block extracts structural features of the traffic network. These blocks are all connected by residual connections, which makes it easier to add and remove them. Finally, the label distribution learning block not only estimates the spatial-temporal distribution of traffic volume within the city but also reveals the confidence of its inference.

3.1. Spatial-temporal learning block

Traffic volume and travel speed are correlated, and nearby roads with similar travel speeds are very likely to follow the same volume patterns. Accordingly, we design a spatial-temporal learning block, containing one spatial graph convolution (SGC) layer and one temporal graph convolution (TGC) layer, to extract the spatial-temporal properties of travel speed, which is a 3-dimensional structured time series.

3.1.1. SGC for extracting spatial features. We process the adjacency matrix as:

$$\hat{A} = \tilde{A} \odot W_{embed} \qquad (6)$$

where $W_{embed}$ is a learnable matrix that can be adjusted to affect the degree of closeness, and $\odot$ denotes the element-wise matrix product. Then we put $\hat{A}$ and $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$ into the graph convolutional network, and obtain the adaptive graph convolutional network:

$$H^{(l+1)} = \sigma\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)}\right) \qquad (7)$$

The above formula can adjust the weights of edges adaptively based on the graph structure and the attributes of each vertex, and learn the influence of different adjacent vertices.
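To make the adaptive re-weighting of Eqs. (6) and (7) concrete, the sketch below builds the normalized adaptive adjacency matrix for a toy graph. The function name `adaptive_adjacency` and the values in `W_embed` are hypothetical, and the sketch assumes that `W_embed` keeps every row sum of the re-weighted matrix positive so that the inverse square root is defined.

```python
import numpy as np


def adaptive_adjacency(A, W_embed):
    """Return D_hat^{-1/2} A_hat D_hat^{-1/2} with A_hat = (A + I) * W_embed."""
    A_tilde = A + np.eye(A.shape[0])
    A_hat = A_tilde * W_embed                 # Eq. (6): element-wise product
    d_hat = A_hat.sum(axis=1)                 # D_hat_ii = sum_j A_hat_ij
    d_inv_sqrt = 1.0 / np.sqrt(d_hat)         # assumes positive row sums
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]


# Hypothetical example: path graph with 3 vertices and illustrative edge weights.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
W_embed = np.array([[1.0, 2.0, 5.0],
                    [2.0, 1.0, 0.5],
                    [5.0, 0.5, 1.0]])
S_hat = adaptive_adjacency(A, W_embed)
```

Because the product is element-wise, entries of `W_embed` at positions without an edge (here the pair of end vertices) have no effect: the learnable matrix only rescales existing edges and self-loops.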

We denote the travel speed by $Att_{speed} \in \mathbb{R}^{t \times n \times C_{speed}}$ and the adjacency matrix of the traffic network by $A_S \in \mathbb{R}^{n \times n}$, where $t$, $n$ and $C_{speed}$ are the number of time steps, the number of road segments in the traffic network and the dimension of the speed feature, respectively. The graph convolution described above can only process two-dimensional data, but travel speed is a 3-dimensional tensor. Hence, we share parameters along the time axis, that is, we apply the same convolution at each timestamp. After a convolution operation, the output $Z_S \in \mathbb{R}^{t \times n \times C_{out}}$ is defined as:

$$Z_S^i = \hat{D}_S^{-1/2} \hat{A}_S \hat{D}_S^{-1/2} Att_{speed}^i W_S, \quad i \in \{1, 2, \ldots, t\} \qquad (8)$$

where $Att_{speed}^i \in \mathbb{R}^{n \times C_{speed}}$ and $W_S \in \mathbb{R}^{C_{speed} \times C_{out}}$ is the kernel of the spatial graph convolution.

3.1.2. TGC for extracting temporal features. Although models based on recurrent neural networks are widely used in time series analysis, their application to traffic forecasting still suffers from complex gate mechanisms, time-consuming iterations and slow response to dynamic changes. Such networks cannot model very long-range temporal dependencies (e.g. period and trend), and training becomes harder as depth increases. In this paper, graph convolution is employed to encode the temporal correlation directly, avoiding explicit smoothing regularization in the loss function. First, we need to construct an affinity graph for the time series. Since traffic volume does not change abruptly along the time axis and follows a strong periodicity, we connect neighboring and periodic timestamps on the time series of each road section to construct the time affinity graph. For a timestamp node $T_i$ of a time series, the time neighbors of the node can be expressed as

$$\{T_{i-p \cdot P_{week}}, \ldots, T_{i-P_{week}}, T_{i-p \cdot P_{day}}, \ldots, T_{i-P_{day}}, \ldots, T_{i-p}, \ldots, T_{i-1}, T_i, T_{i+1}, \ldots, T_{i+p}, T_{i+P_{day}}, \ldots, T_{i+p \cdot P_{day}}, T_{i+P_{week}}, \ldots, T_{i+p \cdot P_{week}}\} \qquad (9)$$

where $p$ is a hyperparameter, and $P_{day}$ and $P_{week}$ represent the periods of one day and one week respectively. In addition, we set the temporal edge weights to 1.
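The neighbor set of Eq. (9) can be turned into a temporal adjacency matrix as sketched below. The function names are hypothetical, indices are 0-based, and neighbors that fall outside the sequence are simply dropped, which is an assumption since the text does not say how the ends of the series are handled. The example uses $P_{day} = 17$ and $P_{week} = 7 \times 17 = 119$ steps, which is also an assumption derived from the 17 hourly timeslots per day used in the experiments.

```python
def temporal_neighbors(i, t, p, period_day, period_week):
    """Indices of the time neighbors of timestamp i (0-based) that lie in [0, t)."""
    offsets = {0}                              # Eq. (9) lists T_i itself
    for step in (1, period_day, period_week):  # adjacent, daily and weekly links
        for m in range(1, p + 1):
            offsets.add(m * step)
            offsets.add(-m * step)
    return sorted(i + o for o in offsets if 0 <= i + o < t)


def temporal_adjacency(t, p, period_day, period_week):
    """Binary t-by-t adjacency matrix (nested lists) with all edge weights set to 1."""
    A = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in temporal_neighbors(i, t, p, period_day, period_week):
            A[i][j] = 1
    return A


# Hypothetical setting: 17 days x 17 hourly slots = 289 timestamps.
T_STEPS, P_DAY, P_WEEK = 17 * 17, 17, 7 * 17
neighbors_mid = temporal_neighbors(150, T_STEPS, 1, P_DAY, P_WEEK)
A_T = temporal_adjacency(T_STEPS, 3, P_DAY, P_WEEK)
```

With `p = 1`, timestamp 150 is linked to itself, to the previous and next hour, to the same hour one day earlier and later, and to the same hour one week earlier and later.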

We transpose the output of the SGC to $Q = Z_S^T \in \mathbb{R}^{n \times t \times C_{out}}$, and set the temporal adjacency matrix as $A_T \in \mathbb{R}^{t \times t}$. Then we share parameters across space, that is, we apply the same convolution to the time series of each vertex. After the convolution operation, the features are mapped as follows:

$$Z_T^i = \hat{D}_T^{-1/2} \hat{A}_T \hat{D}_T^{-1/2} Q^i W_T, \quad i \in \{1, 2, \ldots, n\} \qquad (10)$$

where $W_T \in \mathbb{R}^{C_{out} \times C_{out}}$ is the kernel of the temporal graph convolution.

3.1.3. Spatial-temporal learning. In order to extract the spatial correlations and temporal dependencies of structured data sequences simultaneously, we design a spatial-temporal learning block which stacks an SGC layer and a TGC layer. Only one layer of each is stacked, since too many convolution layers could make the features of interconnected vertices converge to the same values [5]. Moreover, the spatial-temporal learning block is equipped with layer normalization to prevent overfitting. The output of this block is denoted as $X_{ST}$.
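The parameter sharing in Eqs. (8) and (10) amounts to two batched matrix products, which can be sketched with `numpy.einsum`. The function names, the random inputs and the tensor sizes are illustrative assumptions; `S` and `T` stand for the already normalized spatial and temporal adjacency matrices, and the activation and layer normalization of the full block are deliberately left out.

```python
import numpy as np


def spatial_graph_conv(S, X, W_s):
    """Eq. (8): S is (n, n), X is (t, n, c_in), W_s is (c_in, c_out) -> (t, n, c_out)."""
    return np.einsum("uv,tvc,cd->tud", S, X, W_s)


def temporal_graph_conv(T, Q, W_t):
    """Eq. (10): T is (t, t), Q is (n, t, c), W_t is (c, c) -> (n, t, c)."""
    return np.einsum("ab,nbc,cd->nad", T, Q, W_t)


def spatial_temporal_block(S, T, X, W_s, W_t):
    """Stack one SGC layer and one TGC layer (activation and normalization omitted)."""
    Z_s = spatial_graph_conv(S, X, W_s)        # same kernel at every timestamp
    Q = np.transpose(Z_s, (1, 0, 2))           # (n, t, c_out): one series per vertex
    return temporal_graph_conv(T, Q, W_t)      # same kernel at every vertex


# Hypothetical sizes and random stand-ins for the normalized adjacency matrices.
rng = np.random.default_rng(0)
t, n, c_in, c_out = 5, 4, 3, 2
S = rng.random((n, n))
T = rng.random((t, t))
X = rng.random((t, n, c_in))
W_s = rng.random((c_in, c_out))
W_t = rng.random((c_out, c_out))
Z = spatial_temporal_block(S, T, X, W_s, W_t)  # shape (n, t, c_out)
```

Each slice `Z_s[i]` equals `S @ X[i] @ W_s`, so the einsum form is only a compact way of applying the same two-dimensional graph convolution to every timestamp, and then to every vertex.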

3.2. Attribute block 1

In this block, we preprocess and integrate the weather features and time features. The time range 6:00-23:00 is divided into 17 timeslots, each corresponding to one hour, namely $TimeAtt \in \{1, 2, \ldots, 17\}$. Since the dimension of TimeAtt is large, one-hot coding would lead to a high computing cost, so we adopt the embedding method to transform these categorical features into low-dimensional vectors. Specifically, the embedding method multiplies each categorical value in $\mathbb{R}^{1 \times C}$ by a learnable parameter matrix $W \in \mathbb{R}^{C \times O}$. Usually $O \ll C$, so the embedding method can effectively reduce the dimension of the input features and make the model computation more efficient. Furthermore, a significant property of the embedding method is that categorical values with similar semantic meaning are usually very close in the embedding space [6]. The output of this block is denoted as $X_{Att}$.
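The embedding step described above is equivalent to multiplying a one-hot row vector by the matrix $W$, which in practice reduces to a row lookup. The sketch below checks this equivalence for the 17 timeslots; the embedding size $O = 4$, the random initialization of `W` and the helper names are assumptions made for illustration only.

```python
import numpy as np

NUM_SLOTS = 17          # C: number of hourly timeslots between 6:00 and 23:00
EMBED_DIM = 4           # O: assumed embedding size, not given in the text

rng = np.random.default_rng(0)
W = rng.normal(size=(NUM_SLOTS, EMBED_DIM))   # learnable in a real model


def one_hot(time_att, num_categories):
    """One-hot rows in R^{1 x C} for 1-based categorical values."""
    idx = np.asarray(time_att) - 1
    return np.eye(num_categories)[idx]


def embed(time_att, W):
    """Embedding lookup: selects the rows of W for 1-based categorical values."""
    idx = np.asarray(time_att) - 1
    return W[idx]


slots = [1, 9, 17]
via_matmul = one_hot(slots, NUM_SLOTS) @ W    # explicit one-hot multiplication
via_lookup = embed(slots, W)                  # same result without the product
```

The lookup form avoids materializing the $C$-dimensional one-hot vectors, which is where the saving in computation comes from.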

3.3. Attribute block 2

Traffic network attributes mainly include the road network structure, road section features, POI features, etc. We use the embedding method to process the number of lanes, road grade and other categorical road network features, and normalize the road length, POI features and so on. Then, the preprocessed features are concatenated and fed into the SGC to extract spatial correlation. In our model, we connect blocks by residual connections to make them easier to add and remove. He et al. have shown that training neural networks with residual connections is easier and more robust [7]. The output of this block is denoted as $X_{Net}$.

3.4. Label distribution learning block

We adopt label distribution learning (LDL) [8] on a single model. The input of LDL is set to $X \in \mathbb{R}^{t \times n \times d}$, where $t$, $n$ and $d$ are the number of time steps, the number of road segments and the number of feature dimensions of each road, respectively. The task of LDL is to estimate the traffic volume distribution vector $y(v, j) = \{y_0, y_1, \ldots, y_{q_{max}}\} \in \mathbb{R}^{q_{max}+1}$ of a road $v$ at a timestamp $j$, where $q_{max}$ is determined by the maximum average traffic volume per lane in the training data. In this problem, we represent the real volume value from an existing station as a normal distribution vector, whose expectation is the real value and whose variance is a hyperparameter. Then, the model is learned by minimizing the symmetric Kullback-Leibler divergence between the estimated and the observed label distributions:

$$Loss_L = \min \frac{1}{t} \sum_{j=1}^{t} \frac{1}{|\mathcal{L}|} \sum_{v \in \mathcal{L}} \sum_{i=0}^{q_{max}} KL(v, j)[i] \qquad (11)$$

$$KL(v, j)[i] = y(v, j)[i] \log \frac{y(v, j)[i]}{\hat{y}(v, j)[i]} + \hat{y}(v, j)[i] \log \frac{\hat{y}(v, j)[i]}{y(v, j)[i]} \qquad (12)$$

where $\mathcal{L}$ is the set of observed roads and $\hat{y}(\cdot, \cdot)$ is the estimated label distribution. If we need the specific value of traffic volume, we can compute the expectation of the probability distribution vector, namely:

$$\sum_{i=0}^{q_{max}} i \, \hat{y}(v, \cdot)[i] \qquad (13)$$
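The three ingredients of the label distribution learning block (the normal label vector, the symmetric Kullback-Leibler term and the expectation used to read off a volume) can be sketched for a single road and timestamp as follows. The function names are hypothetical, and two details are assumptions not stated in the text: the normal density is evaluated on the integer bins $0, \ldots, q_{max}$ and renormalized to sum to 1, and probabilities are clipped at a small `eps` to keep the logarithms finite.

```python
import numpy as np


def normal_label_distribution(value, q_max, variance):
    """Discretized normal distribution over the bins 0..q_max, centered on `value`."""
    bins = np.arange(q_max + 1)
    weights = np.exp(-((bins - value) ** 2) / (2.0 * variance))
    return weights / weights.sum()            # assumed renormalization


def symmetric_kl(y, y_hat, eps=1e-12):
    """Sum over bins of y*log(y/y_hat) + y_hat*log(y_hat/y), as in Eq. (12)."""
    y = np.clip(y, eps, None)                 # assumed clipping, avoids log(0)
    y_hat = np.clip(y_hat, eps, None)
    return float(np.sum(y * np.log(y / y_hat) + y_hat * np.log(y_hat / y)))


def expected_volume(y_hat):
    """Expectation of the distribution vector, as in Eq. (13)."""
    return float(np.dot(np.arange(len(y_hat)), y_hat))


# Hypothetical example: observed volume 30, estimate centered on 35, q_max = 100.
y_obs = normal_label_distribution(30.0, q_max=100, variance=2.0)
y_est = normal_label_distribution(35.0, q_max=100, variance=2.0)
loss_term = symmetric_kl(y_obs, y_est)
volume_est = expected_volume(y_est)
```

The full loss of Eq. (11) averages this term over the observed roads and the timestamps; the width of the estimated distribution is what the model can use to express the confidence of its inference.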

4. Experiments

4.1. Inferring performance comparison

To demonstrate the effectiveness of the proposed inference model and deployment model, we compare them with several existing approaches using the real traffic data described in Section 2.1. The parameters of all the models are fine-tuned through grid search. In the following experiments, we repeat each of them 50 times to obtain the average results.

4.1.1. Training data usage. The traffic network contains 793 road segments, of which 35 are equipped with loop detectors (i.e. have volume values), while the volumes of the remaining roads are unknown. The traffic volumes were collected every hour from March 16 to April 1, 2016 (17 days in total) and sampled each day from 6:00 to 23:00. In the experiment, we randomly divide the set of 35 road sections into two subsets of 27 and 8 roads; the former contains $27 \times 17 \times 17$ instances, which are used as the training set, and the latter contains $8 \times 17 \times 17$ instances, which are used as the testing set. All the experiments were repeated 50 times, and the training and testing sets were randomly shuffled in each repetition.

4.1.2. Model settings. For the inference model, each fully connected layer has 64 channels. The temporal neighbor parameter is set to 3, and the variance of the normal distribution in LDL is set to 2. In addition, we set the initial learning rate to $10^{-3}$ with a decay rate of 0.9 after every 40 epochs.
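The data split and the learning rate schedule stated above can be written out as a short sketch. The staircase interpretation of the decay (the rate is multiplied by 0.9 once every 40 epochs) is an assumption, since the text does not say whether the decay is applied in steps or continuously; the function name `learning_rate` is likewise hypothetical.

```python
ROADS_TRAIN, ROADS_TEST = 27, 8
DAYS, SLOTS_PER_DAY = 17, 17

# Number of (road, day, timeslot) instances in each subset.
train_instances = ROADS_TRAIN * DAYS * SLOTS_PER_DAY
test_instances = ROADS_TEST * DAYS * SLOTS_PER_DAY


def learning_rate(epoch, initial_lr=1e-3, decay_rate=0.9, decay_every=40):
    """Staircase decay: multiply the rate by `decay_rate` once every `decay_every` epochs."""
    return initial_lr * decay_rate ** (epoch // decay_every)


schedule = [learning_rate(e) for e in (0, 39, 40, 80)]
```

Under this reading the learning rate stays at $10^{-3}$ for the first 40 epochs, drops to $9 \times 10^{-4}$ for the next 40, and so on.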

4.2. Competitors

(i) Gradient Boosting Decision Tree (GBDT). In our problem, we neglect the spatial and temporal correlation of the data and simply treat all historical observed data from all stations as the training data to build a supervised learning model.
(ii) Support Vector Regression (SVR). SVR is an important application branch of the Support Vector Machine (SVM), and it is used here for the regression of traffic volume. The experimental setup of SVR is consistent with that of GBDT.
(iii) Spatial-Temporal Semi-Supervised Learning (ST-SSL) [9]. This method constructs the spatial-temporal affinity graph and determines the spatial and temporal edge weights respectively. Finally, the change rate of the spatial neighbors and the value of the temporal neighbors are smoothed.
(iv) Graph Convolutional Recurrent Neural Network (GC-GRU). Following [10], we first use one graph convolution layer for feature extraction and then feed the new features into a GRU for temporal correlation analysis.
(v) STGC-Regression (STGC-R). In order to verify the effectiveness of LDL, we set the output of the proposed network structure to a single node, while the other structures remain unchanged. The corresponding loss function is changed to the loss function of the regression task, namely

$$Loss = \min \frac{1}{t} \sum_{j=1}^{t} \frac{1}{|\mathcal{L}|} \sum_{v \in \mathcal{L}} \left(\hat{y}(v, j) - y(v, j)\right)^2$$

[Figure 2: (a) mean absolute errors; (b) root mean square errors]

4.3. The obtained errors for different inference models

The experimental results show that the proposed algorithm performs better than the other algorithms, as shown in Figure 2. The supervised learning algorithms GBDT and SVR perform worse than the other four semi-supervised learning algorithms, since the scarcity of training samples makes it difficult to train a supervised model with good generalization performance. In addition, when we use the same network structure to solve this problem as a regression task (STGC-R), its performance is not as good as that of LDL, which indicates that LDL better overcomes the poor prediction performance of regression methods caused by insufficient labeled samples.
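For reference, the two error measures used in the comparison, MAE and RMSE, follow their standard definitions and can be computed as in the sketch below. The function names and the toy values are illustrative only and are unrelated to the reported results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and inferred volumes."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between observed and inferred volumes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Toy values chosen only to exercise the functions.
observed = [100.0, 200.0, 300.0]
inferred = [110.0, 190.0, 330.0]
toy_mae = mae(observed, inferred)
toy_rmse = rmse(observed, inferred)
```

RMSE penalizes large individual errors more heavily than MAE, so it is never smaller than MAE on the same data; reporting both shows whether a method's error is dominated by a few badly inferred road segments.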

4.4. Evaluating inference models with various time spans

We experimented with data of various time spans, ranging from 1 day to 17 days. The estimation results of all methods are shown in Table 1 and Table 2, which show that the prediction accuracy of the proposed algorithm is always better than that of the other algorithms. As the time span of the input data increases, the inference performance of each algorithm decreases gradually, but the performance of the proposed algorithm is more stable. This is because it uses graph 1D convolution to extract temporal features, which is well suited to the periodicity of long time series and improves the inference performance of the model on long structured sequence data.

4.5. The influence of different experimental settings

(i) We process categorical features (TimeAtt, POI features and so on) with one-hot coding and with embedding respectively, and find that the embedding method performs better than the one-hot method.
(ii) Effect of layer normalization: After introducing layer normalization, the performance improves.

5. Conclusion

In this paper, we propose a spatio-temporal semi-supervised graph convolutional network model. The model can predict the temporal and spatial distribution of traffic flow on road sections without monitors by using urban environmental data and observation data from existing sites. We have carried out experiments on real traffic data, and the results of the proposed model are better than those of the comparison methods, indicating that our method is more suitable for the inference of traffic volume. In future work we will further optimize the network structure and parameters to obtain better results. In addition, the proposed model can also be used in other practical applications, such as urban population monitoring.

Acknowledgments

This work was supported in part by the Leading Talents of Science and Technology Innovation in Zhejiang Province 10 Thousands Plan under Grant 2018R52040, in part by the National Key Research and Development Program of China under Grant 2016YFC0201400, in part by the Provincial Key Research and Development Program of Zhejiang Province under Grant 2017C03019, and in part by the International Science and Technology Cooperation Program of Zhejiang Province for Joint Research in High-tech Industry under Grant 2016C54007.

[1] Shang, Zheng, Tong, Chang and Yu Y 2014 Inferring gas consumption and pollution emission of vehicles throughout a city In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM 1027-1036

[2] Hsieh H P, Lin S D and Zheng Y 2015 Inferring air quality for station location recommendation based on urban big data In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM 437-446

[3] Defferrard, Bresson and Vandergheynst P 2016 Convolutional neural networks on graphs with fast localized spectral filtering In Advances in Neural Information Processing Systems 3844-3852

[4] Kipf T N and Welling M 2016 Semi-supervised classification with graph convolutional networks arXiv preprint arXiv:1609.02907

[5] Li, Han and Wu X M 2018 Deeper insights into graph convolutional networks for semi-supervised learning In Thirty-Second AAAI Conference on Artificial Intelligence

[6] Gal and Ghahramani Z 2016 A theoretically grounded application of dropout in recurrent neural networks In Advances in Neural Information Processing Systems 1019-1027

[7] He, Zhang, Ren and Sun J 2016 Deep residual learning for image recognition In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 770-778

[8] Gao B B, Xing, Xie C W, Wu and Geng X 2017 Deep label distribution learning with label ambiguity IEEE Transactions on Image Processing 26(6) 2825-2838

[9] Meng, Yi, Su, Gao and Zheng Y 2017 City-wide traffic volume inference with loop detector data and taxi trajectories In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM 1

[10] Cui, Henrickson, Ke and Wang Y 2018 Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting arXiv preprint arXiv:1802.07007