Short-Term Traffic Forecasting: A Dynamic ST-KNN Model Considering Spatial Heterogeneity and Temporal Non-Stationarity Shifen Cheng, Feng Lu State Key Lab of Resources and Environment Information System Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences 11A, Datun Road, Chaoyang District, Beijing 100101, P. R. China {chengsf, luf}@lreis.ac.cn ABSTRACT categories: parametric models and nonparametric models. A para- Accurate and robust short-term traffic forecasting is a critical metric model uses an explicit parametric function to quantify the issue in intelligent transportation systems and real-time traffic relationship between historical traffic data and predicted traffic related applications. Existing short-term traffic forecasting ap- data. Considering the stochastic and nonlinear characteristics of proaches are used to adopt global and static model structures and traffic, constructing a mathematical model with high accuracy assume the traffic correlations between adjacent road segments for characterizing traffic characteristics in practice is difficult within assigned time periods. Due to the inherent characteristics [1]. Nonparametric models, such as data-driven methods, do not of spatial heterogeneity and temporal non-stationarity of city require a priori knowledge and explicit expression of mechanism; traffic, it is rather difficult for these approaches to obtain stable thus, they are more suitable for short-term traffic forecasting and satisfying results. To overcome the problems of static model problems [22] [23] [7] [31]. structures and quantitatively unclear spatiotemporal dependency As a typical nonparametric method, the k-nearest neighbors relationships, this paper proposes a dynamic spatiotemporal k- (KNN) model has received considerable attention. Many scholars nearest neighbor model, named D-ST-KNN, for short-term traffic have successfully applied the traditional KNN model to short- forecasting. It comprehensively considers the spatial heterogene- term traffic prediction [2][17][19][4][16][30][28]. Considering ity and temporal non-stationarity of city traffic with dynamic that the evolution of traffic is a spatiotemporal interaction pro- spatial neighbors, time windows, spatiotemporal weights and cess, traffic conditions of road segments are spatially and mutu- other parameters. First, the sizes of spatial neighbors and the ally affected [6]. Therefore, spatiotemporal relationships between lengths of time windows for traffic influence are automatically multiple road segments in road networks are considered to im- determined by cross-correlation and autocorrelation functions, prove traffic prediction [25][21][20]. Based on the traditional respectively. Second, dynamic spatiotemporal weights are intro- KNN model, [26] realized an enhanced model with the support duced into the distance functions to optimize the search mecha- of spatiotemporal information and argued that it achieves better nism. Then, dynamic spatiotemporal parameters are established performance than the model that employs only temporal informa- to adapt the continuous change in traffic conditions, including tion. [27] considered upstream and downstream traffic informa- the dynamic number of candidate neighbors and dynamic weight tion and proposed a distributed architecture of a spatiotemporal- allocation parameters. Finally, the D-ST-KNN model is evaluated weighted KNN model for short-term traffic prediction. [3] em- using two vehicular speed datasets collected on expressways in ployed a spatiotemporal state matrix instead of the traditional California, U.S. and city roads in Beijing, China. Four traditional time series to describe the traffic state while using a Gaussian prediction models are compared with the D-ST-KNN model in weight distance to select the nearest neighbor to improve the terms of the forecasting accuracy and the generalization ability. KNN model. However, the disadvantages of these ST-KNNs are The results demonstrate that the D-ST-KNN model outperforms that the spatiotemporal relation cannot be accurately quantified, existing models in all time periods, especially in the morning which is primarily reflected in the modeling process, the size of period and evening peak period. In addition, the generalization the spatial dimension m and the length of time window n of the ability of the D-ST-KNN model is also proved. state space cannot be automatically determined, and some values are artificially set. For example, for m=3, three adjacent road segments are selected; for n=2, the historical data of the first two 1 INTRODUCTION time steps of the current time step are used to construct samples. When the time series problem is transformed into a supervised Short-term traffic forecasting, which has an important role in machine learning problem, the values of m and n determine the intelligent transportation systems, enables traffic managers to number of selected features. Therefore, manually engineered fea- formulate reasonable and efficient strategies for alleviating traffic tures can easily cause dimensional disaster prevent the guarantee congestion and optimizing traffic assignments. Short-term traffic of the prediction accuracy of the model [15]. forecasting also enables the public to achieve accurate vehicular The prediction model is usually static, thus, it cannot describe path planning [29][10]. the characteristics of the dynamic change in traffic, which are In the past few decades, researchers have proposed several primarily reflected in the following three aspects: 1) existing stud- short-term traffic forecasting models that can be divided into two ies usually assume that the spatial neighbors and time windows are globally fixed, which indicates that once the number of road © 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, segments m associated with the predicted road segment and the Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted length of the time window n are determined, they do not change under the terms of the Creative Commons license CC-by-nc-nd 4.0. 133 in the spatiotemporal range. Considering the dynamic character- neighbor and the dynamic time window instead of the traditional istics of an urban road network, traffic flow in the road network time series or the static spatiotemporal matrix to characterize is not a static point but is a moving process from one location the state space. Finally, we introduced the dynamic spatiotempo- to another location. The spatial neighbors of the road segment ral weight, dynamic spatiotemporal parameters, and Gaussian primarily rely on the current traffic conditions. The number of weight function to improve the KNN model to adapt to the dy- spatial neighbors is very small if traffic congestion exists but is namic and heterogeneous characteristics of the traffic. large during flat peak periods [5]. From the perspective of ur- The remainder of this paper is organized as follows: Section 2 ban road network heterogeneity, the number of relevant road proposes a D-ST-KNN model that considers the spatial hetero- segments for different road segments also differs; thus, sharing geneity and temporal non-stationarity of city road traffic. The parameter m is difficult in the entire spatial range [29]. The selec- construction of the dynamic spatiotemporal state matrix, weights, tion of a time window based on a time series is used to determine and other parameters are also introduced in this section. In Sec- the length of the historical traffic data to match similar traffic tion 3, the dynamic characteristics, prediction performance, and patterns. The traffic data in the historical time step and the cur- computational efficiency of the presented model are compre- rent time step must be relevant in the selection process [18]. Due hensively validated. The experimental results are also discussed. to the dynamic and heterogeneous nature of the road network, Section 4 concludes the paper and provides an outlook of future even the same road segment, a significant difference is observed work. in the time series of traffic data in different time periods (such as morning and evening peak periods). That causes the selection of 2 METHODOLOGY the time window to be dynamic [8]. Thus, the spatial neighbors In this section, we propose a D-ST-KNN model. Our method is and time windows that dynamically change over time and space divided into five phases: the data bucket partition, state space are not easily described with globally fixed spatiotemporal state definition, distance function definition, optimal neighbor selec- matrices; thus, there is a need for a dynamic spatiotemporal KNN tion, and prediction function definition, which corresponds to model to adapt to the characteristics of traffic changes. 2) Existing Sections 2.1-2.5. First, considering the dynamic nature of traffic, research considers that different historical data for different time the original spatiotemporal data sets are partitioned according periods have different contributions to the prediction of future to different time periods to form different data buckets. Second, traffic conditions. When calculating the distance between two considering the spatial heterogeneity, each segment of a data state spaces, the weight distance criterion is usually adopted to bucket is separately processed, and the appropriate spatial neigh- assign different weights to each component in the state space. bors and time windows are selected. The spatiotemporal state The closer the time window is to the predicted time, the larger the matrix is constructed to describe the traffic conditions. Then, we allocated weight; the closer the spatial distance is to the predicted introduce the spatiotemporal weight matrix to define the dis- road segment, the greater the assigned weight [3]. However, dy- tance function and measure the distance between the current namic changes in the spatial neighbor and the time window not spatiotemporal state matrix and the historical spatiotemporal only affect the dimension of the space-time matrix but also cause state matrix to select the K nearest neighbors. Finally, we inte- the intensity of the correlation among different positions to dy- grate these neighbors to obtain the predicted value of the target namically change over time. Therefore, the influence of different road segment. components of traffic data is difficult to characterize with global fixed spatiotemporal weight matrix. 3) To determine the value of 2.1 Data bucket the number of similar state spaces K, researchers usually employ a cross-validation method to select a suitable value, then share in Considering the non-stationarity and periodicity of traffic data, the entire range of space and time[26] [28]. Due to the difference there are significant differences in the traffic characteristics among in traffic patterns in the different time periods and space loca- different time periods, such as the morning peak period, inter- tions, the global fixed value of K cannot adapt to the dynamic peak period, and evening peak period. In the same period, the and heterogeneous nature of a road network. traffic data of same road segment has statistical homogeneity The key to short-term traffic forecasting models is the effective and the traffic pattern tends to be stable with periodic changes, use of the potential spatiotemporal dependencies in the traffic such as different days for the morning peak period, which results data. The existing KNN models usually assume that the traffic in the spatial neighbor, the time window, and spatiotemporal change is a static point process and often disregard its important parameters that can be shared. Therefore, we divide the origi- L dynamics and heterogeneous characteristics. As a result, the nal traffic data {volt j , j ∈ [1, N ], t ∈ [t 0 , tc ]} into different time structure of the prediction model is usually globally fixed in time periods to describe the homogeneity in same time period and and space, including the globally fixed spatial neighbor, time dynamics in different time periods, where t 0 and tc represent the window, spatiotemporal weights, and spatiotemporal parameters, start time step and the current time step of the time series, and such as the traditional KNN model and the spatiotemporal KNN L j denotes the jth road segment. model. In the study of urban traffic modeling and prediction, to distin- In this paper, we propose a dynamic spatiotemporal KNN guish the difference among the traffic characteristics in different model (D-ST-KNN) for short-term traffic prediction considering time periods, [24] divided a day into six time periods (period 1: spatial heterogeneity and temporal non-stationarity of city traf- midnight-6:30 am; period 2: 6:30-10:00; period 3: 10:00-13:30; pe- fic. First, we investigated the autocorrelation of road traffic to riod 4:13:30-17:00; period 5:17:00-20:30; period 6:20:30-midnight). determine the time window required for the traffic data. Second, The test reveals that the partition is statistically acceptable. Based we used the cross-correlation among different road segments to on this analysis and according to the same strategy, the original analyze the spatiotemporal dependencies of traffic and build a traffic data are divided into M different time periods (M= 6) ac- dynamic spatial neighbor for each road segment. The dynamic cording to the time dimension, which corresponds to different spatiotemporal state matrix is obtained by the dynamic spatial data buckets. Assuming that the entire traffic data set is BK, the 134 Lv data bucket division must be satisfied: where ψbk is the lag value that maximizes cross-correlation of i  the surrounding road segment Lv to the predicted road segment    BK = bk 1 ∪ bk 2 ∪ ... ∪ bk M Lv L in bki , and ψbk describes the maximum impact time range of bki = {volt j |1 ≤ j ≤ N , ∀t ∈ [tabki , tbbki )} (1)  i  bk ∩ bk = ϕ the surrounding segments in different buckets on the predicted  i o road segment, which can be employed for efficient selection of where i ∈ [1, M], o ∈ [1, M], i , o, bki is the ith bucket (i.e., spatial neighbors. Consider the predicted road segment L j in bki L and its predicted time interval ∆t. When the surrounding road bucket 1), and volt j is the traffic data of road segment L j at segments deliver the traffic flow to the predicted road segments time step t. t ∈ [tabki , tbbki ) indicates that time step t is within within a given time interval, they influence the predicted road the corresponding time period of the ith bucket (i.e.,[0:00-6:30), segments, and the road segments beyond this time interval are [6:30-10:00)). L j denotes the jth road segment (i.e., Link 1), and excluded. Its formal definition is expressed as N is the total number of road segments. Note that dividing the n o L Lv original traffic data into different buckets at the pre-processing Rbkj ← Lv |∀0 ≤ ψbk ≤ ∆t, v ∈ [1, N ] (4) i i stage does not have any impact on the analyses and conclusions L in this study because the same partitioning strategy were used where Rbkj is the set of spatial neighbors of the jth road segment for all the algorithms that are evaluated. i in the ith bucket. 2.2.2 Dynamic time windows. Considering that the selection 2.2 Dynamic spatiotemporal state matrix of the time window is based on the time series of the predicted 2.2.1 Dynamic spatial neighborhoods. The dynamic spatial neigh- road segment, we can select n historical traffic data that have a borhood is used to determine how the traffic conditions of the correlation with the predicted road segment. The autocorrelation predicted road segment are affected by the surrounding road seg- function is usually employed to measure the correlation between ments in different buckets to determine the correlation among the time series and its delayed version; thus, it can be used for the road segments. The traditional method usually calculates the cor- selection of the time window, i.e., the lag in which the prediction relation coefficients between the time series of the predicted road error is minimized can be set as the window size. Note that the segments and the time series of other road segments and sets the lag in the autocorrelation function describes the delay effect of threshold to select the relevant road segments [3]. Considering the time series, and the lag described in Section 2.2.1 is used to that a road network has multiple internal and external factors, characterize the delay effect between different time series. Given L such as the influence of traffic lights, the impact of surround- the time series of the jth road segment L j in bki , U = {volt j |∀t ∈ ing road segments on predicted road segments has a certain bk L [tabki , tb j )}, the autocorrelation function ρbkj (δ ) can be defined degree of lag. Therefore, the delayed spatiotemporal relation- i as follows: ships cannot be exactly expressed by correlation coefficients. The cross-correlation function is a delayed version of the correlation L E [(ut − µu ) (ut −δ − µu )] ρbkj (δ ) = , δ = 0, 1, 2, · · · , (5) coefficient function, which measures the correlation coefficients i σu2 of two time series at a specific lag [14]; therefore, it is more Using the autocorrelation function to set the time window suitable for describing the spatiotemporal dependence of traffic. entails three steps. First, consider the computational limitations, Assume that bki is the bucket of the predicted road segment it is necessary to determine the maximum range of lag. Second, L j at time step t, and t ∈ [tabki , tbbki ). Given the surrounding within the range, the parameters of the predictive model are road segments Lv , the time series of the traffic data for two road fixed, and cross-validation is performed with different lags. This L segments can be expressed as U = {volt j |∀t ∈ [tabki , tbbki )}, strategy is based on the fact that the value of the traffic data has a significant correlation within the maximum lag range. Finally, the Z = {voltLv |∀t ∈ [tabki , tbbki )}, j ∈ [1, N ], v ∈ [1, N ], and their lag that minimizes the prediction error is chosen as the optimal cross-correlation at lag φ is defined as follows: time window.   bk  γ zi (φ)  bk i (φ) = u, , φ = 0, ±1, ±2, · · · ,  2.3 Dynamic spatiotemporal weights cc fu,z    α u σz   γu,z  bk i (φ) = E q (ut − µu ) zt +φ − uz Considering the traffic conditions have significant differences  Í (2)   (ut − µu )2 at different time intervals, which results in a change in the spa-  q αu =   Í 2 tiotemporal weights with time; the historical data of different   σz = zt +φ − uz time and space will influence the future traffic conditions by a bk i different degree. The dimension of the spatiotemporal weight where γu,z (φ) is the correlation coefficient between time series is related to the spatiotemporal state matrix, and the dynamic U and time series Z at lag φ in bucket bki , µu and uz are the mean change in the spatiotemporal matrix causes the dimension of values of U and Z, respectively, and σu and σz are the standard the spatiotemporal weight matrix to change with different time deviations of U and Z, respectively. periods. Based on the traditional weight distance function, we In this definition, the cross-correlation function can be re- introduce a dynamic spatiotemporal weight in the distance func- garded as a function of lag, and the lag value that makes the tion and optimize the weight distance function to adapt the near- cross-correlation function obtain the maximum value is the av- est neighbor similarity measure of the dynamic spatiotemporal erage delay time of the surrounding segments to the predicted matrix. road segment [29]. The formal definition is expressed as In the temporal dimension, we use the time interval length (i.e.,   5 min interval) to characterize the contribution of different time Lv ψbk bk i = arдmax cc fu,z (φ) , v ∈ [1, N ] (3) steps. In the spatial dimension, the spatial correlation (such as i φ cross-correlation) is used to characterize the influence of different 135 spatial distances. The construction method is described as follows: prediction accuracy of the model. The K value is primarily em- assume that the predicted road segment L j at the current time ployed to determine the number of candidate neighbors. If the K step tc is in data bucket bki and the dimension of the spatiotempo- value is too small, the model becomes more complex and overfit- L L ral state matrix is mbkj ×nbkj , which is determined by the method ting is possible. If the K value is too large, the model is simpler i i provided in Section 2.2. Then, the spatiotemporal and under-fitting is possible. Considering that the selection of the  state matrix  of K value is significantly influenced by the finite sample nature of L L L the current time step can be expressed as χtcj mbkj , nbkj .The i i the problem, the assignment of its values is usually performed by spatiotemporal  matrix of the historical time step hi can be de- cross-validation to select the K value that minimizes the model L L L L fined as χh j mbkj , nbkj , where mbkj is the spatial dimension of error [27]. i i i i the spatiotemporal state matrix of the jth predicted road segment The existing methods usually assume that the K value is glob- in the ith bucket, which is related to the number of elements ally fixed. When the K value is determined, it is shared through- L L in the set of spatial neighbors Rbkj . Moreover, nbkj is the tem- out the entire space and time. In contrast to the existing method, i i the selection of the K value in the D-ST-KNN model considers the poral dimension of the spatiotemporal state matrix of the jth characteristics of dynamic changes of traffic. Instead of setting predicted road segment in the ith bucket, which is the size of a global fixed K value, we can select the optimal K value for the time window. The time-weighted matrix is defined as Wtbki , different buckets, i.e., Kbki , bki ∈ BK, i] ∈ [1, M]. and the space-weighted matrix is defined as Wsbki . The corre- To verify these assumptions, we use cross-validation to set the L L sponding elements are w tbki (ti, t j), ti ∈ [1, nbkj ], t j ∈ [1, nbkj ] range of K to [1, 40] and test the effect of different K values on i i L L MAPE of the model in different buckets, as shown in Fig. 1. and w sbki (si, sj), si ∈ [1, mbkj ], sj ∈ [1, mbkj ], which represent i i the time weight value and space weight value, respectively, as- signed to the jth predicted road segment in the ith bucket. The Bucket1 Bucket2 Bucket3 12.5 20.5 16.0 weight distribution is as follows: 12.0 19.5 15.0 , Lj 11.5 18.5 14.0 MAPE(%) MAPE(%)  MAPE(%) 11.0  17.5 13.0  Ínbki 10.5   16.5 12.0 10.0 15.5 11.0 ti t i=1 ti, ti = t j w tbki (ti, t j) = 9.5 (6)  14.5 10.0 9.0  0 10 20 30 40 0 10 20 30 40  0 10 20 30 40 K K  0, ti , t j K ,  Bucket4 Bucket5 Bucket6   Ímbki Lj  cc f si  19.5 22.5 15.0 21.5 14.5 si 18.5 14.0 w sbki (si, sj) = si=1 cc f Lv , L j , si = sj (7) 17.5 20.5 13.5 Lv , L j  MAPE(%) MAPE(%) MAPE(%) 19.5  13.0 16.5  18.5 12.5  0, si , sj 15.5 17.5 12.0 11.5 14.5 16.5 11.0 13.5 15.5 10.5 In this definition, the temporal and spatial weights are linearly 0 10 20 K 30 40 0 10 20 K 30 40 0 10 20 K 30 40 distributed according to the proximity of the current time step and the predicted road segments. cc f Lsi , L is the cross-correlation Figure 1: Impact of the number of candidate neighbors v j between the time series of the si spatial neighbor (whose road Kbki on the MAPEs of different data buckets. segment is Lv ) and the predicted road segment L j . The closer the value is to the predicted time, the greater the weight of As the K value increases, the prediction error is gradually the allocation; the greater the relation to the space of the pre- reduced. When the K value attains a certain value, the error dicted road segment, the greater the weight. By introducing spa- of the model begins to stabilize. Thus, the optimal K value for tiotemporal weights into the original spatiotemporal matrix, the each bucket can be determined (i.e., Kbk1 = 27, Kbk2 = 23). spatiotemporal-weighted state matrices of the current time step Compared with different buckets, the K values dynamically vary L Γtc j and the spatiotemporal-weighted state matrices of the his- with different time periods. The global fixed K value has difficulty L describing the dynamic change in traffic. Therefore, the dynamic torical time step Γh j are denoted by the following: K value proposed in this paper is reasonable. The parameters   i Lj L L L of the D-ST-KNN model also contain the parameters introduced Γtc = Wsbki × χtcj mbkj , nbkj × Wtbki (8) by the predicted generation function (refer to Section 2.5). The   i i L L L L calibration method of the parameter is shown in Section 3.2. Γh j = Wsbki × χh j mbkj , nbkj × Wtbki (9) i i i i L L By calculating the distance dbki (Γtc j , Γh j ) between the histor- 2.5 Predictive function i ical spatiotemporal state matrix and the current spatiotemporal Due to the spatiotemporal state space, the spatiotemporal weight, state matrix, candidate neighbors can be selected. The formula is and the spatiotemporal parameters dynamically change with dif- expressed as ferent buckets; to adapt to this change, the predictive generation   r      function should also dynamically change. This paper transforms L L L L L L ′ dbki Γtc j , Γh j = trac Γtc j − Γh j × Γtc j − Γh j (10) the four types of traditional weight distribution methods to en- able them to adapt to the dynamics of traffic, including the inverse i i i where trac represents the trace of the matrix. distance weight [23], rank-based weight[11][13], and Gaussian weight [3]. Selecting the best prediction function by comparing 2.4 Dynamic spatiotemporal parameters the performance of different predictive functions (refer to Section In the KNN model, the spatiotemporal parameters include the 3.2). Note that the weight referred to in this section is expressed K values and the parameters introduced during the method con- as the weight assigned by the candidate neighbor, whereas the struction (such as the prediction generation functions). The rea- weight in Section 2.3 represents the weight matrix of the weights sonableness of the parameters has substantial influence on the assigned to each element in the spatiotemporal state matrix. 136 Assuming that dbk k is the distance between the kth candi- i 3 EXPERIMENTS date neighbor and the predicted road segment in the ith bucket 3.1 Data preparation œ L obtained by formula (10), voltcj+1 the predicted value of the pre- In this study, two different data sets are used to evaluate the dicted road segment L j at time step tc + 1 is defined as performance of the prediction model. The first data set is PeMS, which is a high-quality data set with open access. PeMS is exten- ÍKbki L L œ L volh j +1 (k) × φbkj (k) sively applied in the field of traffic prediction. The traffic speed k =1 voltcj+1 = ÍKbki L j i i (11) data from 59 consecutive locations on the US 101 freeway from k =1 bk φ (k) i PeMS were downloaded for a total of 60 days; the time period is August 15, 2016, to October 14, 2016 and time interval is 5 where tc ∈ [tabki , tbbki ] is used to map the current time step into min (as shown in Table 1). Each detector represents a position; the corresponding bucket, is used to determine the number of the positional distribution is shown in Fig. 2. The second data candidate neighbors for the corresponding bucket, volh j +1 (k) L set is the floating car trajectory data obtained from the Beijing represents the traffic data of the kth candidate neighbor, and i road network, which is generated from more than 50,000 vehicles L equipped with GPS. The frequency of data acquisition is 5 min, hi ∈ [tabki , tbbki ]; and φbkj (k) and represent the weight of the kth i and the time period is March 1, 2012, to April 30, 2012 (as shown neighbor of the jth predicted road segment in the ith bucket. The in Table 1). In this study, a representative region that contains form is defined as follows: 30 road segments is used for the experiment with the position   1 distribution shown in Fig. 2. In the two data sets, the last ten    Kbki  days are used as the test data to evaluate the accuracy of the   1 L   k dbk model. The remaining days of data are employed as training data φbkj (k) = i (Kbki − rq + 1)2 (12) to construct the historical database of the predicting model.    i   2 In addition, we normalize the original traffic data and use the   k dbk  1 i 4π abki exp(− 4abki 2 ) ratio of the average traffic speed to the maximum speed limit of  each road segment to express the traffic conditions of the road Formula (12) corresponds to equal weights, inverse distance segment. The formal expression is as follows: weights, the rank-based weight and the Gaussian weight, where vi,t vdi,t = , i ∈ [1, N ], t ∈ [t 0 , tc ] (16) rq represents the order of the qth candidate neighbors, and abki fi,max is the spatiotemporal parameter whose value is similar to the where vdl,t is the normalized speed of the ith road segment at value of the previously discussed spatiotemporal parameter K, time step t, vi,t is the real average speed data of the road segment, which dynamically values with different time periods. The corre- and fi,max is the speed limit for the ith road segment. sponding parameter calibration is shown in Section 3.2. Table 1: Description of the experimental data sets 2.6 Accuracy metrics Three criteria are selected to verify the prediction accuracy of Data set PeMS Beijing the D-ST-KNN model, namely, mean absolute error (MAE), mean Time span 8/15/2016-10/14/2016 3/1/2012-4/30/2012 absolute percentage error (MAPE) and root-mean-square error Time interval 5 min 5 min (RMSE). These indicators depict the essential characteristics of errors from different perspectives. The RMSE indicates a fluctua- Number of links 59 30 tion in the error of the prediction model, and the MAPE indicates the difference between the predicted and the actual traffic data. In contrast, the MAE and RMSE provide a measure of the similarity between the predicted and the actual traffic data [12]. The MAE, MAPE, and RMSE are defined as follows: 1 ÕM Õ N Õ S L œ L M AE = vol tcj+1 (s) − vol tcj+1 (s) (13) M × N × S i =1 j =1 s =1 v u u   Figure 2: Spatial distribution of traffic data in the Beijing u u œ u u L L u t Õ M Õ N Õ S vol tcj+1 (s) − vol tcj+1 (s) and PeMS data set. 1 MAP E = (14) M × N × S i =1 j=1 s =1 œ L 3.2 Variable estimation vol tcj+1 (s) 3.2.1 Determining the optimal distance function. The distance func- v u t Õ M Õ S  N Õ 2 tion is used to measure the similarity among the spatiotemporal 1 œ RMS E = L L vol tcj+1 (s) − vol tcj+1 (s) (15) state matrices to obtain the historical spatiotemporal matrix, M × N × S i =1 j=1 s =1 which is similar to the spatiotemporal state matrix of the target where M is the number of buckets M = 6, N is the number road segment. Fig. 3 shows the performance differences of the dis- of predicted road segments, S is the number of test samples, tance function constructed with different weights. The traditional L œ L method directly calculates the Euclidean distance between two voltcj+1 (s) and voltcj+1 (s) indicate the actual traffic data and the spatiotemporal state matrices, which treats the elements in the predicted traffic data at the next time step of the jth predicted space state matrix equally. The influence of the historical traffic road segment at the current time step, and s indicates the sth test conditions of different time and space distribution on the pre- sample in the ith bucket. diction of future traffic conditions is difficult to describe, which 137 yields the lowest performance. The distance function constructed performance of the prediction model changes with Kbki (refer by the Gaussian function assigns weights in the time dimension to Section 2.4). Because the impact of parameter Kbki on the and space dimension; thus, the performance of the prediction prediction performance was discussed in Section 2.4, this section model is significantly improved. However, this method requires focuses on the calibration of parameter abki . additional introduction of the time-weighted parameter α 1 and Fig. 5 shows the impact of changes in abki on the performance the space-weighted parameter α 2 in the construction process, of the D-ST-KNN model in different buckets. The trend in Fig. 5 which makes calibration of its parameters and the global optimal reveals that the value of abki has a significant influence on the combination of parameters difficult. We adopt a similar strategy prediction performance. For the minimum abki , the prediction that uses the linear time distribution weight in the time dimen- error of the model attains the maximum abki . As abki increases, sion and the spatial correlation between the surrounding road the prediction error gradually decreases and begins to stabilize. segments and the target road segment to assign weights in spatial We compare the variation of the parameters among the different dimensions. Then, a dynamic spatiotemporal weight assignment buckets. For example, in bucket 1, the optimal value of abk1 is method is constructed that does not require any additional pa- 0.017, whereas the optimal value of abk2 in bucket 2 is 0.015. The rameters. The dynamic weight distribution has the lowest MAPE, value of abki also changes dynamically over time. Considering RMSE and MAE, which reflects the high efficiency of the method that Kbki also changes dynamically with time, the parameters of compared to that of the other two weight distribution methods. the D-ST-KNN model change with time. The calibration results of the entire model are listed in Table 2, and the values of Kbki are shown in Fig. 5. In this analysis, setting the global fixed param- 17.00 0.16 0.115 16.50 0.110 0.15 eters is unreasonable when constructing the prediction model. 16.00 0.105 0.14 0.100 15.50 MAPE(%) We propose the concept of the data bucket, and the prediction 0.095 RMSE MAE 15.00 0.13 0.090 14.50 model is constructed in different time periods, which causes the 0.12 0.085 14.00 0.080 0.11 13.50 model parameters to change with the time period to adapt to the 0.075 13.00 0.10 0.070 dynamic nature of traffic. Origin Gaussian Space-Time Origin Gaussian Space-Time Origin Gaussian Space-Time Figure 3: Comparison of different distance functions Table 2: Calibration results of the model parameters 3.2.2 Determining the optimal predictive function. Based on the discussion in the previous sections, we transform four types of Parameters Bucket weight distribution methods, including equal weight, inverse Kbki abki distance weight, rank-based weight, and Gaussian weight, which Bucket 1 27 0.017 are used to integrate the candidate neighbors to obtain the final Bucket 2 23 0.015 predicted value. In the process of cross-validation, we fix the other Bucket 3 28 0.011 parameters of the model, such as Kbki and abki , and calculate Bucket 4 18 0.017 the influence of different weight distribution methods on the Bucket 5 17 0.014 prediction accuracy of the D-ST-KNN model to obtain the average Bucket 6 25 0.019 error of the entire test data set for different weight distribution methods. The results are shown in Fig. 4. The MAPE, RMSE and MAE of the Gaussian weight method are lower than the MAPE, 15.0 Bucket1 18.5 Bucket2 15.0 Bucket3 RMSE and MAE of the other three weight distribution methods. 14.0 14.5 18.0 14.0 MAPE(%) 13.0 In the D-ST-KNN model, we employ the Gaussian function as MAPE(%) 17.5 13.5 MAPE(%) 12.0 13.0 17.0 12.5 the weight distribution method for candidate neighbors. 11.0 16.5 12.0 10.0 11.5 16.0 11.0 9.0 15.5 10.5 0 0.01 0.02 0.03 0.04 0 0.01 0.02 0.03 0.04 0 0.01 0.02 0.03 0.04 13.75 0.116 0.0850 a a a 13.70 0.115 0.0845 13.65 Bucket4 Bucket5 Bucket6 MAPE(%) 0.114 RMSE MAE 13.60 0.0840 18.0 20.5 14.5 0.113 17.5 20.0 13.55 14.0 0.0835 17.0 19.5 13.5 0.112 MAPE(%) MAPE(%) MAPE(%) 13.50 16.5 19.0 13.0 16.0 18.5 13.45 0.111 0.0830 18.0 12.5 15.5 Equal Inverse distance Equal Inverse distance Equal Inverse distance 15.0 17.5 12.0 14.5 17.0 11.5 Rank-based Gaussian Rank-based Gaussian Rank-based Gaussian 14.0 16.5 11.0 0 0.01 0.02 0.03 0.04 0 0.01 0.02 0.03 0.04 0 0.01 0.02 0.03 0.04 a a a Figure 4: Comparison of different weight allocation meth- ods. Figure 5: Impact of the weight parameter abki on MAPEs for different data buckets. 3.2.3 Calibrating hyper-parameters In the D-ST-KNN model, the hyper-parameters primarily include the number of candidate neighbors Kbki and the Gaussian weight parameter abki . In the 3.3 Accuracy evaluation parameter calibration process, to find the best combination of 3.3.1 Overall results. Based on the variable estimation, we compare Kbki and abki that enables the prediction model to obtain the our model with several existing traffic prediction models, includ- minimum MAPE, we set the range of Kbki to [1, 40] and the range ing the historical average model (HA), Elman neural network of abki to [0.001, 0.04]. We apply the cross-validation method to (Elman-NN) [9], traditional KNN model (Original-KNN), and spa- obtain the optimal combination of the parameters for each bucket. tiotemporal KNN model (ST-KNN). Fig. 6 shows the prediction The effect of parameter variation on the prediction accuracy of performance of different models. The HA model, the Elman-NN the D-ST-KNN model can be tested by fixing other parameters of model, and the Original-KNN model regard the problem of the the model. For example, we can fix the values of abki and test the traffic prediction as a simple time series problem and disregard 138 the influence of the spatial factors on the predicted road segment. 31 29 Therefore, their prediction performance is lower than the predic- 27 tion performance of the ST-KNN model and the D-ST-KNN model 25 proposed in this paper by comparing the values of MAPE. The ST- 23 MAPE(%) KNN model introduces the spatiotemporal state matrix, which 21 19 improves the prediction performance of the model. However, 17 this matrix ignores the spatial heterogeneity and the temporal 15 non-stationarity of the road network and cannot describe the 13 11 essential characteristics of the traffic dynamics using a static 9 ST- KNN model (including global fixed spatiotemporal matrix Bucket1 Bucket2 Bucket3 Bucket4 Bucket5 Bucket6 and global fixed parameters). The D-ST-KNN model constructs models for different time periods by introducing the concept HA Elman-NN Original-KNN ST-KNN D-ST-KNN of data buckets. Simultaneously, the dynamic space neighbor, dynamic time window, dynamic spatiotemporal weight, and dy- Figure 7: Accuracy comparison of the D-ST-KNN model on namic spatiotemporal parameters are introduced to construct the MAPEs for different data buckets. D-ST-KNN model, which can adequately adapt to the dynamic changes of traffic conditions. The experimental results indicate that the D-ST-KNN model proposed in this paper is superior to 3.4 Generalization ability evaluation other models. To evaluate the generalization ability of the D-ST-KNN model, we fix all parameters of the model and compare the performance 21 HA 0.22 HA 0.125 HA of the different methods with the test data set from PeMS; the experimental results are shown in Fig. 8. The results indicate Elman-NN Elman-NN 0.120 Elman-NN 20 Original-KNN 0.20 Original-KNN Original-KNN 0.115 that the prediction accuracy of the D-ST-KNN model on the 19 ST-KNN ST-KNN ST-KNN 0.18 0.110 18 D-ST-KNN D-ST-KNN D-ST-KNN MAPE(%) PeMS data set is significantly improved compared with that of 0.105 RMSE MAE 17 0.16 0.100 the Beijing floating car data set. The data quality of the PeMS 16 0.14 0.095 15 0.090 data set is relatively complete, and the data collection area is 0.12 14 0.085 the expressway. Compared with the traffic conditions of the 13 0.10 0.080 urban road network, the traffic mode is relatively simple with Figure 6: Accuracy comparison of different models in the minimal changes, which enables the prediction model to easily Beijing data set. represent the regular traffic pattern characteristics. However, the D-ST-KNN model maintains the same prediction trend; in all 3.3.2 Local results. To further evaluate the performance of the predicted models, its MAPE, RMSE, and MAE are lower than the D-ST-KNN model, we compare the MAPEs of different models in other models, which exhibit excellent predictive performance different data buckets by averaging the prediction performance and generalization ability. of different road segments in a single bucket. The experimen- tal results are displayed in Fig. 7. In terms of overall trends, the 8 HA 0.08 HA 0.070 HA Elman-NN Elman-NN Elman-NN 7 Original-KNN 0.07 Original-KNN Original-KNN performance of different models corresponds to the degree of con- 0.060 ST-KNN ST-KNN ST-KNN 6 0.06 D-ST-KNN D-ST-KNN 0.050 D-ST-KNN gestion of the traffic conditions. For example, in bucket 1, bucket MAPE(%) 5 0.05 RMSE MAE 0.040 3, and bucket 6, all models have a lower MAPE than other buck- 4 0.04 0.030 3 0.03 ets because the time periods that correspond to the three data 2 0.02 0.020 buckets are midnight-6:30 am, 10:00-13:30, and 20:30-midnight. 1 0.01 0.010 The traffic in Beijing during these three time periods belongs to the flat peak period, and road traffic has low congestion and Figure 8: Performance comparison of different models in exhibits regular changes. In buckets 2, 4, and 5, all models achieve the PeMS data set. a relatively poor performance. Bucket 2 corresponds to the time period of 6:30-10:00, bucket 4 corresponds to the time period of 13:30-17:00 and bucket 5 corresponds to the time period of 4 SUMMARY AND FUTURE WORK 17:00-20:30. These time periods correspond to the peak period in In this paper, we propose a D-ST-KNN model for short-term traffic Beijing. The changes of traffic conditions during these time peri- prediction. The proposed model considers the spatial heterogene- ods are more complicated than the traffic conditions of the other ity and temporal non-stationarity of road networks to adapt to buckets. In addition, in terms of the performance of different the dynamic characteristics of traffic, including dynamic spatial models in a single data bucket, the prediction trend of different neighbors, time windows, spatiotemporal weights, and spatiotem- models was similar to those of the overall results. For example, poral parameters. With cross-correlation and autocorrelation in bucket 1, the ST-KNN and D-ST-KNN models perform better function computation, the automatic selections of spatial neigh- than the HA, Elman-NN, and Original-KNN models, which is due bors and the time window are realized, which efficiently solve to the benefits of the introduction of spatial factors. However, the dimensionality disaster problem encountered in the existing the D-ST-KNN model considers the spatial heterogeneity and KNN models. The spatiotemporal weights are integrated into temporal non-stationarity of road networks to adapt to the dy- a distance function to help identify candidate neighbors. Time namic characteristics of traffic, making the model performance variable parameters are also introduced, including the dynamic better than other models in all time periods, especially in the peak number of candidate neighbors and dynamic weight allocation period. This also explains why the D-ST-KNN model is superior parameters, to further adapt to the dynamic and heterogeneous to the other models in the overall result. nature of road networks. 139 Using real traffic data collected from city roads and inter-city [11] Kamran Ghavamifar. 2009. A decision support system for project delivery expressways, we calculate the number of spatial neighbors and method selection in the transit industry. (2009). [12] Filmon G. Habtemichael and Mecit Cetin. 2016. Short-term traffic flow rate the time window size of each road segments, which reflects the forecasting based on identifying similar traffic patterns. Transportation Re- distinct heterogeneity and non-stationarity of urban road traffic. search Part C 66 (2016), 61–78. [13] Filmon G. Habtemichael, Mecit Cetin, and Khairul A. Anuar. 2015. METHOD- Then, we validate the performance of the proposed D-ST-KNN OLOGY FOR QUANTIFYING INCIDENT-INDUCED DELAYS ON FREEWAYS model with comparisons to HA, Elman-NN, traditional KNN and BY GROUPING SIMILAR TRAFFIC PATTERNS. Transportation Research Record spatiotemporal KNN models. The experimental results indicate Journal of the Transportation Research Board (2015). [14] James Douglas Hamilton. 1994. Time series analysis. 401–409 pages. that the D-ST-KNN model has a higher accuracy on short-term [15] Haikun Hong, Wenhao Huang, Xiabing Zhou, Sizhen Du, Kaigui Bian, and traffic prediction than the existing models. In addition, we ex- Kunqing Xie. 2016. Short-term traffic flow forecasting: Multi-metric KNN plore the local performance of different models in different data with related station discovery. In International Conference on Fuzzy Systems and Knowledge Discovery. 1670–1675. buckets and find that all models correspond to the degree of traf- [16] Xiaoyu Hou, Yisheng Wang, and Siyu Hu. 2013. Short-term Traffic Flow fic congestion, and the D-ST-KNN model performs better than Forecasting based on Two-tier K-nearest Neighbor Algorithm ąî. Procedia - Social and Behavioral Sciences 96 (2013), 2529–2536. other models in all time periods, especially in the morning pe- [17] Myung Jiwon, Dong Kyu Kim, Seung Young Kho, and Chang Ho Park. 2011. riod and evening peak period. To summarize, compared with the Travel Time Prediction Using k Nearest Neighbor Method with Combined existing models, the proposed D-ST-KNN model significantly im- Data from Vehicle Detector System and Automatic Toll Collection System. Transportation Research Record Journal of the Transportation Research Board proves the accuracy of short-term traffic prediction. Furthermore, 20, 2256 (2011), 51–59. we compare the performance of different models using the ac- [18] Arief Koesdwiady, Ridha Soua, and Fakhreddine Karray. 2016. Improving tual traffic data collected from PeMS. The D-ST-KNN model also Traffic Flow Prediction With Weather Information in Connected Cars: A Deep Learning Approach. IEEE Transactions on Vehicular Technology 65, 12 (2016), achieves the best performance, which verifies the generalization 9508–9517. ability of the proposed model. [19] Shuangshuang Li, Zhen Shen, and Gang Xiong. 2012. A k-nearest neighbor locally weighted regression method for short-term traffic flow forecasting. In In the follow-up study, the following problems need to be International IEEE Conference on Intelligent Transportation Systems. 1596–1601. investigated to further improve the D-ST-KNN model. The D- [20] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang. 2017. Learning Traffic as ST-KNN model behaves slightly differently in peak and off-peak Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction. Sensors 17, 4 (2017). time periods. Further improvement of the model performance [21] Wanli Min and Laura Wynter. 2011. Real-time road traffic prediction with during peak hours will be a constant challenge. Moreover, a multi- spatio-temporal correlations. Transportation Research Part C: Emerging Tech- threaded approach could be used to improve the efficiency of nologies 19, 4 (2011), 606–616. [22] Brian Lee Smith. 1997. Forecasting freeway traffic flow for intelligent trans- D-ST-KNN. A parallel P-D-ST-KNN model on an existing parallel portation systems application. Transportation Research Part A 1, 31 (1997), computing framework is expected to alleviate the pressure of 61. [23] Brian L Smith, Billy M Williams, and R Keith Oswald. 2002. Comparison of real-time computation. parametric and nonparametric models for traffic flow forecasting. Transporta- tion Research Part C: Emerging Technologies 10, 4 (2002), 303–321. 5 ACKNOWLEDGMENTS [24] Anthony Stathopoulos and Matthew G. Karlaftis. 2003. A multivariate state space approach for urban traffic flow modeling and prediction. Transportation This research is supported by the Key Research Program of the Research Part C Emerging Technologies 11, 2 (2003), 121–135. Chinese Academy of Sciences (Grant No. ZDRW-ZS-2016-6-3) [25] Eleni I. Vlahogianni, Matthew G. Karlaftis, and John C. Golias. 2007. Spatio- Temporal Short-Term Urban Traffic Volume Forecasting Using Genetically and the State Key Research Development Program of China Optimized Modular Networks. Computer-Aided Civil and Infrastructure Engi- (Grant No. 2016YFB0502104). Their supports are gratefully ac- neering 22, 5 (2007), 317ĺC325. knowledged. And we also thank the anonymous referees for their [26] Shanhua Wu, Zhongzhen Yang, Xiaocong Zhu, and Bin Yu. 2014. Improved k- nn for Short-Term Traffic Forecasting Using Temporal and Spatial Information. helpful comments and suggestions. Journal of Transportation Engineering 140, 7 (2014), 04014026. [27] Dawen Xia, Binfeng Wang, Huaqing Li, Yantao Li, and Zili Zhang. 2016. A REFERENCES distributed spatialĺCtemporal weighted model on MapReduce for short-term traffic flow forecasting. Neurocomputing 179, C (2016), 246–263. [1] J Scott Armstrong. 2006. Findings from evidence-based forecasting: Methods [28] Bin Yu, Xiaolin Song, Feng Guan, Zhiming Yang, and Baozhen Yao. 2016. k- for reducing forecast error. International Journal of Forecasting 22, 3 (2006), Nearest neighbor model for multiple-time-step prediction of short-term traffic 583–598. condition. Journal of Transportation Engineering 142, 6 (2016), 04016018. [2] Brenda I. Bustillos and Yi Chang Chiu. 2011. Real-Time Freeway-Experienced [29] Yang Yue and Anthony Gar-On Yeh. 2008. Spatiotemporal traffic-flow de- Travel Time Prediction Using N -Curve and k Nearest Neighbor Methods. pendency and short-term traffic forecasting. Environment and Planning B: Transportation Research Record Journal of the Transportation Research Board Planning and Design 35, 5 (2008), 762–771. 2243, -1 (2011), 127–137. [30] Lun Zhang, Qian Rao, Wenchen Yang, and Meng Zhang. 2013. An Improved [3] Pinlong Cai, Yunpeng Wang, Guangquan Lu, Peng Chen, Chuan Ding, and k-NN Nonparametric Regression-Based Short-Term Traffic Flow Forecasting Jianping Sun. 2016. A spatiotemporal correlative k -nearest neighbor model Model for Urban Expressways. In International Conference on Transportation for short-term traffic multistep forecasting. Transportation Research Part C Engineering. 1214–1223. Emerging Technologies 62 (2016), 21–34. [31] Zuduo Zheng and Dongcai Su. 2014. Short-term traffic volume forecasting: [4] H. Chang, Y. Lee, B. Yoon, and S. Baek. 2012. Dynamic near-term traffic flow A k -nearest neighbor approach enhanced by constrained linearly sewing prediction: systemoriented approach based on past experiences. Iet Intelligent principle component algorithm. Transportation Research Part C Emerging Transport Systems 6, 3 (2012), 292–305. Technologies 43 (2014), 143–157. [5] Tao Cheng, James Haworth, and Jiaqiu Wang. 2012. Spatio-temporal autocor- relation of road network data. Journal of Geographical Systems 14, 4 (2012), 389–413. [6] Tao Cheng, Jiaqiu Wang, James Haworth, Benjamin Heydecker, and Andy Chow. 2014. A Dynamic Spatial Weight Matrix and Localized SpaceĺCTime Au- toregressive Integrated Moving Average for Network Modeling. Geographical Analysis 46, 1 (2014), 75ĺC97. [7] Stephen Clark. 2003. Traffic Prediction Using Multivariate Nonparametric Regression. Journal of Transportation Engineering 129, 2 (2003), 161–168. [8] Peibo Duan, Guoqiang Mao, Shangbo Wang, Changsheng Zhang, and Bin Zhang. 2016. STARIMA-based Traffic Prediction with Time-varying Lags. (2016). [9] Jeffrey L Elman. 1990. Finding structure in time. Cognitive Science 14, 2 (1990), 179–211. [10] Gaetano Fusco, Chiara Colombaroni, and Natalia Isaenko. 2016. Short-term speed predictions exploiting big data on large urban road networks. Trans- portation Research Part C: Emerging Technologies 73 (2016), 183–201. 140