
Short-Term Traffic Forecasting: A Dynamic ST-KNN Model Considering Spatial Heterogeneity and Temporal Non-Stationarity

Shifen Cheng, Feng Lu
State Key Lab of Resources and Environment Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, 11A Datun Road, Chaoyang District, Beijing 100101, P. R. China


Accurate and robust short-term traffic forecasting is a critical issue in intelligent transportation systems and real-time traffic-related applications. Existing short-term traffic forecasting approaches tend to adopt global and static model structures and to assume traffic correlations between adjacent road segments within assigned time periods. Due to the inherent spatial heterogeneity and temporal non-stationarity of city traffic, it is rather difficult for these approaches to obtain stable and satisfying results. To overcome the problems of static model structures and quantitatively unclear spatiotemporal dependency relationships, this paper proposes a dynamic spatiotemporal k-nearest neighbor model, named D-ST-KNN, for short-term traffic forecasting. It comprehensively considers the spatial heterogeneity and temporal non-stationarity of city traffic with dynamic spatial neighbors, time windows, spatiotemporal weights, and other parameters. First, the sizes of spatial neighbors and the lengths of time windows for traffic influence are automatically determined by cross-correlation and autocorrelation functions, respectively. Second, dynamic spatiotemporal weights are introduced into the distance functions to optimize the search mechanism. Then, dynamic spatiotemporal parameters are established to adapt to the continuous change in traffic conditions, including the dynamic number of candidate neighbors and dynamic weight allocation parameters. Finally, the D-ST-KNN model is evaluated using two vehicular speed datasets collected on expressways in California, U.S., and city roads in Beijing, China. Four traditional prediction models are compared with the D-ST-KNN model in terms of forecasting accuracy and generalization ability. The results demonstrate that the D-ST-KNN model outperforms existing models in all time periods, especially in the morning and evening peak periods. In addition, the generalization ability of the D-ST-KNN model is also demonstrated.

INTRODUCTION

Short-term traffic forecasting, which has an important role in intelligent transportation systems, enables traffic managers to formulate reasonable and efficient strategies for alleviating traffic congestion and optimizing traffic assignments. Short-term traffic forecasting also enables the public to achieve accurate vehicular path planning [ 29 ][ 10 ].

In the past few decades, researchers have proposed several short-term traffic forecasting models that can be divided into two categories: parametric models and nonparametric models. A parametric model uses an explicit parametric function to quantify the relationship between historical traffic data and predicted traffic data. Considering the stochastic and nonlinear characteristics of traffic, constructing a mathematical model that characterizes traffic with high accuracy is difficult in practice [ 1 ]. Nonparametric models, such as data-driven methods, do not require a priori knowledge or an explicit expression of the mechanism; thus, they are more suitable for short-term traffic forecasting problems [ 22 ] [ 23 ] [ 7 ] [ 31 ].

As a typical nonparametric method, the k-nearest neighbors (KNN) model has received considerable attention. Many scholars have successfully applied the traditional KNN model to short-term traffic prediction [ 2 ][ 17 ][ 19 ][ 4 ][ 16 ][ 30 ][ 28 ]. Considering that the evolution of traffic is a spatiotemporal interaction process, the traffic conditions of road segments affect one another spatially [ 6 ]. Therefore, spatiotemporal relationships between multiple road segments in road networks are considered to improve traffic prediction [ 25 ][ 21 ][ 20 ]. Based on the traditional KNN model, [ 26 ] realized an enhanced model with the support of spatiotemporal information and argued that it achieves better performance than the model that employs only temporal information. [ 27 ] considered upstream and downstream traffic information and proposed a distributed architecture of a spatiotemporal-weighted KNN model for short-term traffic prediction. [ 3 ] employed a spatiotemporal state matrix instead of the traditional time series to describe the traffic state while using a Gaussian weight distance to select the nearest neighbor to improve the KNN model. However, these ST-KNN models cannot accurately quantify the spatiotemporal relation. This is primarily reflected in the modeling process: the size of the spatial dimension m and the length of the time window n of the state space cannot be determined automatically, and some values are set manually. For example, for m=3, three adjacent road segments are selected; for n=2, the historical data of the two time steps preceding the current time step are used to construct samples. When the time series problem is transformed into a supervised machine learning problem, the values of m and n determine the number of selected features. Therefore, manually engineered features can easily cause the curse of dimensionality and undermine the prediction accuracy of the model [ 15 ].

The prediction model is usually static; thus, it cannot describe the dynamic changes in traffic. This limitation is primarily reflected in the following three aspects.

1) Existing studies usually assume that the spatial neighbors and time windows are globally fixed: once the number of road segments m associated with the predicted road segment and the length of the time window n are determined, they do not change over the spatiotemporal range. Considering the dynamic characteristics of an urban road network, traffic flow in the road network is not a static point but a process that moves from one location to another. The spatial neighbors of a road segment primarily depend on the current traffic conditions. The number of spatial neighbors is very small if traffic congestion exists but is large during flat peak periods [ 5 ]. From the perspective of urban road network heterogeneity, the number of relevant road segments also differs among road segments; thus, sharing the parameter m over the entire spatial range is difficult [ 29 ]. The selection of a time window based on a time series determines the length of the historical traffic data used to match similar traffic patterns. The traffic data in the historical time steps and the current time step must be relevant in the selection process [ 18 ]. Due to the dynamic and heterogeneous nature of the road network, even for the same road segment, a significant difference is observed in the time series of traffic data in different time periods (such as morning and evening peak periods), which makes the selection of the time window dynamic [ 8 ]. Thus, spatial neighbors and time windows that change dynamically over time and space are not easily described with globally fixed spatiotemporal state matrices, and a dynamic spatiotemporal KNN model is needed to adapt to the characteristics of traffic changes.

2) Existing research considers that historical data from different time periods contribute differently to the prediction of future traffic conditions. When calculating the distance between two state spaces, the weighted distance criterion is usually adopted to assign different weights to each component in the state space. The closer the time window is to the predicted time, the larger the allocated weight; the closer the spatial distance is to the predicted road segment, the greater the assigned weight [ 3 ]. However, dynamic changes in the spatial neighbor and the time window not only affect the dimension of the space-time matrix but also cause the intensity of the correlation among different positions to change dynamically over time. Therefore, the influence of different components of traffic data is difficult to characterize with a globally fixed spatiotemporal weight matrix.

3) To determine the number of similar state spaces K, researchers usually employ a cross-validation method to select a suitable value and then share it over the entire range of space and time [ 26 ] [ 28 ]. Due to the difference in traffic patterns in different time periods and spatial locations, a globally fixed value of K cannot adapt to the dynamic and heterogeneous nature of a road network.

The key to short-term traffic forecasting models is the effective use of the potential spatiotemporal dependencies in the traffic data. The existing KNN models usually assume that the traffic change is a static point process and often disregard its important dynamic and heterogeneous characteristics. As a result, the structure of the prediction model is usually globally fixed in time and space, including the globally fixed spatial neighbor, time window, spatiotemporal weights, and spatiotemporal parameters, as in the traditional KNN model and the spatiotemporal KNN model.

In this paper, we propose a dynamic spatiotemporal KNN model (D-ST-KNN) for short-term traffic prediction that considers the spatial heterogeneity and temporal non-stationarity of city traffic. First, we investigate the autocorrelation of road traffic to determine the time window required for the traffic data. Second, we use the cross-correlation among different road segments to analyze the spatiotemporal dependencies of traffic and build a dynamic spatial neighbor for each road segment. The dynamic spatiotemporal state matrix, which is obtained from the dynamic spatial neighbor and the dynamic time window, replaces the traditional time series or the static spatiotemporal matrix in characterizing the state space. Finally, we introduce the dynamic spatiotemporal weight, dynamic spatiotemporal parameters, and Gaussian weight function to improve the KNN model so that it adapts to the dynamic and heterogeneous characteristics of the traffic.

The remainder of this paper is organized as follows: Section 2 proposes a D-ST-KNN model that considers the spatial heterogeneity and temporal non-stationarity of city road traffic. The construction of the dynamic spatiotemporal state matrix, weights, and other parameters is also introduced in this section. In Section 3, the dynamic characteristics, prediction performance, and computational efficiency of the presented model are comprehensively validated. The experimental results are also discussed. Section 4 concludes the paper and provides an outlook on future work.

2 METHODOLOGY

In this section, we propose a D-ST-KNN model. Our method is divided into five phases: the data bucket partition, state space definition, distance function definition, optimal neighbor selection, and prediction function definition, which correspond to Sections 2.1-2.5. First, considering the dynamic nature of traffic, the original spatiotemporal data sets are partitioned according to different time periods to form different data buckets. Second, considering the spatial heterogeneity, each segment of a data bucket is separately processed, and the appropriate spatial neighbors and time windows are selected. The spatiotemporal state matrix is constructed to describe the traffic conditions. Then, we introduce the spatiotemporal weight matrix to define the distance function and measure the distance between the current spatiotemporal state matrix and the historical spatiotemporal state matrix to select the K nearest neighbors. Finally, we integrate these neighbors to obtain the predicted value of the target road segment.

2.1 Data bucket

Considering the non-stationarity and periodicity of traffic data, there are significant differences in the traffic characteristics among different time periods, such as the morning peak period, interpeak period, and evening peak period. Within the same period, the traffic data of the same road segment are statistically homogeneous and the traffic pattern tends to be stable with periodic changes (for example, the morning peak period on different days), so the spatial neighbor, the time window, and the spatiotemporal parameters can be shared. Therefore, we divide the original traffic data $\{vol_t^{L_j},\ j \in [1, N],\ t \in [t_0, t_c]\}$ into different time periods to describe the homogeneity within the same time period and the dynamics across different time periods, where $t_0$ and $t_c$ represent the start time step and the current time step of the time series, and $L_j$ denotes the jth road segment.
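The bucketing idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the six boundary times are assumed values (they match the split into M = 6 periods adopted in this section), and the names `BUCKET_STARTS`, `bucket_index`, and `partition` are hypothetical, not part of the original implementation.

```python
from datetime import time

# Assumed start times of the six time-of-day buckets (half-open intervals).
BUCKET_STARTS = [time(0, 0), time(6, 30), time(10, 0),
                 time(13, 30), time(17, 0), time(20, 30)]


def bucket_index(t):
    """Return the 1-based index i of the bucket whose interval
    [start_i, start_{i+1}) contains the time of day t."""
    idx = 1
    for i, start in enumerate(BUCKET_STARTS, start=1):
        if t >= start:
            idx = i
    return idx


def partition(records):
    """Split (segment_id, time_of_day, value) records into disjoint buckets.

    Every record lands in exactly one bucket, so the buckets are pairwise
    disjoint and their union is the full data set.
    """
    buckets = {i: [] for i in range(1, len(BUCKET_STARTS) + 1)}
    for seg, t, value in records:
        buckets[bucket_index(t)].append((seg, t, value))
    return buckets
```

For instance, `bucket_index(time(6, 30))` falls in bucket 2, because each interval is closed on the left and open on the right.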

In the study of urban traffic modeling and prediction, to distinguish the difference among the traffic characteristics in different time periods, [ 24 ] divided a day into six time periods (period 1: midnight-6:30 am; period 2: 6:30-10:00; period 3: 10:00-13:30; period 4: 13:30-17:00; period 5: 17:00-20:30; period 6: 20:30-midnight). The test reveals that the partition is statistically acceptable. Based on this analysis and according to the same strategy, the original traffic data are divided into M different time periods (M = 6) along the time dimension, which correspond to different data buckets. Assuming that the entire traffic data set is BK, the data bucket division must satisfy

$$\begin{cases} bk_i \cap bk_o = \emptyset \\ BK = bk_1 \cup bk_2 \cup \cdots \cup bk_M \\ bk_i = \{ vol_t^{L_j} \mid 1 \le j \le N,\ \forall t \in [t^a_{bk_i}, t^b_{bk_i}) \} \end{cases} \tag{1}$$

where $i \in [1, M]$, $o \in [1, M]$, $i \ne o$, $bk_i$ is the ith bucket (e.g., bucket 1), and $vol_t^{L_j}$ is the traffic data of road segment $L_j$ at time step t. $t \in [t^a_{bk_i}, t^b_{bk_i})$ indicates that time step t is within the corresponding time period of the ith bucket (e.g., [0:00-6:30), [6:30-10:00)). $L_j$ denotes the jth road segment (e.g., Link 1), and N is the total number of road segments. Note that dividing the original traffic data into different buckets at the pre-processing stage does not have any impact on the analyses and conclusions in this study because the same partitioning strategy was used for all the algorithms that are evaluated.

2.2 Dynamic spatiotemporal state matrix

2.2.1 Dynamic spatial neighborhoods. The dynamic spatial neighborhood is used to determine how the traffic conditions of the predicted road segment are affected by the surrounding road segments in different buckets, and thus to determine the correlation among road segments. The traditional method usually calculates the correlation coefficients between the time series of the predicted road segment and the time series of other road segments and sets a threshold to select the relevant road segments [ 3 ]. Considering that a road network has multiple internal and external factors, such as the influence of traffic lights, the impact of surrounding road segments on the predicted road segment has a certain degree of lag. Therefore, the delayed spatiotemporal relationships cannot be exactly expressed by correlation coefficients. The cross-correlation function is a delayed version of the correlation coefficient function, which measures the correlation coefficients of two time series at a specific lag [ 14 ]; therefore, it is more suitable for describing the spatiotemporal dependence of traffic.

Assume that $bk_i$ is the bucket of the predicted road segment $L_j$ at time step t, with $t \in [t^a_{bk_i}, t^b_{bk_i})$. Given a surrounding road segment $L_v$, the time series of the traffic data for the two road segments can be expressed as $U = \{vol_t^{L_j} \mid \forall t \in [t^a_{bk_i}, t^b_{bk_i})\}$ and $Z = \{vol_t^{L_v} \mid \forall t \in [t^a_{bk_i}, t^b_{bk_i})\}$, $j \in [1, N]$, $v \in [1, N]$, and their cross-correlation at lag $\varphi$ is defined as follows:

$$ccf^{bk_i}_{u,z}(\varphi) = \frac{\gamma^{bk_i}_{u,z}(\varphi)}{\sigma_u \sigma_z}, \quad \varphi = 0, \pm 1, \pm 2, \cdots \tag{2}$$

$$\gamma^{bk_i}_{u,z}(\varphi) = E\left[ (u_t - \mu_u)(z_{t+\varphi} - \mu_z) \right]$$

$$\sigma_u = \sqrt{\sum (u_t - \mu_u)^2}, \quad \sigma_z = \sqrt{\sum (z_{t+\varphi} - \mu_z)^2}$$

where $\gamma^{bk_i}_{u,z}(\varphi)$ is the cross-covariance between time series U and time series Z at lag $\varphi$ in bucket $bk_i$, $\mu_u$ and $\mu_z$ are the mean values of U and Z, respectively, and $\sigma_u$ and $\sigma_z$ are the standard deviations of U and Z, respectively.

In this definition, the cross-correlation function can be regarded as a function of the lag, and the lag value at which the cross-correlation function attains its maximum is the average delay time from the surrounding segment to the predicted road segment [ 29 ]. The formal definition is expressed as

$$\psi^{L_v}_{bk_i} = \arg\max_{\varphi}\ ccf^{bk_i}_{u,z}(\varphi), \quad v \in [1, N] \tag{3}$$

where $\psi^{L_v}_{bk_i}$ is the lag value that maximizes the cross-correlation of the surrounding road segment $L_v$ with the predicted road segment in $bk_i$. $\psi^{L_v}_{bk_i}$ describes the maximum impact time range of the surrounding segments on the predicted road segment in different buckets, which can be employed for efficient selection of spatial neighbors. Consider the predicted road segment $L_j$ in $bk_i$ and its predicted time interval $\Delta t$. Surrounding road segments that deliver traffic flow to the predicted road segment within this time interval influence it, and road segments beyond this time interval are excluded. The formal definition is expressed as

$$R^{L_j}_{bk_i} \leftarrow \{ L_v \mid \forall\, 0 \le \psi^{L_v}_{bk_i} \le \Delta t,\ v \in [1, N] \} \tag{4}$$

where $R^{L_j}_{bk_i}$ is the set of spatial neighbors of the jth road segment in the ith bucket.
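The neighbor selection above can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions, not the authors' implementation: lags and $\Delta t$ are both measured in time steps, the lag search is limited to a hypothetical `max_lag`, the correlation at each lag is computed on the overlapping part of the two series, and the function names are made up for illustration.

```python
import numpy as np


def ccf(u, z, lag):
    """Sample correlation between u_t and z_{t+lag} on their overlap."""
    u = np.asarray(u, dtype=float)
    z = np.asarray(z, dtype=float)
    if lag >= 0:
        a, b = u[:len(u) - lag], z[lag:]
    else:
        a, b = u[-lag:], z[:len(z) + lag]
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_lag(u, z, max_lag):
    """Lag in [-max_lag, max_lag] that maximizes the cross-correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda lag: ccf(u, z, lag))


def spatial_neighbors(target, others, max_lag, delta_t):
    """Keep segments whose maximizing lag psi satisfies 0 <= psi <= delta_t.

    target: series of the predicted segment within one bucket.
    others: dict mapping segment name -> series within the same bucket.
    """
    return [name for name, z in others.items()
            if 0 <= best_lag(target, z, max_lag) <= delta_t]
```

Because the selection is run separately on the series of each bucket, the resulting neighbor set can differ from one time period to another, which is the dynamic behavior the section describes.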

2.2.2 Dynamic time windows. Considering that the selection of the time window is based on the time series of the predicted road segment, we can select the n historical traffic data points that are correlated with the predicted road segment. The autocorrelation function is usually employed to measure the correlation between a time series and its delayed version; thus, it can be used to select the time window, i.e., the lag at which the prediction error is minimized can be set as the window size. Note that the lag in the autocorrelation function describes the delay effect within a single time series, whereas the lag described in Section 2.2.1 characterizes the delay effect between different time series. Given the time series of the jth road segment $L_j$ in $bk_i$, $U = \{vol_t^{L_j} \mid \forall t \in$ …

… spatial distances. The construction method is described as follows: assume that the predicted road segment $L_j$ at the current time step $t_c$ is in data bucket $bk_i$ and the dimension of the spatiotemporal state matrix is $m^{L_j}_{bk_i} \times n^{L_j}_{bk_i}$, which is determined by the method provided in Section 2.2. Then, the spatiotemporal state matrix of the current time step can be expressed as $\chi^{L_j}_{t_c}(m^{L_j}_{bk_i}, n^{L_j}_{bk_i})$. The spatiotemporal matrix of the historical time step $h_i$ can be defined as $\chi^{L_j}_{h_i}(m^{L_j}_{bk_i}, n^{L_j}_{bk_i})$, where $m^{L_j}_{bk_i}$ is the spatial dimension of the spatiotemporal state matrix of the jth predicted road segment in the ith bucket, which is related to the number of elements in the set of spatial neighbors $R^{L_j}_{bk_i}$. Moreover, $n^{L_j}_{bk_i}$ is the temporal dimension of the spatiotemporal state matrix of the jth predicted road segment in the ith bucket, which is the size of the time window. The time-weighted matrix is defined as $W_t^{bk_i}$, and the space-weighted matrix is defined as $W_s^{bk_i}$. The corresponding elements are $w_t^{bk_i}(t_i, t_j)$, $t_i \in [1, n^{L_j}_{bk_i}]$, $t_j \in [1, n^{L_j}_{bk_i}]$, and $w_s^{bk_i}(s_i, s_j)$, $s_i \in [1, m^{L_j}_{bk_i}]$, $s_j \in [1, m^{L_j}_{bk_i}]$, which represent the time weight value and space weight value, respectively, assigned to the jth predicted road segment in the ith bucket. The weight distribution is as follows:

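$$w_t^{bk_i}(t_i, t_j) = \begin{cases} \dfrac{t_i}{\sum_{t_i=1}^{n^{L_j}_{bk_i}} t_i}, & t_i = t_j \\ 0, & t_i \ne t_j \end{cases} \tag{6}$$

$$w_s^{bk_i}(s_i, s_j) = \begin{cases} \dfrac{ccf^{s_i}_{L_v, L_j}}{\sum_{s_i=1}^{m^{L_j}_{bk_i}} ccf^{s_i}_{L_v, L_j}}, & s_i = s_j \\ 0, & s_i \ne s_j \end{cases} \tag{7}$$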

In this definition, the temporal and spatial weights are linearly distributed according to the proximity to the current time step and to the predicted road segment. $ccf^{s_i}_{L_v, L_j}$ is the cross-correlation between the time series of the $s_i$th spatial neighbor (whose road segment is $L_v$) and the predicted road segment $L_j$. The closer a time step is to the predicted time, the greater its weight; the stronger the correlation of a neighbor with the predicted road segment, the greater its weight. By introducing spatiotemporal weights into the original spatiotemporal matrix, the spatiotemporal-weighted state matrix of the current time step $\Gamma^{L_j}_{t_c}$ and the spatiotemporal-weighted state matrix of the historical time step $\Gamma^{L_j}_{h_i}$ are denoted by the following:

$$\Gamma^{L_j}_{t_c} = W_s^{bk_i} \times \chi^{L_j}_{t_c}(m^{L_j}_{bk_i}, n^{L_j}_{bk_i}) \times W_t^{bk_i} \tag{8}$$

$$\Gamma^{L_j}_{h_i} = W_s^{bk_i} \times \chi^{L_j}_{h_i}(m^{L_j}_{bk_i}, n^{L_j}_{bk_i}) \times W_t^{bk_i} \tag{9}$$
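A compact NumPy sketch of the weighting step, together with the trace-based distance of formula (10) that is introduced in the next paragraph, may help make the matrix shapes concrete. The sketch assumes that the state matrix has one row per spatial neighbor and one column per time step, with columns ordered from oldest to most recent, and that all cross-correlation values are positive; the function names are hypothetical.

```python
import numpy as np


def time_weights(n):
    """Diagonal n x n matrix with entries t / sum(1..n), t = 1..n.

    The last (most recent) column receives the largest weight.
    """
    t = np.arange(1, n + 1, dtype=float)
    return np.diag(t / t.sum())


def space_weights(ccf_values):
    """Diagonal m x m matrix of cross-correlations normalized to sum to 1."""
    c = np.asarray(ccf_values, dtype=float)
    return np.diag(c / c.sum())


def weighted_state(chi, Ws, Wt):
    """Spatiotemporal-weighted state matrix: Ws (m x m) @ chi (m x n) @ Wt (n x n)."""
    return Ws @ chi @ Wt


def state_distance(gamma_c, gamma_h):
    """sqrt(trace((A - B)(A - B)')), i.e. the Frobenius norm of A - B."""
    diff = gamma_c - gamma_h
    return float(np.sqrt(np.trace(diff @ diff.T)))
```

Since both weight matrices are diagonal, the weighting simply rescales each row by its spatial weight and each column by its temporal weight before the distance is taken.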

By calculating the distance $d_{bk_i}(\Gamma^{L_j}_{t_c}, \Gamma^{L_j}_{h_i})$ between the historical spatiotemporal state matrix and the current spatiotemporal state matrix, candidate neighbors can be selected. The formula is expressed as

$$d_{bk_i}(\Gamma^{L_j}_{t_c}, \Gamma^{L_j}_{h_i}) = \sqrt{\operatorname{trac}\left[ (\Gamma^{L_j}_{t_c} - \Gamma^{L_j}_{h_i}) \times (\Gamma^{L_j}_{t_c} - \Gamma^{L_j}_{h_i})' \right]} \tag{10}$$

where trac represents the trace of the matrix.

2.4 Dynamic spatiotemporal parameters

In the KNN model, the spatiotemporal parameters include the K values and the parameters introduced during the method construction (such as those of the prediction generation functions). The reasonableness of the parameters has a substantial influence on the prediction accuracy of the model. The K value is primarily employed to determine the number of candidate neighbors. If the K value is too small, the model becomes more complex and overfitting is possible. If the K value is too large, the model is simpler and under-fitting is possible. Considering that the selection of the K value is significantly influenced by the finite sample nature of the problem, its value is usually assigned by cross-validation, selecting the K value that minimizes the model error [ 27 ].

The existing methods usually assume that the K value is globally fixed. Once the K value is determined, it is shared throughout the entire space and time. In contrast to the existing methods, the selection of the K value in the D-ST-KNN model considers the dynamic changes of traffic. Instead of setting a globally fixed K value, we can select the optimal K value for each bucket, i.e., $K_{bk_i}$, $bk_i \in BK$, $i \in [1, M]$.

To verify these assumptions, we use cross-validation to set the range of K to [1, 40] and test the effect of different K values on the MAPE of the model in different buckets, as shown in Fig. 1.

[Fig. 1: MAPE (%) as a function of K (0 to 40); panel label: Bucket 4]

As the K value increases, the prediction error is gradually reduced. When the K value attains a certain value, the error of the model begins to stabilize. Thus, the optimal K value for each bucket can be determined (e.g., $K_{bk_1} = 27$, $K_{bk_2} = 23$). A comparison among buckets shows that the K values vary dynamically with the time period. A globally fixed K value has difficulty describing the dynamic change in traffic. Therefore, the dynamic K value proposed in this paper is reasonable. The parameters of the D-ST-KNN model also contain the parameters introduced by the prediction generation function (refer to Section 2.5). The calibration method for these parameters is shown in Section 3.2. Because the spatiotemporal state space, the spatiotemporal weight, and the spatiotemporal parameters change dynamically with different buckets, the prediction generation function should also change dynamically to adapt to this change. This paper transforms four types of traditional weight distribution methods to enable them to adapt to the dynamics of traffic: the equal weight, the inverse distance weight [ 23 ], the rank-based weight [ 11 ][ 13 ], and the Gaussian weight [ 3 ]. The best prediction function is selected by comparing the performance of the different prediction functions (refer to Section 3.2). Note that the weight referred to in this section is the weight assigned to each candidate neighbor, whereas the weight in Section 2.3 is the weight matrix assigned to each element in the spatiotemporal state matrix.

Assuming that $d^k_{bk_i}$ is the distance between the kth candidate neighbor and the predicted road segment in the ith bucket obtained by formula (10), the predicted value $vol^{L_j}_{t_c+1}$ of the predicted road segment $L_j$ at time step $t_c + 1$ is defined as

$$vol^{L_j}_{t_c+1} = \frac{\sum_{k=1}^{K_{bk_i}} vol^{L_j}_{h_i+1}(k) \times \varphi^{L_j}_{bk_i}(k)}{\sum_{k=1}^{K_{bk_i}} \varphi^{L_j}_{bk_i}(k)} \tag{11}$$

where $t_c \in [t^a_{bk_i}, t^b_{bk_i}]$ is used to map the current time step into the corresponding bucket, $K_{bk_i}$ is used to determine the number of candidate neighbors for the corresponding bucket, $vol^{L_j}_{h_i+1}(k)$ represents the traffic data of the kth candidate neighbor, and $h_i \in [t^a_{bk_i}, t^b_{bk_i}]$; $\varphi^{L_j}_{bk_i}(k)$ represents the weight of the kth neighbor of the jth predicted road segment in the ith bucket. The form is defined as follows:

$$\varphi^{L_j}_{bk_i}(k) = \begin{cases} \dfrac{1}{K_{bk_i}} \\[2mm] \dfrac{1}{d^k_{bk_i}} \\[2mm] (K_{bk_i} - r_q + 1)^2 \\[2mm] \dfrac{1}{4\pi a_{bk_i}} \exp\left( -\dfrac{(d^k_{bk_i})^2}{4 a_{bk_i}^2} \right) \end{cases} \tag{12}$$

Formula (12) lists, in order, the equal weight, the inverse distance weight, the rank-based weight, and the Gaussian weight, where $r_q$ represents the rank of the qth candidate neighbor, and $a_{bk_i}$ is a spatiotemporal parameter that, like the previously discussed spatiotemporal parameter K, varies dynamically with the time period. The corresponding parameter calibration is shown in Section 3.2.
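The weighted combination of candidate neighbors can be sketched as below. This is an illustrative reading of formulas (11) and (12), not the authors' code: the Gaussian form follows the reconstruction exp(-d^2 / (4 a^2)) with a constant prefactor that cancels in the normalized sum, rank 1 is assumed to denote the nearest neighbor, distances are assumed strictly positive for the inverse-distance case, and the default `a=0.02` is an arbitrary placeholder.

```python
import numpy as np


def neighbor_weights(d, a=0.02, kind="gaussian"):
    """Weights of the K candidate neighbors given their distances d."""
    d = np.asarray(d, dtype=float)
    K = len(d)
    if kind == "equal":
        return np.full(K, 1.0 / K)
    if kind == "inverse":
        return 1.0 / d  # assumes every distance is > 0
    if kind == "rank":
        ranks = np.argsort(np.argsort(d)) + 1  # rank 1 = nearest neighbor
        return (K - ranks + 1.0) ** 2
    if kind == "gaussian":
        # The prefactor 1 / (4 * pi * a) cancels in the normalized prediction.
        return np.exp(-d ** 2 / (4.0 * a ** 2)) / (4.0 * np.pi * a)
    raise ValueError(f"unknown weight kind: {kind}")


def predict(next_values, d, a=0.02, kind="gaussian"):
    """Weighted average of the neighbors' next-step values."""
    w = neighbor_weights(d, a=a, kind=kind)
    return float(np.dot(w, np.asarray(next_values, dtype=float)) / w.sum())
```

With equal weights the prediction reduces to the plain mean of the neighbors' next-step values; the other three schemes pull the prediction toward the values of the nearest neighbors.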

2.6 Accuracy metrics

Three criteria are selected to verify the prediction accuracy of the D-ST-KNN model, namely, mean absolute error (MAE), mean absolute percentage error (MAPE), and root-mean-square error (RMSE). These indicators depict the essential characteristics of errors from different perspectives. The RMSE indicates a fluctuation in the error of the prediction model, and the MAPE indicates the difference between the predicted and the actual traffic data. In contrast, the MAE and RMSE provide a measure of the similarity between the predicted and the actual traffic data [ 12 ]. The MAE, MAPE, and RMSE are defined as follows:

$$MAE = \frac{1}{M \times N \times S} \sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{s=1}^{S} \left| \widehat{vol}^{L_j}_{t_c+1}(s) - vol^{L_j}_{t_c+1}(s) \right| \tag{13}$$

$$MAPE = \frac{1}{M \times N \times S} \sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{s=1}^{S} \frac{\left| \widehat{vol}^{L_j}_{t_c+1}(s) - vol^{L_j}_{t_c+1}(s) \right|}{vol^{L_j}_{t_c+1}(s)} \tag{14}$$

$$RMSE = \sqrt{ \frac{1}{M \times N \times S} \sum_{i=1}^{M} \sum_{j=1}^{N} \sum_{s=1}^{S} \left( \widehat{vol}^{L_j}_{t_c+1}(s) - vol^{L_j}_{t_c+1}(s) \right)^2 } \tag{15}$$

where M is the number of buckets (M = 6), N is the number of predicted road segments, S is the number of test samples, $vol^{L_j}_{t_c+1}(s)$ and $\widehat{vol}^{L_j}_{t_c+1}(s)$ indicate the actual traffic data and the predicted traffic data at the next time step of the jth predicted road segment at the current time step, and s indicates the sth test sample in the ith bucket.
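For reference, the three metrics can be computed with a few lines of NumPy. The sketch assumes that the actual and predicted values are stored in arrays of identical shape (for example M x N x S) and that all actual values are nonzero, so that the MAPE is defined; the MAPE is returned as a fraction and can be multiplied by 100 to obtain a percentage. The function names are chosen for illustration.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error over all buckets, segments, and samples."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(predicted - actual)))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (actual must be nonzero)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(predicted - actual) / actual))


def rmse(actual, predicted):
    """Root-mean-square error over all buckets, segments, and samples."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```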

3 EXPERIMENTS

3.1 Data preparation

In this study, two different data sets are used to evaluate the performance of the prediction model. The first data set is PeMS, which is a high-quality data set with open access. PeMS is extensively applied in the field of traffic prediction. The traffic speed data from 59 consecutive locations on the US 101 freeway were downloaded from PeMS for a total of 60 days; the time period is August 15, 2016, to October 14, 2016, and the time interval is 5 min (as shown in Table 1). Each detector represents a position; the positional distribution is shown in Fig. 2. The second data set is the floating car trajectory data obtained from the Beijing road network, which is generated from more than 50,000 vehicles equipped with GPS. The frequency of data acquisition is 5 min, and the time period is March 1, 2012, to April 30, 2012 (as shown in Table 1). In this study, a representative region that contains 30 road segments is used for the experiment, with the position distribution shown in Fig. 2. In the two data sets, the last ten days are used as the test data to evaluate the accuracy of the model. The remaining days of data are employed as training data to construct the historical database of the prediction model.

In addition, we normalize the original traffic data and use the ratio of the average traffic speed to the maximum speed limit of each road segment to express the traffic conditions of the road segment. The formal expression is as follows:

$$vd_{i,t} = \frac{v_{i,t}}{f_{i,\max}}, \quad i \in [1, N],\ t \in [t_0, t_c] \tag{16}$$

where $vd_{i,t}$ is the normalized speed of the ith road segment at time step t, $v_{i,t}$ is the real average speed data of the road segment, and $f_{i,\max}$ is the speed limit for the ith road segment.

… yields the lowest performance. The distance function constructed by the Gaussian function assigns weights in the time dimension and space dimension; thus, the performance of the prediction model is significantly improved. However, this method requires the additional introduction of the time-weighted parameter α1 and the space-weighted parameter α2 in the construction process, which makes it difficult to calibrate its parameters and to find the globally optimal combination of parameters. We adopt a similar strategy that uses the linear time distribution weight in the time dimension and the spatial correlation between the surrounding road segments and the target road segment to assign weights in the spatial dimension. In this way, a dynamic spatiotemporal weight assignment method is constructed that does not require any additional parameters. The dynamic weight distribution has the lowest MAPE, RMSE, and MAE, which reflects the high efficiency of the method compared to that of the other two weight distribution methods.

3.2.2 Determining the optimal predictive function. Based on the discussion in the previous sections, we transform four types of weight distribution methods, including equal weight, inverse distance weight, rank-based weight, and Gaussian weight, which are used to integrate the candidate neighbors to obtain the final predicted value. In the process of cross-validation, we fix the other parameters of the model, such as $K_{bk_i}$ and $a_{bk_i}$, and calculate the influence of different weight distribution methods on the prediction accuracy of the D-ST-KNN model to obtain the average error over the entire test data set for each weight distribution method. The results are shown in Fig. 4. The MAPE, RMSE, and MAE of the Gaussian weight method are lower than those of the other three weight distribution methods. Therefore, in the D-ST-KNN model, we employ the Gaussian function as the weight distribution method for candidate neighbors.

3.2.3 Calibrating hyper-parameters. In the D-ST-KNN model, the hyper-parameters primarily include the number of candidate neighbors $K_{bk_i}$ and the Gaussian weight parameter $a_{bk_i}$. In the parameter calibration process, to find the combination of $K_{bk_i}$ and $a_{bk_i}$ that enables the prediction model to obtain the minimum MAPE, we set the range of $K_{bk_i}$ to [1, 40] and the range of $a_{bk_i}$ to [0.001, 0.04]. We apply the cross-validation method to obtain the optimal combination of the parameters for each bucket. The effect of parameter variation on the prediction accuracy of the D-ST-KNN model can be tested by fixing the other parameters of the model. For example, we can fix the value of $a_{bk_i}$ and test how the performance of the prediction model changes with $K_{bk_i}$ (refer to Section 2.4). Because the impact of parameter $K_{bk_i}$ on the prediction performance was discussed in Section 2.4, this section focuses on the calibration of parameter $a_{bk_i}$.
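The per-bucket calibration can be sketched as an exhaustive grid search. This is a hedged illustration rather than the procedure actually used: the search ranges come from the text, but the grid resolution for a (40 evenly spaced values) is an assumption, and `error_fn(bucket, K, a)` is a hypothetical callable standing in for the cross-validated MAPE of the model on one bucket.

```python
import itertools

import numpy as np


def calibrate(buckets, error_fn,
              k_values=range(1, 41),
              a_values=np.linspace(0.001, 0.04, 40)):
    """Return {bucket: (K, a)} minimizing error_fn separately per bucket.

    error_fn(bucket, K, a) is assumed to return the validation error
    (for example the MAPE) of the model with those settings on that bucket.
    """
    grid = list(itertools.product(k_values, a_values))
    best = {}
    for b in buckets:
        # Each bucket gets its own optimum instead of one global setting.
        best[b] = min(grid, key=lambda p: error_fn(b, p[0], p[1]))
    return best
```

Searching each bucket independently is what allows the selected K and a to differ between time periods; a single global search would correspond to the static setting that the paper argues against.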

Fig. 5 shows the impact of changes in abki on the performance of the D-ST-KNN model in diferent buckets. The trend in Fig. 5 reveals that the value of abki has a significant influence on the prediction performance. For the minimum abki , the prediction error of the model attains the maximum abki . As abki increases, the prediction error gradually decreases and begins to stabilize. We compare the variation of the parameters among the diferent buckets. For example, in bucket 1, the optimal value of abk1 is 0.017, whereas the optimal value of abk2 in bucket 2 is 0.015. The value of abki also changes dynamically over time. Considering that Kbki also changes dynamically with time, the parameters of the D-ST-KNN model change with time. The calibration results of the entire model are listed in Table 2, and the values of Kbki are shown in Fig. 5. In this analysis, setting the global fixed parameters is unreasonable when constructing the prediction model. We propose the concept of the data bucket, and the prediction model is constructed in diferent time periods, which causes the model parameters to change with the time period to adapt to the dynamic nature of trafic. 3.3.1 Overall results. Based on the variable estimation, we compare our model with several existing trafic prediction models, including the historical average model (HA), Elman neural network (Elman-NN) [ 9 ], traditional KNN model (Original-KNN), and spatiotemporal KNN model (ST-KNN). Fig. 6 shows the prediction performance of diferent models. The HA model, the Elman-NN model, and the Original-KNN model regard the problem of the trafic prediction as a simple time series problem and disregard the influence of the spatial factors on the predicted road segment. Therefore, their prediction performance is lower than the prediction performance of the ST-KNN model and the D-ST-KNN model proposed in this paper by comparing the values of MAPE. 
The ST-KNN model introduces the spatiotemporal state matrix, which improves the prediction performance. However, a static ST-KNN model (with a globally fixed spatiotemporal matrix and globally fixed parameters) ignores the spatial heterogeneity and temporal non-stationarity of the road network and cannot describe the essential characteristics of traffic dynamics. The D-ST-KNN model constructs models for different time periods by introducing the concept of data buckets. In addition, dynamic spatial neighbors, dynamic time windows, dynamic spatiotemporal weights, and dynamic spatiotemporal parameters are introduced, which allow the model to adapt to the dynamic changes of traffic conditions. The experimental results indicate that the D-ST-KNN model proposed in this paper is superior to the other models.

Fig. 6. Prediction performance of the different models (vertical axis: MAPE (%); models: HA, Elman-NN, Original-KNN, ST-KNN, D-ST-KNN).

3.3.2 Local results. To further evaluate the performance of the D-ST-KNN model, we compare the MAPEs of the different models in each data bucket by averaging the prediction performance over the road segments in that bucket. The experimental results are displayed in Fig. 7. In terms of overall trends, the performance of all models tracks the degree of traffic congestion. For example, in buckets 1, 3, and 6, all models have a lower MAPE than in the other buckets; these three buckets correspond to the time periods midnight-6:30, 10:00-13:30, and 20:30-midnight. Traffic in Beijing during these periods is off-peak: congestion is low and traffic conditions change regularly. In buckets 2, 4, and 5, all models perform relatively poorly. Bucket 2 corresponds to 6:30-10:00, bucket 4 to 13:30-17:00, and bucket 5 to 17:00-20:30. These time periods correspond to the peak periods in Beijing, during which traffic conditions change in a more complicated way than in the other buckets. Within a single data bucket, the ranking of the models is similar to that of the overall results. For example, in bucket 1, the ST-KNN and D-ST-KNN models perform better than the HA, Elman-NN, and Original-KNN models, which reflects the benefit of introducing spatial factors. Moreover, because the D-ST-KNN model considers the spatial heterogeneity and temporal non-stationarity of road networks to adapt to the dynamic characteristics of traffic, it performs better than the other models in all time periods, especially in the peak periods. This also explains why the D-ST-KNN model is superior to the other models in the overall results.
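The per-bucket comparison above can be sketched in Python as a mapping from time of day to bucket index followed by an average of MAPE over road segments. The bucket boundaries are the ones stated in the text; treating each start time as inclusive is an assumption, and the function names, segment identifiers, and MAPE values below are hypothetical placeholders rather than results from the experiments.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import time

# Start time of buckets 1..6 as given in the text (start assumed inclusive).
BUCKET_STARTS = [time(0, 0), time(6, 30), time(10, 0),
                 time(13, 30), time(17, 0), time(20, 30)]

def bucket_of(t):
    """Return the 1-based bucket index for a time of day."""
    return bisect_right(BUCKET_STARTS, t)

def mean_mape_per_bucket(records):
    """Average MAPE per bucket from (segment_id, time_of_day, mape_percent) records."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _segment, t, mape_percent in records:
        b = bucket_of(t)
        sums[b] += mape_percent
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Hypothetical records for illustration only.
records = [
    ("seg_A", time(7, 15), 20.0),
    ("seg_B", time(8, 40), 18.0),
    ("seg_A", time(2, 0), 14.0),
]
print(mean_mape_per_bucket(records))
```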

Fig. 7. MAPE of the different models (HA, Elman-NN, Original-KNN, ST-KNN, D-ST-KNN) in each data bucket (Bucket 1 to Bucket 6).

To evaluate the generalization ability of the D-ST-KNN model, we fix all parameters of the model and compare the performance of the different methods on the test data set from PeMS; the experimental results are shown in Fig. 8. The results indicate that the prediction accuracy of the D-ST-KNN model on the PeMS data set is significantly higher than on the Beijing floating car data set. The PeMS data are relatively complete, and they are collected on expressways. Compared with the urban road network, expressway traffic patterns are relatively simple and change little, which makes the regular traffic patterns easier for a prediction model to capture. Nevertheless, the relative ranking of the models is unchanged: the D-ST-KNN model has a lower MAPE, RMSE, and MAE than all the other models, which demonstrates its predictive performance and generalization ability.
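For reference, the three error measures used in this comparison can be computed as in the following Python sketch. It assumes the standard definitions of MAPE, RMSE, and MAE; the function names, model names, and numbers are hypothetical and serve only to show how several models can be scored and ranked on the same test series.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAPE (percent), RMSE, and MAE for one model (y_true must be nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }

def rank_models(y_true, predictions, metric="MAPE"):
    """Score every model and return (names sorted from best to worst, all scores)."""
    scores = {name: evaluate(y_true, y_pred) for name, y_pred in predictions.items()}
    ranking = sorted(scores, key=lambda name: scores[name][metric])
    return ranking, scores

# Hypothetical observed speeds and forecasts from two placeholder models.
y_true = [50.0, 60.0, 40.0]
predictions = {
    "model_a": [45.0, 66.0, 40.0],
    "model_b": [50.0, 63.0, 42.0],
}
ranking, scores = rank_models(y_true, predictions)
print(ranking)
print(scores)
```

A model generalizes in the sense used here when it stays at the front of the ranking for all three metrics on a data set that was not used for calibration.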

Fig. 8. Performance of the different models (HA, Elman-NN, Original-KNN, ST-KNN, D-ST-KNN) on the PeMS data set. Panels: MAPE (%), RMSE, MAE.

SUMMARY AND FUTURE WORK

In this paper, we propose a D-ST-KNN model for short-term traffic prediction. The proposed model considers the spatial heterogeneity and temporal non-stationarity of road networks to adapt to the dynamic characteristics of traffic, using dynamic spatial neighbors, time windows, spatiotemporal weights, and spatiotemporal parameters. Spatial neighbors and time windows are selected automatically by computing cross-correlation and autocorrelation functions, which efficiently addresses the curse of dimensionality encountered in existing KNN models. The spatiotemporal weights are integrated into the distance function to help identify candidate neighbors. Time-varying parameters, namely the dynamic number of candidate neighbors and the dynamic weight allocation parameters, are also introduced to further adapt to the dynamic and heterogeneous nature of road networks.

Using real traffic data collected from city roads and inter-city expressways, we calculate the number of spatial neighbors and the time window size of each road segment, which reflect the distinct heterogeneity and non-stationarity of urban road traffic. We then validate the performance of the proposed D-ST-KNN model by comparing it with the HA, Elman-NN, traditional KNN, and spatiotemporal KNN models. The experimental results indicate that the D-ST-KNN model has a higher accuracy in short-term traffic prediction than the existing models. In addition, we explore the local performance of the different models in different data buckets and find that the performance of all models tracks the degree of traffic congestion, and that the D-ST-KNN model performs better than the other models in all time periods, especially in the morning and evening peak periods. To summarize, compared with the existing models, the proposed D-ST-KNN model significantly improves the accuracy of short-term traffic prediction. Furthermore, we compare the performance of the different models using actual traffic data collected from PeMS. The D-ST-KNN model again achieves the best performance, which verifies the generalization ability of the proposed model.

In follow-up studies, several problems need to be investigated to further improve the D-ST-KNN model. The D-ST-KNN model behaves slightly differently in peak and off-peak time periods, and further improving its performance during peak hours remains a challenge. Moreover, a multithreaded approach could be used to improve the efficiency of D-ST-KNN. A parallel P-D-ST-KNN model built on an existing parallel computing framework is expected to alleviate the pressure of real-time computation.

ACKNOWLEDGMENTS

This research is supported by the Key Research Program of the Chinese Academy of Sciences (Grant No. ZDRW-ZS-2016-6-3) and the State Key Research Development Program of China (Grant No. 2016YFB0502104). Their support is gratefully acknowledged. We also thank the anonymous referees for their helpful comments and suggestions.

REFERENCES

[1] J Scott Armstrong. 2006. Findings from evidence-based forecasting: Methods for reducing forecast error. International Journal of Forecasting 22, 3 (2006), 583-598.

[2] Brenda I. Bustillos and Yi Chang Chiu. 2011. Real-Time Freeway-Experienced Travel Time Prediction Using N-Curve and k Nearest Neighbor Methods. Transportation Research Record Journal of the Transportation Research Board 2243, 1 (2011), 127-137.

[3] Pinlong Cai, Yunpeng Wang, Guangquan Lu, Peng Chen, Chuan Ding, and Jianping Sun. 2016. A spatiotemporal correlative k-nearest neighbor model for short-term traffic multistep forecasting. Transportation Research Part C: Emerging Technologies 62 (2016), 21-34.

[4] Chang, Lee, Yoon, and Baek. 2012. Dynamic near-term traffic flow prediction: system-oriented approach based on past experiences. IET Intelligent Transport Systems 6, 3 (2012), 292-305.

[5] Tao Cheng, James Haworth, and Jiaqiu Wang. 2012. Spatio-temporal autocorrelation of road network data. Journal of Geographical Systems 14, 4 (2012), 389-413.

[6] Tao Cheng, Jiaqiu Wang, James Haworth, Benjamin Heydecker, and Andy Chow. 2014. A Dynamic Spatial Weight Matrix and Localized Space-Time Autoregressive Integrated Moving Average for Network Modeling. Geographical Analysis 46, 1 (2014), 75-97.

[7] Stephen Clark. 2003. Traffic Prediction Using Multivariate Nonparametric Regression. Journal of Transportation Engineering 129, 2 (2003), 161-168.

[8] Peibo Duan, Guoqiang Mao, Shangbo Wang, Changsheng Zhang, and Bin Zhang. 2016. STARIMA-based Traffic Prediction with Time-varying Lags. (2016).

[9] Jeffrey Elman. 1990. Finding structure in time. Cognitive Science 14, 2 (1990), 179-211.

[10] Gaetano Fusco, Chiara Colombaroni, and Natalia Isaenko. 2016. Short-term speed predictions exploiting big data on large urban road networks. Transportation Research Part C: Emerging Technologies 73 (2016), 183-201.

[11] Kamran Ghavamifar. 2009. A decision support system for project delivery method selection in the transit industry. (2009).

[12] Filmon Habtemichael and Mecit Cetin. 2016. Short-term traffic flow rate forecasting based on identifying similar traffic patterns. Transportation Research Part C 66 (2016), 61-78.

[13] Filmon Habtemichael, Mecit Cetin, and Khairul Anuar. 2015. Methodology for Quantifying Incident-Induced Delays on Freeways by Grouping Similar Traffic Patterns. Transportation Research Record Journal of the Transportation Research Board (2015).

[14] James Douglas Hamilton. 1994. Time series analysis. 401-409 pages.

[15] Haikun Hong, Wenhao Huang, Xiabing Zhou, Sizhen Du, Kaigui Bian, and Kunqing Xie. 2016. Short-term traffic flow forecasting: Multi-metric KNN with related station discovery. In International Conference on Fuzzy Systems and Knowledge Discovery. 1670-1675.

[16] Xiaoyu Hou, Yisheng Wang, and Siyu Hu. 2013. Short-term Traffic Flow Forecasting based on Two-tier K-nearest Neighbor Algorithm. Procedia - Social and Behavioral Sciences 96 (2013), 2529-2536.

[17] Myung Jiwon, Dong Kyu Kim, Seung Young Kho, and Chang Ho Park. 2011. Travel Time Prediction Using k Nearest Neighbor Method with Combined Data from Vehicle Detector System and Automatic Toll Collection System. Transportation Research Record Journal of the Transportation Research Board 20, 2256 (2011), 51-59.

[18] Arief Koesdwiady, Ridha Soua, and Fakhreddine Karray. 2016. Improving Traffic Flow Prediction With Weather Information in Connected Cars: A Deep Learning Approach. IEEE Transactions on Vehicular Technology 65, 12 (2016), 9508-9517.

[19] Shuangshuang Li, Zhen Shen, and Gang Xiong. 2012. A k-nearest neighbor locally weighted regression method for short-term traffic flow forecasting. In International IEEE Conference on Intelligent Transportation Systems. 1596-1601.

[20] Ma, Dai, He, J. Ma, Wang, and Wang. 2017. Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction. Sensors 17, 4 (2017).

[21] Wanli Min and Laura Wynter. 2011. Real-time road traffic prediction with spatio-temporal correlations. Transportation Research Part C: Emerging Technologies 19, 4 (2011), 606-616.

[22] Brian Lee Smith. 1997. Forecasting freeway traffic flow for intelligent transportation systems application. Transportation Research Part A 1, 31 (1997), 61.

[23] Brian L Smith, Billy M Williams, and R Keith Oswald. 2002. Comparison of parametric and nonparametric models for traffic flow forecasting. Transportation Research Part C: Emerging Technologies 10, 4 (2002), 303-321.

[24] Anthony Stathopoulos and Matthew G. Karlaftis. 2003. A multivariate state space approach for urban traffic flow modeling and prediction. Transportation Research Part C: Emerging Technologies 11, 2 (2003), 121-135.

[25] Eleni I. Vlahogianni, Matthew G. Karlaftis, and John C. Golias. 2007. Spatio-Temporal Short-Term Urban Traffic Volume Forecasting Using Genetically Optimized Modular Networks. Computer-Aided Civil and Infrastructure Engineering 22, 5 (2007), 317-325.

[26] Shanhua, Zhongzhen Yang, Xiaocong Zhu, and Bin Yu. 2014. Improved knn for Short-Term Traffic Forecasting Using Temporal and Spatial Information. Journal of Transportation Engineering 140, 7 (2014), 04014026.

[27] Dawen Xia, Binfeng Wang, Huaqing Li, Yantao Li, and Zili Zhang. 2016. A distributed spatial-temporal weighted model on MapReduce for short-term traffic flow forecasting. Neurocomputing 179, C (2016), 246-263.

[28] Bin, Xiaolin Song, Feng Guan, Zhiming Yang, and Baozhen Yao. 2016. k-Nearest neighbor model for multiple-time-step prediction of short-term traffic condition. Journal of Transportation Engineering 142, 6 (2016), 04016018.

[29] Yang Yue and Anthony Gar-On Yeh. 2008. Spatiotemporal traffic-flow dependency and short-term traffic forecasting. Environment and Planning B: Planning and Design 35, 5 (2008), 762-771.

[30] Lun Zhang, Qian Rao, Wenchen Yang, and Meng Zhang. 2013. An Improved k-NN Nonparametric Regression-Based Short-Term Traffic Flow Forecasting Model for Urban Expressways. In International Conference on Transportation Engineering. 1214-1223.

[31] Zuduo Zheng and Dongcai Su. 2014. Short-term traffic volume forecasting: A k-nearest neighbor approach enhanced by constrained linearly sewing principle component algorithm. Transportation Research Part C: Emerging Technologies 43 (2014), 143-157.